An extractor is a reusable configuration that defines:
Think of an extractor as a template that tells Automat exactly what information to pull from your documents.
Schemas define the structure of extracted data:
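As a rough illustration, a schema for an invoice might look like the sketch below. The JSON-Schema-style layout, the `invoice_schema` name, and the chosen fields are assumptions made for this example; the schema editor or API reference is the authority on the format Automat actually accepts.

```python
import json

# Hypothetical invoice schema in a JSON-Schema-like layout.
# The layout is an assumption for illustration, not a confirmed Automat format.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_date": {
            "type": "string",
            "description": "Date the invoice was issued, as printed on the document",
        },
        "total_amount_due": {
            "type": "number",
            "description": "Final amount payable, including tax",
        },
        "vendor_address": {
            "type": "string",
            "description": "Full postal address of the vendor that issued the invoice",
        },
    },
    "required": ["invoice_date", "total_amount_due"],
}

# Serialize to JSON, e.g. to paste into an editor or send in a request body.
print(json.dumps(invoice_schema, indent=2))
```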
Choose the right model tier for your use case:
Speed-first: Optimized for simple extraction tasks and high volume. Default backend: OpenAI (gpt-5.4-mini).
Balanced: Reliable accuracy with moderate latency; a good default for most production documents. Default backend: OpenAI (gpt-5.4).
Highest accuracy: Reasoning-oriented model for difficult layouts, poor scans, or critical fields. Default backend: Google Gemini 3.1 Pro (gemini-3.1-pro-preview).
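The tier guidance above can be sketched as a small selection helper. The tier keys (`speed_first`, `balanced`, `highest_accuracy`), the `ModelTier` class, and the `choose_tier` function are hypothetical names introduced for this example; only the tier labels and default backends come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    label: str
    provider: str
    model: str


# Keys are hypothetical identifiers; labels and backends mirror the tiers above.
MODEL_TIERS = {
    "speed_first": ModelTier("Speed-first", "OpenAI", "gpt-5.4-mini"),
    "balanced": ModelTier("Balanced", "OpenAI", "gpt-5.4"),
    "highest_accuracy": ModelTier("Highest accuracy", "Google", "gemini-3.1-pro-preview"),
}


def choose_tier(difficult_layout: bool = False,
                poor_scan: bool = False,
                critical_fields: bool = False,
                high_volume_simple: bool = False) -> str:
    """Return a tier key using the rules of thumb from the tier descriptions."""
    # Hard documents or critical fields outweigh throughput concerns.
    if difficult_layout or poor_scan or critical_fields:
        return "highest_accuracy"
    # Simple, high-volume work favors the fastest tier.
    if high_volume_simple:
        return "speed_first"
    # Everything else falls back to the general-purpose default.
    return "balanced"


tier = MODEL_TIERS[choose_tier(poor_scan=True)]
print(tier.label, tier.model)
```

Checking the accuracy-related conditions first is a design choice in this sketch: it assumes that a wrong value in a critical field costs more than extra latency.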
Every extractor has a unique ID (e.g., ext_abc123). Find it in:
Extractors support versioning for production stability:
When you publish an extractor, its model configuration is frozen, so extraction results stay consistent even as the underlying models are updated.
Clear field names help the AI understand what to extract:
✅ invoice_date, total_amount_due, vendor_address
❌ field1, date, val
Include descriptions to guide extraction, especially for ambiguous fields:
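One way to apply both naming tips before publishing is a quick local check over your field definitions. The sketch below is a heuristic written for this example, not an Automat feature: `lint_fields`, the `VAGUE_NAMES` set, and the placeholder pattern are all assumptions you would tune to your own schemas.

```python
import re

# Heuristic lists for this example only; extend them to suit your schemas.
VAGUE_NAMES = {"date", "val", "value", "data", "field"}
PLACEHOLDER = re.compile(r"^(field|col|column|var)_?\d+$")


def lint_fields(fields: dict[str, str]) -> list[str]:
    """Return warnings for a mapping of field name -> description."""
    warnings = []
    for name, description in fields.items():
        # Flag names like "field1" or bare generic words like "date".
        if PLACEHOLDER.match(name) or name in VAGUE_NAMES:
            warnings.append(f"{name}: name is too generic")
        # Flag fields that give the model no guidance beyond the name.
        if not description.strip():
            warnings.append(f"{name}: missing description")
    return warnings


fields = {
    "invoice_date": "Date the invoice was issued, not the due date",
    "total_amount_due": "Final amount payable, including tax",
    "field1": "",
    "date": "Some date on the page",
}

for warning in lint_fields(fields):
    print(warning)
```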
Before publishing, test your extractor with documents that have:
Begin with core fields, validate accuracy, then add more complex extractions.
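To make "validate accuracy" concrete, you can compare extracted values against a small hand-labeled set and track accuracy per field. The sketch below assumes you already have extraction results as plain dictionaries; `field_accuracy` and the sample data are hypothetical, and exact-match comparison is a simplification that may be too strict for free-text fields.

```python
def field_accuracy(predictions: list[dict], expected: list[dict]) -> dict[str, float]:
    """Share of documents in which each labeled field was extracted exactly."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    for predicted, truth in zip(predictions, expected):
        for field, value in truth.items():
            totals[field] = totals.get(field, 0) + 1
            # A missing field counts as a miss, the same as a wrong value.
            if predicted.get(field) == value:
                correct[field] = correct.get(field, 0) + 1
    return {field: correct.get(field, 0) / totals[field] for field in totals}


# Hypothetical results for two invoices, core fields only.
predictions = [
    {"invoice_date": "2024-01-05", "total_amount_due": 120.0},
    {"invoice_date": "2024-02-01", "total_amount_due": 80.0},
]
expected = [
    {"invoice_date": "2024-01-05", "total_amount_due": 120.0},
    {"invoice_date": "2024-02-10", "total_amount_due": 80.0},
]

scores = field_accuracy(predictions, expected)
print(scores)
```

Once the core fields score well enough on your labeled set, add the next group of fields and rerun the same check so that regressions show up per field.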