Extractors
What is an Extractor?
An extractor is a reusable configuration that defines:
- Schema - What fields to extract and their types
- Model Tier - Which AI model to use (Mini, Pro, or Max)
- Validation Rules - Optional constraints on extracted values
Think of an extractor as a template that tells Automat exactly what information to pull from your documents.
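To make the three parts concrete, here is a minimal sketch of an extractor configuration as plain Python data. The class names, the `ranges` rule, and the `required` flag are hypothetical stand-ins for "validation rules"; this is not Automat's actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    name: str
    type: str                # assumed type names such as "string", "number", "date"
    required: bool = False   # hypothetical validation rule: value must be present


@dataclass
class ExtractorConfig:
    name: str
    schema: list[FieldSpec]  # what to extract and each field's type
    model_tier: str          # "Mini", "Pro", or "Max"
    # Hypothetical validation rule: numeric field name -> inclusive (min, max) range
    ranges: dict[str, tuple[float, float]] = field(default_factory=dict)

    def field_names(self) -> list[str]:
        return [spec.name for spec in self.schema]

    def validate(self, record: dict) -> list[str]:
        """Return the rule violations found in one extracted record."""
        errors = []
        for spec in self.schema:
            if spec.required and record.get(spec.name) is None:
                errors.append(f"missing required field: {spec.name}")
        for name, (low, high) in self.ranges.items():
            value = record.get(name)
            if isinstance(value, (int, float)) and not low <= value <= high:
                errors.append(f"{name} out of range: {value}")
        return errors


invoice = ExtractorConfig(
    name="Invoice Extractor",
    schema=[
        FieldSpec("invoice_date", "date", required=True),
        FieldSpec("total_amount_due", "number", required=True),
        FieldSpec("vendor_address", "string"),
    ],
    model_tier="Pro",
    ranges={"total_amount_due": (0, 1_000_000)},
)
```

Calling `invoice.validate(...)` on an extracted record returns an empty list when every rule passes.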
Creating an Extractor
Via Dashboard
- Navigate to your project in the Automat Dashboard
- Click Create Extractor
- Name your extractor (e.g., “Invoice Extractor”)
- Define your schema using the visual editor or JSON
- Test with sample documents
- Publish when ready
Schema Definition
A schema defines the structure of the extracted data: which fields to return and what type each one has.
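Since schemas can be written as JSON, the sketch below assumes a JSON Schema-like shape (`type`, `properties`, `required`) and shows a shallow check of an extracted record against it. Both the schema shape and the type names are assumptions for illustration; Automat's own JSON format may differ.

```python
import json

# Hypothetical schema in a JSON Schema-like shape; Automat's real format may differ.
SCHEMA_JSON = """
{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "invoice_date": {"type": "string", "format": "date"},
    "total_amount_due": {"type": "number"},
    "line_items": {"type": "array"}
  },
  "required": ["invoice_number", "total_amount_due"]
}
"""

# Map the assumed JSON type names to Python types for a top-level check.
PY_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def type_errors(schema: dict, record: dict) -> list[str]:
    """Check required keys and top-level value types of one extracted record."""
    errors = [f"missing: {k}" for k in schema.get("required", []) if k not in record]
    for key, spec in schema["properties"].items():
        if key in record and record[key] is not None:
            value = record[key]
            wrong = not isinstance(value, PY_TYPES[spec["type"]])
            # bool is a subclass of int, so reject it explicitly for "number".
            if spec["type"] == "number" and isinstance(value, bool):
                wrong = True
            if wrong:
                errors.append(f"wrong type: {key}")
    return errors


schema = json.loads(SCHEMA_JSON)
```

Nested objects inside arrays are not checked here; a full validator would recurse into `items` and `properties`.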
Supported Field Types
Model Tiers
Choose the right model tier for your use case:
Mini - Fastest & Most Economical
Best for simple documents with clear layouts. Ideal for high-volume processing.
Powered by Gemini 3 Flash
Pro - Balanced Performance
Excellent accuracy for most document types. Recommended for production use.
Powered by Gemini 2.5 Pro
Max - Maximum Accuracy
Best for complex documents, poor quality scans, or when accuracy is critical.
Powered by Claude Opus 4.5
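The tier guidance above can be condensed into a rule of thumb. This is a hypothetical helper that only restates this page's recommendations; it is not part of Automat, and the boolean document traits are assumptions you would judge yourself.

```python
from enum import Enum


class ModelTier(Enum):
    MINI = "Mini"
    PRO = "Pro"
    MAX = "Max"


def choose_tier(*, complex_layout: bool, poor_scan: bool,
                accuracy_critical: bool, high_volume: bool) -> ModelTier:
    """Restate the tier guidance as a rule of thumb (not an official rule)."""
    # Max: complex documents, poor-quality scans, or accuracy-critical work.
    if complex_layout or poor_scan or accuracy_critical:
        return ModelTier.MAX
    # Mini: simple, clearly laid-out documents processed at high volume.
    if high_volume:
        return ModelTier.MINI
    # Pro: recommended for most production documents.
    return ModelTier.PRO
```

The checks run from most to least demanding, so a hard document always gets Max even when volume is high.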
Using Extractors
Get Extractor ID
Every extractor has a unique ID (e.g., ext_abc123). Find it in:
- The extractor detail page in your dashboard
- The URL when viewing an extractor
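If you copy the ID from the URL in a script, a small helper can pull it out. This sketch assumes the ID appears as its own path segment beginning with `ext_`; the dashboard URL layout used in the example is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def extractor_id_from_url(url: str) -> Optional[str]:
    """Return the first path segment that looks like an extractor ID (ext_...)."""
    for segment in urlparse(url).path.split("/"):
        if segment.startswith("ext_") and len(segment) > len("ext_"):
            return segment
    return None


# Hypothetical dashboard URL, for illustration only.
example = extractor_id_from_url("https://dashboard.example.com/projects/p1/extractors/ext_abc123")
```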
Make Extraction Requests
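The request format is not reproduced on this page, so the following is only a sketch of the general shape: an authenticated JSON POST that names the extractor and the document. The endpoint URL, the payload keys (`extractor_id`, `document_url`), and the Bearer authorization scheme are all hypothetical; check the API reference for the real ones.

```python
import json
import urllib.request

# Placeholder endpoint: the real Automat base URL and route are not shown here.
API_URL = "https://api.example.com/v1/extractions"


def build_extraction_request(extractor_id: str, document_url: str,
                             api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking one extractor to process one document."""
    payload = {"extractor_id": extractor_id, "document_url": document_url}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_extraction_request("ext_abc123", "https://example.com/invoice.pdf", "YOUR_API_KEY")
```

Passing `req` to `urllib.request.urlopen` would send it; building the request separately keeps the payload easy to inspect before any network call.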
Versioning
Extractors support versioning for production stability:
- Draft - Work in progress, can be modified freely
- Published - Locked version for production use
- Multiple published versions can exist simultaneously
When you publish an extractor, the model configuration is frozen. This ensures consistent extraction results even as underlying models improve.
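The draft/published split can be modeled as a mutable working copy plus a set of immutable, numbered snapshots. This is a hypothetical sketch of the idea, not how Automat stores versions internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PublishedVersion:
    # frozen=True: attributes cannot be reassigned after publishing.
    version: int
    model_tier: str
    fields: tuple[str, ...]


@dataclass
class Extractor:
    # Draft: working copy that can be edited freely.
    draft_tier: str = "Pro"
    draft_fields: list[str] = field(default_factory=list)
    # Several published versions can coexist, keyed by version number.
    published: dict[int, PublishedVersion] = field(default_factory=dict)

    def publish(self) -> PublishedVersion:
        """Freeze the current draft, including its model tier, into a new version."""
        snapshot = PublishedVersion(
            version=len(self.published) + 1,
            model_tier=self.draft_tier,
            fields=tuple(self.draft_fields),  # copy so later draft edits don't leak in
        )
        self.published[snapshot.version] = snapshot
        return snapshot


ext = Extractor(draft_fields=["invoice_date", "total_amount_due"])
v1 = ext.publish()
ext.draft_fields.append("vendor_address")
ext.draft_tier = "Max"
v2 = ext.publish()
```

Editing the draft after publishing leaves `v1` untouched, which is the property that keeps production results stable.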
Best Practices
Use Descriptive Field Names
Clear field names help the AI understand what to extract:
✅ invoice_date, total_amount_due, vendor_address
❌ field1, date, val
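A quick lint pass can catch vague names before you publish. The list of "generic" names and the numbered-name pattern below are illustrative choices, not rules Automat enforces.

```python
import re

# Names this sketch treats as too generic; adjust to suit your schemas.
GENERIC_NAMES = {"date", "val", "value", "data", "info", "amount", "name"}
NUMBERED = re.compile(r"^(field|col|column|item)_?\d+$")


def vague_field_names(names: list[str]) -> list[str]:
    """Return field names that are likely too vague to guide extraction."""
    return [
        n for n in names
        if n.lower() in GENERIC_NAMES or NUMBERED.match(n.lower())
    ]
```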
Add Field Descriptions
Include descriptions to guide extraction, especially for ambiguous fields.
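For example, an invoice often shows several dates and several totals, so a description can say which one you mean. The field layout below is hypothetical; the helper simply flags fields that still lack a description.

```python
# Hypothetical field definitions; the description text is what disambiguates.
FIELDS = {
    "invoice_date": {
        "type": "date",
        "description": "Date the invoice was issued (not the due date), as YYYY-MM-DD.",
    },
    "total_amount_due": {
        "type": "number",
        "description": "Final amount payable including tax, without a currency symbol.",
    },
    "vendor_address": {
        "type": "string",
    },
}


def missing_descriptions(fields: dict) -> list[str]:
    """List fields whose description is absent or blank."""
    return [name for name, spec in fields.items()
            if not spec.get("description", "").strip()]
```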
Test with Varied Documents
Before publishing, test your extractor with documents that have:
- Different layouts
- Various quality levels
- Edge cases (missing fields, unusual formats)
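One way to compare results across such a test set is per-field exact-match accuracy against hand-labelled expectations. The sketch below uses made-up values for three documents (clean, poor scan, missing date); in practice the extracted side would come from your extractor's output.

```python
from collections import defaultdict


def field_accuracy(cases: list[tuple[dict, dict]]) -> dict[str, float]:
    """Per-field exact-match accuracy over (expected, extracted) pairs."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for expected, extracted in cases:
        for name, want in expected.items():
            totals[name] += 1
            if extracted.get(name) == want:
                hits[name] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Made-up (expected, extracted) pairs for illustration.
cases = [
    # Clean, standard layout
    ({"invoice_date": "2024-03-01", "total_amount_due": 120.0},
     {"invoice_date": "2024-03-01", "total_amount_due": 120.0}),
    # Low-quality scan: total misread
    ({"invoice_date": "2024-03-05", "total_amount_due": 89.9},
     {"invoice_date": "2024-03-05", "total_amount_due": 8.99}),
    # Edge case: no date printed, so the expected value is None
    ({"invoice_date": None, "total_amount_due": 40.0},
     {"invoice_date": None, "total_amount_due": 40.0}),
]
accuracy = field_accuracy(cases)
```

Exact matching is strict; for dates or amounts in varying formats you may want to normalize values before comparing.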
Start Simple, Then Expand
Begin with core fields, validate accuracy, then add more complex extractions.