Extractors

What is an Extractor?

An extractor is a reusable configuration that defines:

Schema - What fields to extract and their types
Model Tier - Which AI model to use (Mini, Pro, or Max)
Validation Rules - Optional constraints on extracted values

Think of an extractor as a template that tells Automat exactly what information to pull from your documents.
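The three parts listed above can be modeled as a single configuration object. The sketch below is illustrative only. `ExtractorConfig`, `ValidationRule`, `applyRules`, the lowercase tier names, and the rule shape are hypothetical; they are not Automat SDK types. It shows how a schema, a model tier, and optional validation rules fit together, and how a rule might flag an extracted value.

```typescript
// Hypothetical model of an extractor configuration. These type and field
// names are illustrative only; they are not Automat's SDK types.
type ModelTier = "mini" | "pro" | "max";

type FieldType = "string" | "number" | "boolean" | "array" | "object" | "enum";

interface FieldDefinition {
  type: FieldType;
  description?: string;
}

// Assumed shape for an optional constraint on one extracted value.
interface ValidationRule {
  field: string;
  check: (value: unknown) => boolean;
  message: string;
}

interface ExtractorConfig {
  name: string;
  schema: Record<string, FieldDefinition>; // what to extract
  modelTier: ModelTier;                    // which model to use
  validationRules?: ValidationRule[];      // optional constraints
}

// Runs every rule against one extracted record and returns the failures.
function applyRules(config: ExtractorConfig, record: Record<string, unknown>): string[] {
  return (config.validationRules ?? [])
    .filter((rule) => !rule.check(record[rule.field]))
    .map((rule) => `${rule.field}: ${rule.message}`);
}

const invoiceExtractor: ExtractorConfig = {
  name: "Invoice Extractor",
  schema: {
    invoice_number: { type: "string", description: "The unique invoice identifier" },
    total_amount: { type: "number", description: "Total amount due" },
  },
  modelTier: "pro",
  validationRules: [
    {
      field: "total_amount",
      check: (v) => typeof v === "number" && v >= 0,
      message: "must be a non-negative number",
    },
  ],
};

const failures = applyRules(invoiceExtractor, { invoice_number: "INV-001", total_amount: -5 });
// failures is ["total_amount: must be a non-negative number"]
```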

Creating an Extractor

Via Dashboard

Navigate to your project in the Automat Dashboard
Click Create Extractor
Name your extractor (e.g., “Invoice Extractor”)
Define your schema using the visual editor or JSON
Test with sample documents
Publish when ready

Schema Definition

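A schema maps each field name to a definition containing its `type` and an optional `description`. Lists use `array` with an `items` definition, and nested records use `object` with `properties`: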

{
  "invoice_number": {
    "type": "string",
    "description": "The unique invoice identifier"
  },
  "date": {
    "type": "string",
    "description": "Invoice date in YYYY-MM-DD format"
  },
  "total_amount": {
    "type": "number",
    "description": "Total amount due"
  },
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": { "type": "string" },
        "quantity": { "type": "number" },
        "unit_price": { "type": "number" }
      }
    }
  }
}
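Each field in the schema corresponds to a typed property in the extracted data. The following is a sketch, assuming the result arrives as parsed JSON whose keys match the schema (this page does not document the exact result shape). The hypothetical `Invoice` type and `isInvoice` guard check a result before your code relies on it:

```typescript
// Hypothetical TypeScript types mirroring the invoice schema above.
interface LineItem {
  description: string;
  quantity: number;
  unit_price: number;
}

interface Invoice {
  invoice_number: string;
  date: string; // YYYY-MM-DD, per the field description
  total_amount: number;
  line_items: LineItem[];
}

// True for plain objects (not null, not arrays).
function isRecord(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

function isLineItem(x: unknown): x is LineItem {
  return (
    isRecord(x) &&
    typeof x.description === "string" &&
    typeof x.quantity === "number" &&
    typeof x.unit_price === "number"
  );
}

// Type guard for a parsed extraction result: every field must be present
// with the declared type, and the date must match YYYY-MM-DD.
function isInvoice(x: unknown): x is Invoice {
  if (!isRecord(x)) return false;
  const { invoice_number, date, total_amount, line_items } = x;
  return (
    typeof invoice_number === "string" &&
    typeof date === "string" &&
    /^\d{4}-\d{2}-\d{2}$/.test(date) &&
    typeof total_amount === "number" &&
    Array.isArray(line_items) &&
    line_items.every(isLineItem)
  );
}
```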

Supported Field Types

Type	Description	Example
`string`	Text values	`"INV-001"`
`number`	Numeric values (integer or decimal)	`1250.00`
`boolean`	True/false values	`true`
`array`	List of values	`["item1", "item2"]`
`object`	Nested object	`{ "name": "...", "address": "..." }`
`enum`	Predefined set of values	`"approved"`, `"pending"`, or `"rejected"`
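To illustrate what each type in the table accepts, here is a recursive type checker. It is a sketch under stated assumptions. `FieldDef` and `matchesType` are hypothetical, and the `values` key used to list enum options is an assumption, since this page does not show Automat's enum syntax.

```typescript
// Hypothetical field definitions covering the types in the table.
// The `values` key for enums is an assumption, not documented syntax.
type FieldDef =
  | { type: "string" | "number" | "boolean" }
  | { type: "array"; items: FieldDef }
  | { type: "object"; properties: Record<string, FieldDef> }
  | { type: "enum"; values: string[] };

// Recursively checks whether a value matches a field definition.
// Object properties are all treated as required in this sketch.
function matchesType(value: unknown, def: FieldDef): boolean {
  switch (def.type) {
    case "string":
    case "boolean":
      return typeof value === def.type;
    case "number":
      // Integers and decimals are both JavaScript numbers.
      return typeof value === "number" && Number.isFinite(value);
    case "array": {
      const itemDef = def.items;
      return Array.isArray(value) && value.every((item) => matchesType(item, itemDef));
    }
    case "object": {
      if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
      const obj = value as Record<string, unknown>;
      return Object.entries(def.properties).every(([key, sub]) => matchesType(obj[key], sub));
    }
    case "enum":
      return typeof value === "string" && def.values.includes(value);
    default:
      return false; // unknown type tag
  }
}
```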

Model Tiers

Choose the right model tier for your use case:

Mini

Fastest & Most Economical

Best for simple documents with clear layouts. Ideal for high-volume processing.

Powered by Gemini 3 Flash

Pro

Balanced Performance

Excellent accuracy for most document types. Recommended for production use.

Powered by Gemini 2.5 Pro

Max

Maximum Accuracy

Best for complex documents, poor quality scans, or when accuracy is critical.

Powered by Claude Opus 4.5
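The tier guidance above can be expressed as a rule of thumb. The sketch below is illustrative only. The `DocumentProfile` flags, the `chooseTier` function, and the lowercase tier names are hypothetical, not identifiers from the Automat API.

```typescript
type ModelTier = "mini" | "pro" | "max";

// Hypothetical description of a document set; the flags are illustrative.
interface DocumentProfile {
  complexLayout: boolean;    // e.g. multi-column pages, dense tables
  poorScanQuality: boolean;  // low resolution, skewed, or noisy scans
  accuracyCritical: boolean;
  highVolume: boolean;
}

// Max for complex, poor-quality, or critical documents; Mini for simple
// high-volume work; Pro as the general production default.
function chooseTier(p: DocumentProfile): ModelTier {
  if (p.complexLayout || p.poorScanQuality || p.accuracyCritical) return "max";
  if (p.highVolume) return "mini";
  return "pro";
}
```

Checking the "Max" conditions first means a high-volume batch of poor scans still gets the most accurate tier. Reverse the order if cost matters more than accuracy for your workload.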

Using Extractors

Get Extractor ID

Every extractor has a unique ID (e.g., ext_abc123). Find it in:

The extractor detail page in your dashboard
The URL when viewing an extractor

Make Extraction Requests

const result = await client.extract({
  extractorId: 'ext_abc123', // Your extractor ID
  file: document, // The document to process
});
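The call above processes one document. To run the same extractor over several files, you can wrap it in a loop. This is a sketch under assumptions. `ExtractClient`, `extractAll`, and `Outcome` are hypothetical names. The interface describes only the `extract({ extractorId, file })` call shown above, because this page does not document the SDK's client type, accepted file types, or result shape.

```typescript
// Assumed minimal client shape: only the extract() call shown above.
// The SDK's real client type, accepted file types, and result shape
// are not documented on this page.
interface ExtractClient {
  extract(request: { extractorId: string; file: unknown }): Promise<unknown>;
}

type Outcome =
  | { name: string; ok: true; result: unknown }
  | { name: string; ok: false; error: string };

// Extracts each file in turn with the same extractor. Errors are recorded
// per file so that one failure does not abort the whole batch.
async function extractAll(
  client: ExtractClient,
  extractorId: string,
  files: { name: string; file: unknown }[],
): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const { name, file } of files) {
    try {
      const result = await client.extract({ extractorId, file });
      outcomes.push({ name, ok: true, result });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      outcomes.push({ name, ok: false, error });
    }
  }
  return outcomes;
}
```

Requests run one at a time, which keeps the example simple. Any concurrency or rate limits that apply to the extract endpoint are not described here.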

Versioning

Extractors support versioning for production stability:

Draft - Work in progress, can be modified freely
Published - Locked version for production use
Multiple published versions can exist simultaneously

When you publish an extractor, the model configuration is frozen. This ensures consistent extraction results even as underlying models improve.
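The draft and publish rules above can be illustrated with a small in-memory model. This is a hypothetical sketch, not how Automat stores extractors: `VersionedExtractor`, `ExtractorVersion`, and `deepFreeze` are invented names. It shows three behaviors. The draft stays editable, each publish snapshots and freezes the configuration, and earlier versions remain available.

```typescript
// Hypothetical in-memory model of draft and published versions. It mirrors
// the behavior described above; it is not how Automat stores extractors.
interface ExtractorVersion {
  readonly version: number;
  readonly modelTier: string;
  readonly schema: Readonly<Record<string, unknown>>;
}

// Recursively freezes a plain JSON-like value.
function deepFreeze<T>(value: T): T {
  if (typeof value === "object" && value !== null) {
    const obj = value as unknown as Record<string, unknown>;
    for (const key of Object.keys(obj)) deepFreeze(obj[key]);
    Object.freeze(obj);
  }
  return value;
}

class VersionedExtractor {
  // The draft can be modified freely.
  draft: { modelTier: string; schema: Record<string, unknown> };
  private readonly published: ExtractorVersion[] = [];

  constructor(modelTier: string, schema: Record<string, unknown>) {
    this.draft = { modelTier, schema };
  }

  // Snapshots the draft (deep copy via JSON) into a new, frozen version.
  publish(): ExtractorVersion {
    const snapshot: ExtractorVersion = {
      version: this.published.length + 1,
      modelTier: this.draft.modelTier,
      schema: JSON.parse(JSON.stringify(this.draft.schema)),
    };
    this.published.push(deepFreeze(snapshot));
    return snapshot;
  }

  // Older published versions remain available after newer ones are published.
  getVersion(n: number): ExtractorVersion | undefined {
    return this.published.find((v) => v.version === n);
  }
}
```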

Best Practices

Use Descriptive Field Names

Clear field names help the AI understand what to extract:

✅ invoice_date, total_amount_due, vendor_address

❌ field1, date, val
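If you maintain schemas in code, you can catch vague names like these before testing. The following is a hypothetical lint. `vagueFieldNames` and the generic-name list are illustrative choices, not an Automat feature.

```typescript
// Hypothetical lint for vague field names. The generic-name list is
// illustrative; tune it for your own documents.
const GENERIC_NAMES = new Set(["date", "val", "value", "data", "info", "amount", "name"]);

// Returns field names that are positional (field1, field2, ...) or generic.
function vagueFieldNames(schema: Record<string, unknown>): string[] {
  return Object.keys(schema).filter(
    (name) => /^field\d*$/i.test(name) || GENERIC_NAMES.has(name.toLowerCase()),
  );
}
```

For example, `vagueFieldNames({ field1: {}, date: {}, invoice_date: {} })` returns `["field1", "date"]`.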

Add Field Descriptions

Include descriptions to guide extraction, especially for ambiguous fields:

{
  "effective_date": {
    "type": "string",
    "description": "The date the contract becomes effective, not the signing date"
  }
}
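To find fields that still lack descriptions, including nested ones, you can walk the schema. This sketch assumes the JSON shape used on this page (`type`, `description`, `items`, `properties`). `missingDescriptions` is a hypothetical helper, not part of Automat.

```typescript
// Minimal field shape for this check (an assumption, mirroring the JSON above).
interface FieldDef {
  type: string;
  description?: string;
  items?: FieldDef;
  properties?: Record<string, FieldDef>;
}

// Lists the paths of fields with no (or blank) description, walking into
// object properties and array item properties. Array items use "name[]".
function missingDescriptions(schema: Record<string, FieldDef>, prefix = ""): string[] {
  const missing: string[] = [];
  for (const [name, def] of Object.entries(schema)) {
    const path = prefix ? `${prefix}.${name}` : name;
    if (!def.description || def.description.trim() === "") missing.push(path);
    if (def.properties) missing.push(...missingDescriptions(def.properties, path));
    const itemProps = def.items?.properties;
    if (itemProps) missing.push(...missingDescriptions(itemProps, `${path}[]`));
  }
  return missing;
}
```

Run against the invoice schema earlier on this page, this flags `line_items` and each of its item properties, because only the three top-level scalar fields carry descriptions.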

Test with Varied Documents

Before publishing, test your extractor with documents that have:

Different layouts
Various quality levels
Edge cases (missing fields, unusual formats)
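One way to make this testing repeatable is to record the expected values for each sample document and compute per-field accuracy. The following is a sketch under assumptions. `TestCase` and `fieldAccuracy` are hypothetical, and values are compared with `JSON.stringify`, which is adequate for JSON-shaped results with consistent key order.

```typescript
// Hypothetical test case: a label describing the document, the values you
// expect, and what the extractor actually returned.
interface TestCase {
  label: string; // e.g. "scanned, rotated invoice"
  expected: Record<string, unknown>;
  actual: Record<string, unknown>;
}

// Structural comparison via JSON; fine for JSON-shaped values.
function sameValue(a: unknown, b: unknown): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Per-field accuracy across all test documents, plus the mismatches to review.
function fieldAccuracy(cases: TestCase[]) {
  const totals = new Map<string, { correct: number; total: number }>();
  const mismatches: string[] = [];
  for (const c of cases) {
    for (const [field, want] of Object.entries(c.expected)) {
      const t = totals.get(field) ?? { correct: 0, total: 0 };
      t.total += 1;
      if (sameValue(c.actual[field], want)) t.correct += 1;
      else mismatches.push(`${c.label}: ${field}`);
      totals.set(field, t);
    }
  }
  const accuracy: Record<string, number> = {};
  totals.forEach((t, field) => {
    accuracy[field] = t.correct / t.total;
  });
  return { accuracy, mismatches };
}
```

Labeling each case by its layout or quality level shows which kinds of documents cause mismatches. That makes it easier to decide whether to refine field descriptions or move to a higher model tier.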

Start Simple, Then Expand

Begin with core fields, validate accuracy, then add more complex extractions.

Next Steps

Schema Reference - Advanced schema configuration
API Reference - Extract endpoint documentation