Extractors

Configure what data to extract from your documents

What is an Extractor?

An extractor is a reusable configuration that defines:

  1. Schema - What fields to extract and their types
  2. Model Tier - Which AI model to use (Mini, Pro, or Max)
  3. Validation Rules - Optional constraints on extracted values

Think of an extractor as a template that tells Automat exactly what information to pull from your documents.
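
To make the three parts concrete, here is a minimal sketch of how an extractor configuration could be modeled in TypeScript. The `ExtractorConfig`, `FieldDefinition`, and `validationRules` names are hypothetical and chosen for illustration only; the actual Automat configuration format may differ.

```typescript
// Hypothetical shapes for illustration; the real Automat types may differ.
type ModelTier = "mini" | "pro" | "max";

interface FieldDefinition {
  type: "string" | "number" | "boolean" | "array" | "object" | "enum";
  description?: string;
}

interface ExtractorConfig {
  name: string;
  // 1. Schema: what fields to extract and their types.
  schema: Record<string, FieldDefinition>;
  // 2. Model tier: which AI model to use.
  modelTier: ModelTier;
  // 3. Validation rules: optional constraints, keyed by field name.
  validationRules?: Record<string, (value: unknown) => boolean>;
}

const invoiceExtractor: ExtractorConfig = {
  name: "Invoice Extractor",
  schema: {
    invoice_number: { type: "string", description: "The unique invoice identifier" },
    total_amount: { type: "number", description: "Total amount due" },
  },
  modelTier: "pro",
  validationRules: {
    // Example constraint (an assumption): totals must be non-negative numbers.
    total_amount: (value) => typeof value === "number" && value >= 0,
  },
};
```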

Creating an Extractor

Via Dashboard

  1. Navigate to your project in the Automat Dashboard
  2. Click Create Extractor
  3. Name your extractor (e.g., “Invoice Extractor”)
  4. Define your schema using the visual editor or JSON
  5. Test with sample documents
  6. Publish when ready

Schema Definition

Schemas define the structure of extracted data:

{
  "invoice_number": {
    "type": "string",
    "description": "The unique invoice identifier"
  },
  "date": {
    "type": "string",
    "description": "Invoice date in YYYY-MM-DD format"
  },
  "total_amount": {
    "type": "number",
    "description": "Total amount due"
  },
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": { "type": "string" },
        "quantity": { "type": "number" },
        "unit_price": { "type": "number" }
      }
    }
  }
}
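
On the consuming side, a schema like the invoice example above maps naturally to a typed result. The sketch below is one way you might model and sanity-check that output in TypeScript. The `InvoiceResult` and `LineItem` types and both helper functions are hypothetical and not part of Automat, and the sample values are made up.

```typescript
// Hypothetical types mirroring the invoice schema above; not an Automat API.
interface LineItem {
  description: string;
  quantity: number;
  unit_price: number;
}

interface InvoiceResult {
  invoice_number: string;
  date: string; // expected as YYYY-MM-DD, per the field description
  total_amount: number;
  line_items: LineItem[];
}

// Checks the shape of the date string only, not calendar validity.
function isIsoDate(value: string): boolean {
  return /^\d{4}-\d{2}-\d{2}$/.test(value);
}

// Sums quantity * unit_price, rounded to cents to limit floating-point drift.
function lineItemsTotal(items: LineItem[]): number {
  const sum = items.reduce((acc, item) => acc + item.quantity * item.unit_price, 0);
  return Math.round(sum * 100) / 100;
}

// Made-up sample data for illustration.
const sample: InvoiceResult = {
  invoice_number: "INV-001",
  date: "2024-03-15",
  total_amount: 1250.0,
  line_items: [
    { description: "Consulting", quantity: 10, unit_price: 100 },
    { description: "Hosting", quantity: 1, unit_price: 250 },
  ],
};

console.log(isIsoDate(sample.date));
console.log(lineItemsTotal(sample.line_items) === sample.total_amount);
```

Cross-checking the line items against the total is a cheap way to catch extraction errors in either field.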

Supported Field Types

Type     Description                          Example
string   Text values                          "INV-001"
number   Numeric values (integer or decimal)  1250.00
boolean  True/false values                    true
array    List of values                       ["item1", "item2"]
object   Nested object                        { "name": "...", "address": "..." }
enum     Predefined set of values             "approved" | "pending" | "rejected"
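
If you want to double-check extracted values against these types in your own code, a small runtime check is enough. This is a sketch under assumptions: `matchesFieldType` is a hypothetical helper, and passing the allowed enum values as a third argument is an illustrative choice, since this page does not show how enum values are declared in a schema.

```typescript
// Hypothetical runtime check for the field types listed above.
type FieldType = "string" | "number" | "boolean" | "array" | "object" | "enum";

function matchesFieldType(
  value: unknown,
  type: FieldType,
  enumValues: readonly string[] = [], // only consulted when type is "enum"
): boolean {
  switch (type) {
    case "string":
      return typeof value === "string";
    case "number":
      // Rejects NaN and Infinity, which are not useful extracted values.
      return typeof value === "number" && isFinite(value);
    case "boolean":
      return typeof value === "boolean";
    case "array":
      return Array.isArray(value);
    case "object":
      // Plain nested object: exclude null and arrays.
      return typeof value === "object" && value !== null && !Array.isArray(value);
    case "enum":
      return typeof value === "string" && enumValues.indexOf(value) !== -1;
    default:
      return false;
  }
}

const statuses = ["approved", "pending", "rejected"];
console.log(matchesFieldType("pending", "enum", statuses));
console.log(matchesFieldType(["item1", "item2"], "array"));
```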

Model Tiers

Choose the right model tier for your use case:

Mini

Fastest & Most Economical

Best for simple documents with clear layouts. Ideal for high-volume processing.

Powered by Gemini 3 Flash

Pro

Balanced Performance

Excellent accuracy for most document types. Recommended for production use.

Powered by Gemini 2.5 Pro

Max

Maximum Accuracy

Best for complex documents, poor quality scans, or when accuracy is critical.

Powered by Claude 4.5 Opus
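
If you pick tiers programmatically, the guidance above can be encoded as a small helper. This is only a sketch: the `DocumentProfile` fields and the order of the checks are assumptions made for illustration, so tune them against your own documents.

```typescript
type ModelTier = "mini" | "pro" | "max";

// Hypothetical description of a document batch; not an Automat type.
interface DocumentProfile {
  complexLayout: boolean;
  poorScanQuality: boolean;
  accuracyCritical: boolean;
  highVolume: boolean;
}

function chooseTier(profile: DocumentProfile): ModelTier {
  // Max: complex documents, poor quality scans, or accuracy-critical work.
  if (profile.complexLayout || profile.poorScanQuality || profile.accuracyCritical) {
    return "max";
  }
  // Mini: simple, clearly laid out documents processed at high volume.
  if (profile.highVolume) {
    return "mini";
  }
  // Pro: the recommended default for production use.
  return "pro";
}

console.log(
  chooseTier({ complexLayout: false, poorScanQuality: false, accuracyCritical: false, highVolume: true }),
);
```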

Using Extractors

Get Extractor ID

Every extractor has a unique ID (e.g., ext_abc123). Find it in:

  • The extractor detail page in your dashboard
  • The URL when viewing an extractor

Make Extraction Requests

const result = await client.extract({
  extractorId: 'ext_abc123', // Your extractor ID
  file: document,
});
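
It can help to validate inputs before sending a request. The sketch below builds the request object used in the call above. It rests on two assumptions that this page does not confirm: that extractor IDs always follow the `ext_` prefix pattern of the example, and that the document is passed as raw bytes (`Uint8Array`). `buildExtractRequest` is a hypothetical helper, not part of the client.

```typescript
// Assumed request shape, based on the snippet above; not a confirmed Automat type.
interface ExtractRequest {
  extractorId: string;
  file: Uint8Array;
}

// Assumption: IDs look like the example "ext_abc123" ("ext_" plus letters or digits).
const EXTRACTOR_ID_PATTERN = /^ext_[A-Za-z0-9]+$/;

function buildExtractRequest(extractorId: string, file: Uint8Array): ExtractRequest {
  if (!EXTRACTOR_ID_PATTERN.test(extractorId)) {
    throw new Error(`Unexpected extractor ID format: ${extractorId}`);
  }
  if (file.length === 0) {
    throw new Error("Document is empty");
  }
  return { extractorId, file };
}

// Usage, with a configured client as in the snippet above:
//   const result = await client.extract(buildExtractRequest("ext_abc123", bytes));
```

Failing fast on a malformed ID or an empty file avoids spending a request on input that cannot succeed.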

Versioning

Extractors support versioning for production stability:

  • Draft - Work in progress, can be modified freely
  • Published - Locked version for production use
  • Multiple published versions can exist simultaneously

When you publish an extractor, the model configuration is frozen. This ensures consistent extraction results even as underlying models improve.
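
The draft versus published distinction can be pictured with a small in-memory model. This is a conceptual sketch only: `ExtractorVersion` and `publish` are hypothetical names, and this page does not describe how versions are represented or addressed in the Automat API.

```typescript
// Hypothetical in-memory model of draft vs. published versions; not the Automat API.
interface ExtractorVersion {
  version: number;
  status: "draft" | "published";
  modelTier: "mini" | "pro" | "max";
}

// Publishing returns a frozen copy, so later edits to the draft cannot leak into it.
function publish(draft: ExtractorVersion): Readonly<ExtractorVersion> {
  return Object.freeze({ ...draft, status: "published" as const });
}

const draft: ExtractorVersion = { version: 1, status: "draft", modelTier: "pro" };
const v1 = publish(draft);

// Keep iterating on the draft; the published v1 is unaffected.
draft.version = 2;
draft.modelTier = "max";
const v2 = publish(draft);

// Both published versions now exist side by side with their own frozen settings.
console.log(v1.version, v1.modelTier);
console.log(v2.version, v2.modelTier);
```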

Best Practices

Clear field names help the AI understand what to extract:

Good: invoice_date, total_amount_due, vendor_address

Avoid: field1, date, val
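
A quick lint pass over your schema keys can catch vague names before you publish. The heuristic below is a sketch: the list of generic names, the placeholder pattern, and the length threshold are all assumptions, and `isVagueFieldName` is a hypothetical helper.

```typescript
// Heuristic only: the generic-name list and the length threshold are assumptions.
const GENERIC_NAMES = ["date", "val", "value", "data", "field", "name"];

function isVagueFieldName(fieldName: string): boolean {
  const lower = fieldName.toLowerCase();
  // Placeholder-style names such as "field1" or "col2".
  if (/^(field|col|column|val|value)\d*$/.test(lower)) return true;
  // Single generic words with no qualifying context.
  if (GENERIC_NAMES.indexOf(lower) !== -1) return true;
  // Very short names rarely carry enough meaning.
  return lower.length < 4;
}

const schemaFields = ["invoice_date", "total_amount_due", "vendor_address", "field1", "date", "val"];
const flagged = schemaFields.filter(isVagueFieldName);
console.log(flagged); // ["field1", "date", "val"]
```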

Include descriptions to guide extraction, especially for ambiguous fields:

{
  "effective_date": {
    "type": "string",
    "description": "The date the contract becomes effective, not the signing date"
  }
}

Before publishing, test your extractor with documents that have:

  • Different layouts
  • Various quality levels
  • Edge cases (missing fields, unusual formats)
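
To compare test runs objectively, you can score each extraction against hand-labeled expected values. The helper below is a minimal sketch of that idea. `fieldAccuracy` is a hypothetical function, the sample values are made up, and exact-match comparison is a simplifying assumption (you may want tolerance for numbers or normalized strings).

```typescript
// Hypothetical evaluation helper; expected values come from your own labeled samples.
type Extracted = Record<string, unknown>;

// Fraction of expected fields whose extracted value matches exactly.
// A field missing from `actual` counts as a mismatch.
// Note: JSON.stringify comparison is sensitive to key order in nested objects.
function fieldAccuracy(expected: Extracted, actual: Extracted): number {
  const keys = Object.keys(expected);
  if (keys.length === 0) return 1;
  const correct = keys.filter(
    (key) => JSON.stringify(expected[key]) === JSON.stringify(actual[key]),
  ).length;
  return correct / keys.length;
}

// Made-up sample: the extraction missed one of three fields.
const expected = { invoice_number: "INV-001", date: "2024-03-15", total_amount: 1250 };
const actual = { invoice_number: "INV-001", date: "2024-03-15" };

console.log(fieldAccuracy(expected, actual).toFixed(2)); // "0.67"
```

Running this over documents with different layouts, quality levels, and edge cases gives you a per-category accuracy figure to track between schema changes.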

Begin with core fields, validate accuracy, then add more complex extractions.
