Extractors

Configure what data to extract from your documents

What is an Extractor?

An extractor is a reusable configuration that defines:

  1. Schema - What fields to extract and their types
  2. Model Tier - Which AI model to use (Mini, Pro, or Max)
  3. Validation Rules - Optional constraints on extracted values

Think of an extractor as a template that tells Automat exactly what information to pull from your documents.

Creating an Extractor

Via Dashboard

  1. Navigate to your project in the Automat Dashboard
  2. Click Create Extractor
  3. Name your extractor (e.g., “Invoice Extractor”)
  4. Define your schema using the visual editor or JSON
  5. Test with sample documents
  6. Publish when ready

Schema Definition

Schemas define the structure of extracted data:

{
  "invoice_number": {
    "type": "string",
    "description": "The unique invoice identifier"
  },
  "date": {
    "type": "string",
    "description": "Invoice date in YYYY-MM-DD format"
  },
  "total_amount": {
    "type": "number",
    "description": "Total amount due"
  },
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": { "type": "string" },
        "quantity": { "type": "number" },
        "unit_price": { "type": "number" }
      }
    }
  }
}

Supported Field Types

Type      Description                            Example
string    Text values                            "INV-001"
number    Numeric values (integer or decimal)    1250.00
boolean   True/false values                      true
array     List of values                         ["item1", "item2"]
object    Nested object                          { "name": "...", "address": "..." }
enum      Predefined set of values               "approved" | "pending" | "rejected"
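
To make the type table concrete, here is a minimal sketch of a local check that an extraction result has the shape a schema describes. It is not part of Automat's API: the helper names `matches` and `check_result` are hypothetical, the `"values"` key for enum fields is an assumption (this page does not show how enum options are declared), and the sample result is invented for illustration. The sketch also treats every schema field as required.

```python
from typing import Any

# Invoice schema from the Schema Definition section, trimmed to three fields.
SCHEMA = {
    "invoice_number": {"type": "string"},
    "total_amount": {"type": "number"},
    "line_items": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "description": {"type": "string"},
                "quantity": {"type": "number"},
            },
        },
    },
}


def matches(value: Any, spec: dict) -> bool:
    """Return True if value has the shape described by one field spec."""
    kind = spec["type"]
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "array":
        item_spec = spec.get("items")
        return isinstance(value, list) and (
            item_spec is None or all(matches(v, item_spec) for v in value)
        )
    if kind == "object":
        props = spec.get("properties", {})
        return isinstance(value, dict) and all(
            key in value and matches(value[key], sub) for key, sub in props.items()
        )
    if kind == "enum":
        # Assumption: allowed values are listed under a "values" key.
        return value in spec.get("values", [])
    raise ValueError(f"unknown field type: {kind}")


def check_result(result: dict, schema: dict) -> list[str]:
    """Return the names of schema fields that are missing or mistyped."""
    return [
        name
        for name, spec in schema.items()
        if name not in result or not matches(result[name], spec)
    ]


# Invented sample result, for illustration only.
result = {
    "invoice_number": "INV-001",
    "total_amount": 1250.00,
    "line_items": [{"description": "Widget", "quantity": 2}],
}
print(check_result(result, SCHEMA))  # → []
```

A check like this can run on your side after each extraction to catch mistyped or missing values before they reach downstream systems.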

Model Tiers

Choose the right model tier for your use case:

Mini

Speed-first. Optimized for simple extraction tasks and high volume. Default backend: OpenAI (gpt-5.4-mini).

Pro

Balanced. Reliable accuracy with moderate latency; a good default for most production documents. Default backend: OpenAI (gpt-5.4).

Max

Highest accuracy. Reasoning-oriented model for difficult layouts, poor scans, or critical fields. Default backend: Google Gemini 3.1 Pro (gemini-3.1-pro-preview).
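
The guidance above can be written down as a small decision helper. This is only a sketch of one way to encode it: the `Tier` class, the `pick_tier` function, and its flags are hypothetical names, and the rules are a simplification of the tier descriptions, not an official selection algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    focus: str
    default_backend: str


# Tier names and default backends as listed in this section.
TIERS = {
    "mini": Tier("Mini", "speed", "gpt-5.4-mini"),
    "pro": Tier("Pro", "balanced", "gpt-5.4"),
    "max": Tier("Max", "accuracy", "gemini-3.1-pro-preview"),
}


def pick_tier(
    *,
    simple_documents: bool = False,
    high_volume: bool = False,
    difficult_layout: bool = False,
    poor_scan: bool = False,
    critical_fields: bool = False,
) -> Tier:
    """Suggest a tier from a few yes/no facts about the workload."""
    # Hard documents or critical fields call for the highest-accuracy tier.
    if difficult_layout or poor_scan or critical_fields:
        return TIERS["max"]
    # Simple documents at high volume favor the speed-first tier.
    if simple_documents and high_volume:
        return TIERS["mini"]
    # Otherwise fall back to the balanced default.
    return TIERS["pro"]


print(pick_tier(poor_scan=True).name)  # → Max
```

In practice, testing the same sample documents against more than one tier is likely a better guide than any fixed rule.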

Using Extractors

Get Extractor ID

Every extractor has a unique ID (e.g., ext_abc123). Find it in:

  • The extractor detail page in your dashboard
  • The URL when viewing an extractor

Make Extraction Requests

curl -X POST https://studio.runautomat.com/api/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "extractorId=ext_abc123" \
  -F "file=@document.pdf"
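
The same request can be sent from Python. The sketch below uses only the standard library and builds the multipart/form-data body by hand, mirroring the two form fields of the curl command (`extractorId` and `file`). The function names are hypothetical, the file part is sent with a generic `application/octet-stream` content type, and the response is returned as raw bytes because this page does not describe the response format.

```python
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://studio.runautomat.com/api/extract"


def build_extract_request(
    api_key: str, extractor_id: str, filename: str, file_bytes: bytes
) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl command."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="extractorId"\r\n\r\n'
        f"{extractor_id}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + file_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract(api_key: str, extractor_id: str, path: str) -> bytes:
    """Send the file at path to the extract endpoint and return the raw body."""
    file_path = Path(path)
    request = build_extract_request(
        api_key, extractor_id, file_path.name, file_path.read_bytes()
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Usage (performs a real network call, so it is left commented out):
# raw = extract("YOUR_API_KEY", "ext_abc123", "document.pdf")
```

Keeping request construction separate from sending makes the request easy to inspect or unit-test without touching the network.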

Versioning

Extractors support versioning for production stability:

  • Draft - Work in progress, can be modified freely
  • Published - Locked version for production use
  • Multiple published versions can exist simultaneously

When you publish an extractor, the model configuration is frozen. This ensures consistent extraction results even as underlying models improve.
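
The draft and published states can be pictured with a small local model. This is an illustration of the rules described above, not Automat's implementation or API: the class names, the `update`, `publish`, and `new_draft` methods, and the version numbering are all hypothetical.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractorVersion:
    number: int
    schema: dict
    model_tier: str
    status: str = "draft"  # "draft" or "published"

    def update(
        self, *, schema: Optional[dict] = None, model_tier: Optional[str] = None
    ) -> None:
        """Modify a draft; published versions reject changes."""
        if self.status == "published":
            raise RuntimeError(f"version {self.number} is published and locked")
        if schema is not None:
            self.schema = copy.deepcopy(schema)
        if model_tier is not None:
            self.model_tier = model_tier

    def publish(self) -> None:
        """Freeze this version's configuration."""
        self.status = "published"


class Extractor:
    def __init__(self, schema: dict, model_tier: str) -> None:
        self.versions = [ExtractorVersion(1, copy.deepcopy(schema), model_tier)]

    def new_draft(self) -> ExtractorVersion:
        """Start a new draft from a copy of the latest version."""
        latest = self.versions[-1]
        draft = ExtractorVersion(
            latest.number + 1, copy.deepcopy(latest.schema), latest.model_tier
        )
        self.versions.append(draft)
        return draft

    def published(self) -> list:
        return [v for v in self.versions if v.status == "published"]


extractor = Extractor({"invoice_number": {"type": "string"}}, model_tier="pro")
v1 = extractor.versions[0]
v1.publish()

v2 = extractor.new_draft()
v2.update(model_tier="max")  # allowed: v2 is still a draft
v2.publish()

# Both published versions exist side by side, each with its own frozen tier.
print([(v.number, v.model_tier) for v in extractor.published()])  # → [(1, 'pro'), (2, 'max')]
```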

Best Practices

Use Descriptive Field Names

Clear field names help the AI understand what to extract:

✅ invoice_date, total_amount_due, vendor_address

❌ field1, date, val

Add Field Descriptions

Include descriptions to guide extraction, especially for ambiguous fields:

{
  "effective_date": {
    "type": "string",
    "description": "The date the contract becomes effective, not the signing date"
  }
}

Test with Varied Documents

Before publishing, test your extractor with documents that have:

  • Different layouts
  • Various quality levels
  • Edge cases (missing fields, unusual formats)

Start Simple, Then Expand

Begin with core fields, validate accuracy, then add more complex extractions.
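
One way to validate accuracy is to compare extracted results with hand-labeled expected values and compute a per-field score. The sketch below is a local helper, not an Automat feature: the function name `field_accuracy` is hypothetical, the sample data is invented, and it counts a field as correct only on an exact match.

```python
from collections import defaultdict


def field_accuracy(cases: list[tuple[dict, dict]]) -> dict[str, float]:
    """Compute the exact-match rate per field.

    cases holds one (expected, extracted) pair per test document.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expected, extracted in cases:
        for name, want in expected.items():
            totals[name] += 1
            # A missing field counts as a miss.
            if name in extracted and extracted[name] == want:
                hits[name] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Invented sample data: two labeled documents and their extraction results.
cases = [
    (
        {"invoice_number": "INV-001", "total_amount": 1250.0},
        {"invoice_number": "INV-001", "total_amount": 1250.0},
    ),
    (
        {"invoice_number": "INV-002", "total_amount": 80.0},
        {"invoice_number": "INV-002", "total_amount": 8.0},
    ),
]
print(field_accuracy(cases))  # → {'invoice_number': 1.0, 'total_amount': 0.5}
```

Fields with low scores are candidates for a clearer name or description before you add more fields to the schema.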

Next Steps

  • Schema Reference: Advanced schema configuration
  • API Reference: Extract endpoint documentation