Schemas

Overview

Schemas define exactly what data to extract from your documents and in what format. A well-defined schema is the key to accurate, consistent extractions.

Automat schemas follow JSON Schema conventions with some AI-specific extensions for better extraction guidance.

Basic Structure

{
  "field_name": {
    "type": "string",
    "description": "What this field represents"
  }
}

Every field should include:

type - The data type (string, number, boolean, array, object)
description - A clear explanation that guides the AI
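
The two requirements above can be checked mechanically before a schema is used. The following is a minimal Python sketch, assuming the schema has been loaded as a dict of field definitions; the function name find_incomplete_fields and the ALLOWED_TYPES set are illustrative and not part of Automat.

```python
# Types listed in this guide; treated here as the full set (an assumption).
ALLOWED_TYPES = {"string", "number", "boolean", "array", "object"}


def find_incomplete_fields(fields, path=""):
    """Return a list of problems for fields lacking a known type or a description."""
    problems = []
    for name, spec in fields.items():
        field_path = f"{path}.{name}" if path else name
        field_type = spec.get("type")
        if field_type not in ALLOWED_TYPES:
            problems.append(f"{field_path}: missing or unknown type")
        if not str(spec.get("description", "")).strip():
            problems.append(f"{field_path}: missing description")
        # Recurse into nested object properties and into array items that are objects.
        if field_type == "object":
            problems += find_incomplete_fields(spec.get("properties", {}), field_path)
        elif field_type == "array" and spec.get("items", {}).get("type") == "object":
            problems += find_incomplete_fields(
                spec["items"].get("properties", {}), f"{field_path}[]"
            )
    return problems


schema = {
    "customer_name": {"type": "string", "description": "Full name of the customer"},
    "total_amount": {"type": "number"},  # description left out on purpose
}
print(find_incomplete_fields(schema))  # ['total_amount: missing description']
```

Running a check like this in CI is one way to keep descriptions from being dropped as a schema grows.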

Field Types

String

For text values:

{
  "customer_name": {
    "type": "string",
    "description": "Full name of the customer"
  }
}

Number

For numeric values (integers and decimals):

{
  "total_amount": {
    "type": "number",
    "description": "Total invoice amount in dollars"
  }
}

Boolean

For true/false values:

{
  "is_paid": {
    "type": "boolean",
    "description": "Whether the invoice has been paid"
  }
}

Enum

For a predefined set of values:

{
  "status": {
    "type": "string",
    "enum": ["pending", "approved", "rejected"],
    "description": "Current approval status"
  }
}

Array

For lists of values:

{
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": { "type": "string" },
        "quantity": { "type": "number" },
        "amount": { "type": "number" }
      }
    },
    "description": "List of invoice line items"
  }
}

Object

For nested structures:

{
  "vendor": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Company name"
      },
      "address": {
        "type": "string",
        "description": "Full mailing address"
      },
      "tax_id": {
        "type": "string",
        "description": "Tax identification number"
      }
    },
    "description": "Vendor/seller information"
  }
}
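
To make the field types concrete, here is a minimal Python sketch of how an extracted value could be checked against a field definition on the client side. It is not Automat's validator: the function name matches is hypothetical, and treating a null value as acceptable for every field is an assumption.

```python
def matches(spec, value):
    """Return True if value conforms to the field definition spec."""
    if value is None:
        return True  # assumption: a missing value is reported as null
    kind = spec.get("type")
    if kind == "string":
        ok = isinstance(value, str)
    elif kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif kind == "boolean":
        ok = isinstance(value, bool)
    elif kind == "array":
        item_spec = spec.get("items", {})
        ok = isinstance(value, list) and all(matches(item_spec, v) for v in value)
    elif kind == "object":
        props = spec.get("properties", {})
        ok = isinstance(value, dict) and all(
            matches(props[k], v) for k, v in value.items() if k in props
        )
    else:
        ok = True  # no type given: nothing to check
    # An enum further restricts the value to the listed options.
    if ok and "enum" in spec:
        ok = value in spec["enum"]
    return ok


status = {"type": "string", "enum": ["pending", "approved", "rejected"]}
print(matches(status, "approved"))  # True
print(matches(status, "paid"))      # False
```

The explicit bool exclusion matters in Python: without it, an is_paid value of true would also pass as a number.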

Complete Example

Here’s a comprehensive invoice extraction schema:

{
  "invoice_number": {
    "type": "string",
    "description": "Unique invoice identifier (e.g., INV-2024-001)"
  },
  "invoice_date": {
    "type": "string",
    "description": "Date the invoice was issued, in YYYY-MM-DD format"
  },
  "due_date": {
    "type": "string",
    "description": "Payment due date, in YYYY-MM-DD format"
  },
  "vendor": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Vendor company name"
      },
      "address": {
        "type": "string",
        "description": "Vendor street address, city, state, zip"
      },
      "phone": {
        "type": "string",
        "description": "Vendor phone number"
      },
      "email": {
        "type": "string",
        "description": "Vendor email address"
      }
    },
    "description": "Information about the vendor/seller"
  },
  "customer": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Customer/buyer name or company"
      },
      "address": {
        "type": "string",
        "description": "Billing address"
      }
    },
    "description": "Information about the customer/buyer"
  },
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": {
          "type": "string",
          "description": "Item or service description"
        },
        "quantity": {
          "type": "number",
          "description": "Number of units"
        },
        "unit_price": {
          "type": "number",
          "description": "Price per unit in dollars"
        },
        "total": {
          "type": "number",
          "description": "Line item total (quantity × unit_price)"
        }
      }
    },
    "description": "Itemized list of products or services"
  },
  "subtotal": {
    "type": "number",
    "description": "Sum of all line items before tax"
  },
  "tax_rate": {
    "type": "number",
    "description": "Tax percentage applied (e.g., 8.5 for 8.5%)"
  },
  "tax_amount": {
    "type": "number",
    "description": "Total tax amount in dollars"
  },
  "total_amount": {
    "type": "number",
    "description": "Final total including tax"
  },
  "payment_terms": {
    "type": "string",
    "enum": ["net_15", "net_30", "net_60", "due_on_receipt"],
    "description": "Payment terms"
  },
  "notes": {
    "type": "string",
    "description": "Any additional notes or comments on the invoice"
  }
}
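
The numeric fields in this schema are related by simple arithmetic, which gives a cheap way to sanity-check an extraction. The sketch below is a client-side check in Python, not an Automat feature. The function name check_invoice_totals and the sample values are made up for illustration, and it assumes tax is applied to the whole subtotal, which will not hold for every invoice.

```python
import math


def check_invoice_totals(result, tol=0.01):
    """Return the names of fields whose values disagree with the invoice arithmetic."""
    def close(a, b):
        return math.isclose(a, b, abs_tol=tol)  # allow one cent of rounding

    issues = []
    items = result["line_items"]
    for i, item in enumerate(items):
        if not close(item["quantity"] * item["unit_price"], item["total"]):
            issues.append(f"line_items[{i}].total")
    if not close(sum(item["total"] for item in items), result["subtotal"]):
        issues.append("subtotal")
    # Assumption: tax_rate is a percentage applied to the whole subtotal.
    if not close(result["subtotal"] * result["tax_rate"] / 100, result["tax_amount"]):
        issues.append("tax_amount")
    if not close(result["subtotal"] + result["tax_amount"], result["total_amount"]):
        issues.append("total_amount")
    return issues


# Hypothetical extraction result, for illustration only.
extracted = {
    "line_items": [
        {"quantity": 2, "unit_price": 50.0, "total": 100.0},
        {"quantity": 1, "unit_price": 25.5, "total": 25.5},
    ],
    "subtotal": 125.5,
    "tax_rate": 8.0,
    "tax_amount": 10.04,
    "total_amount": 135.54,
}
print(check_invoice_totals(extracted))  # []
```

A non-empty result does not say which value is wrong, only that the extracted numbers are inconsistent and the document deserves a second look.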

Writing Effective Descriptions

Descriptions are crucial for accurate extraction. They guide the AI on:

What to look for
Where it might be found
How to format the value

Be Specific

❌ "Date of the document"

✅ "Invoice issue date, typically at the top of the document, in YYYY-MM-DD format"

Disambiguate Similar Fields

When a document has multiple similar values:

{
  "ship_date": {
    "description": "Date items were shipped, not the order date or delivery date"
  },
  "delivery_date": {
    "description": "Expected or actual delivery date, not the ship date"
  }
}

Specify Format

{
  "phone": {
    "description": "Phone number in format (XXX) XXX-XXXX"
  },
  "amount": {
    "description": "Dollar amount as a number without currency symbol"
  }
}
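
A format requested in a description can also be verified after extraction. The following Python sketch checks the (XXX) XXX-XXXX phone format above and the YYYY-MM-DD date format used elsewhere on this page. The helper names are illustrative, and nothing here is part of Automat itself.

```python
import re
from datetime import datetime

PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_formatted_phone(value):
    """True if value looks like (XXX) XXX-XXXX."""
    return isinstance(value, str) and bool(PHONE_RE.fullmatch(value))


def is_iso_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not isinstance(value, str) or not DATE_RE.fullmatch(value):
        return False  # the regex rejects unpadded forms such as 2024-3-5
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False  # e.g. 2024-02-30 has the right shape but is not a date


print(is_formatted_phone("(555) 123-4567"))  # True
print(is_iso_date("03/15/2024"))             # False
```

The regex runs before strptime because strptime on its own accepts unpadded months and days.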

Handle Missing Values

{
  "po_number": {
    "description": "Purchase order number if present, otherwise null"
  }
}

Optional vs Required Fields

By default, all fields are optional. Use the required property for mandatory fields:

{
  "type": "object",
  "required": ["invoice_number", "total_amount"],
  "properties": {
    "invoice_number": { ... },
    "total_amount": { ... },
    "po_number": { ... }
  }
}
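
On the consuming side, the required list can be used to flag incomplete results. Below is a minimal Python sketch, assuming a result arrives as a dict and that a missing value is either absent or null. The function name missing_required and the filled-in field definitions are hypothetical.

```python
def missing_required(schema, result):
    """List required fields that are absent from result or set to None."""
    return [name for name in schema.get("required", []) if result.get(name) is None]


schema = {
    "type": "object",
    "required": ["invoice_number", "total_amount"],
    "properties": {
        # Hypothetical field definitions, filled in only for illustration.
        "invoice_number": {"type": "string", "description": "Unique invoice identifier"},
        "total_amount": {"type": "number", "description": "Final total including tax"},
        "po_number": {"type": "string", "description": "Purchase order number if present"},
    },
}

result = {"invoice_number": "INV-2024-001", "po_number": None}
print(missing_required(schema, result))  # ['total_amount']
```

The optional po_number is not reported even though it is null, because only names in required are checked.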

Best Practices

Start Simple

Begin with 5-10 core fields, validate accuracy, then expand.

Use Consistent Naming

Use snake_case and descriptive names: total_amount, not amt.
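
The snake_case half of this rule is easy to enforce automatically. Whether a name is descriptive still needs a human reviewer. A minimal Python sketch, with a hypothetical helper name:

```python
import re

# Lowercase words made of letters and digits, joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def non_snake_case_names(fields):
    """Return the field names that are not written in snake_case."""
    return [name for name in fields if not SNAKE_CASE.fullmatch(name)]


fields = {"total_amount": {}, "totalAmount": {}, "Invoice-Number": {}}
print(non_snake_case_names(fields))  # ['totalAmount', 'Invoice-Number']
```

Note that a short name such as amt passes this check, since it is valid snake_case.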

Group Related Fields

Use nested objects for related data (vendor info, customer info, etc.).

Test Edge Cases

Test with documents missing optional fields to ensure graceful handling.
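
Downstream code can be exercised against missing fields without new documents by deriving variants from one good result. The Python sketch below assumes a missing optional field may show up either as an absent key or as null, so it generates both. All names and sample values in it are hypothetical.

```python
def missing_field_variants(result, optional_fields):
    """Yield (label, copy of result) with each optional field absent, then null."""
    for name in optional_fields:
        absent = {k: v for k, v in result.items() if k != name}
        yield f"{name} absent", absent
        yield f"{name} null", {**result, name: None}


def format_po(result):
    """Hypothetical downstream code that must tolerate a missing po_number."""
    po = result.get("po_number")
    return f"PO: {po}" if po is not None else "no PO"


# Hypothetical extraction result, for illustration only.
sample = {"invoice_number": "INV-2024-001", "po_number": "4500123"}

for label, variant in missing_field_variants(sample, ["po_number"]):
    # Both representations of "missing" should be handled the same way.
    assert format_po(variant) == "no PO", label
```

Using result.get rather than result["po_number"] is what lets format_po treat the absent and null cases identically.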