Schemas

Define the structure of your extracted data

Overview

Schemas define exactly what data to extract from your documents and in what format. A well-defined schema is the key to accurate, consistent extractions.

Automat schemas follow JSON Schema conventions with some AI-specific extensions for better extraction guidance.

Basic Structure

{
  "field_name": {
    "type": "string",
    "description": "What this field represents"
  }
}

Every field should include:

  • type - The data type (string, number, boolean, array, object)
  • description - A clear explanation that guides the AI
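As a rough illustration of the rule above, a schema can be linted locally before it is used. This is a minimal sketch, not part of Automat: `lint_fields` and `ALLOWED_TYPES` are hypothetical names, and the allowed set simply mirrors the five types listed above.

```python
# Hypothetical helper, not an Automat API: checks that every field
# declares both "type" and "description".
REQUIRED_KEYS = ("type", "description")
ALLOWED_TYPES = {"string", "number", "boolean", "array", "object"}  # assumed from the list above


def lint_fields(fields: dict) -> list[str]:
    """Return human-readable problems found in a field map."""
    problems = []
    for name, spec in fields.items():
        for key in REQUIRED_KEYS:
            if key not in spec:
                problems.append(f"{name}: missing '{key}'")
        if "type" in spec and spec["type"] not in ALLOWED_TYPES:
            problems.append(f"{name}: unknown type '{spec['type']}'")
    return problems


schema = {
    "customer_name": {"type": "string", "description": "Full name of the customer"},
    "total_amount": {"type": "number"},  # description deliberately left out
}
print(lint_fields(schema))  # ["total_amount: missing 'description'"]
```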

Field Types

String

For text values:

{
  "customer_name": {
    "type": "string",
    "description": "Full name of the customer"
  }
}

Number

For numeric values (integers and decimals):

{
  "total_amount": {
    "type": "number",
    "description": "Total invoice amount in dollars"
  }
}

Boolean

For true/false values:

{
  "is_paid": {
    "type": "boolean",
    "description": "Whether the invoice has been paid"
  }
}

Enum

For a predefined set of values:

{
  "status": {
    "type": "string",
    "enum": ["pending", "approved", "rejected"],
    "description": "Current approval status"
  }
}
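When consuming results, it can help to confirm that an extracted value is actually one of the listed options. The sketch below assumes the extraction result is available as a plain Python value; `check_enum` is a hypothetical helper, not an Automat function.

```python
def check_enum(spec: dict, value) -> bool:
    """Return True if the value is allowed by the field's enum (or no enum is set)."""
    allowed = spec.get("enum")
    return allowed is None or value in allowed


status_spec = {
    "type": "string",
    "enum": ["pending", "approved", "rejected"],
    "description": "Current approval status",
}

print(check_enum(status_spec, "approved"))  # True
print(check_enum(status_spec, "Approved"))  # False: the comparison is case-sensitive
```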

Array

For lists of values:

{
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": { "type": "string" },
        "quantity": { "type": "number" },
        "amount": { "type": "number" }
      }
    },
    "description": "List of invoice line items"
  }
}

Object

For nested structures:

{
  "vendor": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Company name"
      },
      "address": {
        "type": "string",
        "description": "Full mailing address"
      },
      "tax_id": {
        "type": "string",
        "description": "Tax identification number"
      }
    },
    "description": "Vendor/seller information"
  }
}

Complete Example

Here’s a comprehensive invoice extraction schema:

{
  "invoice_number": {
    "type": "string",
    "description": "Unique invoice identifier (e.g., INV-2024-001)"
  },
  "invoice_date": {
    "type": "string",
    "description": "Date the invoice was issued, in YYYY-MM-DD format"
  },
  "due_date": {
    "type": "string",
    "description": "Payment due date, in YYYY-MM-DD format"
  },
  "vendor": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Vendor company name"
      },
      "address": {
        "type": "string",
        "description": "Vendor street address, city, state, zip"
      },
      "phone": {
        "type": "string",
        "description": "Vendor phone number"
      },
      "email": {
        "type": "string",
        "description": "Vendor email address"
      }
    },
    "description": "Information about the vendor/seller"
  },
  "customer": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Customer/buyer name or company"
      },
      "address": {
        "type": "string",
        "description": "Billing address"
      }
    },
    "description": "Information about the customer/buyer"
  },
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": {
          "type": "string",
          "description": "Item or service description"
        },
        "quantity": {
          "type": "number",
          "description": "Number of units"
        },
        "unit_price": {
          "type": "number",
          "description": "Price per unit in dollars"
        },
        "total": {
          "type": "number",
          "description": "Line item total (quantity × unit_price)"
        }
      }
    },
    "description": "Itemized list of products or services"
  },
  "subtotal": {
    "type": "number",
    "description": "Sum of all line items before tax"
  },
  "tax_rate": {
    "type": "number",
    "description": "Tax percentage applied (e.g., 8.5 for 8.5%)"
  },
  "tax_amount": {
    "type": "number",
    "description": "Total tax amount in dollars"
  },
  "total_amount": {
    "type": "number",
    "description": "Final total including tax"
  },
  "payment_terms": {
    "type": "string",
    "enum": ["net_15", "net_30", "net_60", "due_on_receipt"],
    "description": "Payment terms"
  },
  "notes": {
    "type": "string",
    "description": "Any additional notes or comments on the invoice"
  }
}
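The numeric fields in this schema are related by simple arithmetic, which gives a cheap way to sanity-check an extraction. The sketch below is one possible check, assuming all numeric fields are present and that the invoice has no discounts or shipping charges; `check_invoice_totals` and the sample values are made up for illustration.

```python
import math


def check_invoice_totals(invoice: dict, tol: float = 0.01) -> list[str]:
    """Return arithmetic inconsistencies found in an extracted invoice."""
    issues = []

    def compare(label: str, actual: float, expected: float) -> None:
        # Allow a small absolute tolerance for rounding on the document.
        if not math.isclose(actual, expected, abs_tol=tol):
            issues.append(f"{label}: expected {expected:.2f}, got {actual}")

    items = invoice["line_items"]
    for i, item in enumerate(items):
        compare(f"line_items[{i}].total", item["total"],
                item["quantity"] * item["unit_price"])
    compare("subtotal", invoice["subtotal"], sum(item["total"] for item in items))
    compare("tax_amount", invoice["tax_amount"],
            invoice["subtotal"] * invoice["tax_rate"] / 100)
    compare("total_amount", invoice["total_amount"],
            invoice["subtotal"] + invoice["tax_amount"])
    return issues


# Illustrative data only, not taken from a real document.
sample = {
    "line_items": [
        {"quantity": 2, "unit_price": 50.0, "total": 100.0},
        {"quantity": 1, "unit_price": 100.0, "total": 100.0},
    ],
    "subtotal": 200.0,
    "tax_rate": 8.5,
    "tax_amount": 17.0,
    "total_amount": 217.0,
}
print(check_invoice_totals(sample))  # []
```

A non-empty result does not necessarily mean the extraction is wrong; the source invoice itself may contain charges this sketch does not model.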

Writing Effective Descriptions

Descriptions are crucial for accurate extraction. They guide the AI on:

  • What to look for
  • Where it might be found
  • How to format the value

Vague: "Date of the document"

Better: "Invoice issue date, typically at the top of the document, in YYYY-MM-DD format"

When a document has multiple similar values:

{
  "ship_date": {
    "description": "Date items were shipped, not the order date or delivery date"
  },
  "delivery_date": {
    "description": "Expected or actual delivery date, not the ship date"
  }
}

When the value should follow a specific format:

{
  "phone": {
    "description": "Phone number in format (XXX) XXX-XXXX"
  },
  "amount": {
    "description": "Dollar amount as a number without currency symbol"
  }
}

When a value may be absent from the document:

{
  "po_number": {
    "description": "Purchase order number if present, otherwise null"
  }
}
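Formats requested in a description can also be verified after extraction. The following sketch checks the two formats used in this section, YYYY-MM-DD dates and (XXX) XXX-XXXX phone numbers; `is_iso_date` and `is_us_phone` are hypothetical helpers, and how strictly the extractor follows a description is not assumed here.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def is_iso_date(value: str) -> bool:
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if DATE_RE.fullmatch(value) is None:
        return False  # wrong shape, e.g. 03/15/2024 or 2024-3-15
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # right shape but impossible date, e.g. 2024-02-30
    return True


def is_us_phone(value: str) -> bool:
    """True if value matches the (XXX) XXX-XXXX pattern exactly."""
    return PHONE_RE.fullmatch(value) is not None


print(is_iso_date("2024-03-15"), is_iso_date("03/15/2024"))        # True False
print(is_us_phone("(555) 123-4567"), is_us_phone("555-123-4567"))  # True False
```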

Optional vs Required Fields

By default, all fields are optional. To make fields mandatory, list their names in the required property of the enclosing object:

{
  "type": "object",
  "required": ["invoice_number", "total_amount"],
  "properties": {
    "invoice_number": { ... },
    "total_amount": { ... },
    "po_number": { ... }
  }
}
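On the consuming side, a result can be checked against the required list before it is processed further. This is a minimal sketch assuming the result arrives as a dictionary in which absent values are either missing keys or null; `missing_required` is a hypothetical helper, and the schema below carries only the required list.

```python
def missing_required(schema: dict, result: dict) -> list[str]:
    """Return required field names that are absent or null in the result."""
    return [
        name
        for name in schema.get("required", [])
        if result.get(name) is None
    ]


schema = {
    "type": "object",
    "required": ["invoice_number", "total_amount"],
}

result = {"invoice_number": "INV-2024-001", "total_amount": None, "po_number": None}
print(missing_required(schema, result))  # ['total_amount']
```

Note that `po_number` is not reported even though it is null, because it is not in the required list.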

Best Practices

Start Simple

Begin with 5-10 core fields, validate accuracy, then expand

Use Consistent Naming

Use snake_case and descriptive names: total_amount, not amt
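The snake_case half of this convention can be checked mechanically. A small sketch, with `is_snake_case` as a hypothetical helper; whether a name is descriptive still needs a human eye, since amt passes the pattern too.

```python
import re

# Lowercase words made of letters and digits, joined by single underscores.
SNAKE_CASE_RE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """True if the field name is written in snake_case."""
    return SNAKE_CASE_RE.fullmatch(name) is not None


for name in ("total_amount", "totalAmount", "Total_Amount", "amt"):
    print(name, is_snake_case(name))
```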

Group Related Fields

Use nested objects for related data (vendor info, customer info, etc.)

Test Edge Cases

Test with documents missing optional fields to ensure graceful handling
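One way to exercise this is to feed downstream code a result with the optional fields removed or set to null. The sketch below assumes results are plain dictionaries shaped like the invoice schema above; `summarize` is a hypothetical consumer written only for this example.

```python
def summarize(result: dict) -> str:
    """Build a one-line summary that tolerates missing or null optional fields."""
    po = result.get("po_number") or "n/a"
    # vendor may be missing entirely or present as null
    vendor = (result.get("vendor") or {}).get("name") or "unknown vendor"
    return f"{result['invoice_number']} from {vendor} (PO: {po})"


full = {
    "invoice_number": "INV-2024-001",
    "vendor": {"name": "Acme Corp"},  # illustrative value
    "po_number": "PO-77",             # illustrative value
}
sparse = {"invoice_number": "INV-2024-001", "vendor": None}

print(summarize(full))    # INV-2024-001 from Acme Corp (PO: PO-77)
print(summarize(sparse))  # INV-2024-001 from unknown vendor (PO: n/a)
```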