Schemas

Define the structure of your extracted data

Overview

Schemas define exactly what data to extract from your documents and in what format. A well-defined schema is the key to accurate, consistent extractions.

Automat schemas follow JSON Schema conventions with some AI-specific extensions for better extraction guidance.

Basic Structure

{
  "field_name": {
    "type": "string",
    "description": "What this field represents"
  }
}

Every field should include:

  • type - The data type (string, number, boolean, array, object)
  • description - A clear explanation that guides the AI
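
As a rough illustration of the rule above, the following Python sketch scans a schema and reports fields that lack a recognized type or a description. The helper name `find_incomplete_fields` and the second field `amt` are hypothetical and not part of Automat; the sketch only assumes a schema is a mapping from field names to field definitions, as in the Basic Structure example.

```python
# Hypothetical lint helper; not part of the Automat API.
ALLOWED_TYPES = {"string", "number", "boolean", "array", "object"}

def find_incomplete_fields(schema: dict) -> list[str]:
    """Return one message per field that lacks a known type or a description."""
    problems = []
    for name, spec in schema.items():
        if spec.get("type") not in ALLOWED_TYPES:
            problems.append(f"{name}: missing or unknown type")
        if not spec.get("description"):
            problems.append(f"{name}: missing description")
    return problems

schema = {
    "field_name": {"type": "string", "description": "What this field represents"},
    "amt": {"type": "number"},  # illustrative field with no description
}
print(find_incomplete_fields(schema))  # ['amt: missing description']
```

Running a check like this before submitting a schema can catch fields that would otherwise give the AI no guidance.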

Field Types

String

For text values:

{
  "customer_name": {
    "type": "string",
    "description": "Full name of the customer"
  }
}

Number

For numeric values (integers and decimals):

{
  "total_amount": {
    "type": "number",
    "description": "Total invoice amount in dollars"
  }
}

Boolean

For true/false values:

{
  "is_paid": {
    "type": "boolean",
    "description": "Whether the invoice has been paid"
  }
}

Enum

For a predefined set of values:

{
  "status": {
    "type": "string",
    "enum": ["pending", "approved", "rejected"],
    "description": "Current approval status"
  }
}

Array

For lists of values:

{
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": { "type": "string" },
        "quantity": { "type": "number" },
        "amount": { "type": "number" }
      }
    },
    "description": "List of invoice line items"
  }
}

Object

For nested structures:

{
  "vendor": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Company name"
      },
      "address": {
        "type": "string",
        "description": "Full mailing address"
      },
      "tax_id": {
        "type": "string",
        "description": "Tax identification number"
      }
    },
    "description": "Vendor/seller information"
  }
}
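
To make the field types concrete, here is a minimal Python sketch of how an extracted value could be checked against a field definition on the client side. The `validate` function is a hypothetical helper, not an Automat API, and it assumes that a null value simply means the field was not found in the document.

```python
def validate(spec: dict, value, path: str = "$") -> list[str]:
    """Return a list of mismatches between a value and a field definition."""
    if value is None:
        return []  # assumption: null means "not found", which is not a type error
    kind = spec.get("type")
    errors = []
    if kind == "string":
        if not isinstance(value, str):
            errors.append(f"{path}: expected string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{path}: {value!r} not in enum")
    elif kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{path}: expected number")
    elif kind == "boolean":
        if not isinstance(value, bool):
            errors.append(f"{path}: expected boolean")
    elif kind == "array":
        if not isinstance(value, list):
            errors.append(f"{path}: expected array")
        else:
            for i, item in enumerate(value):
                errors += validate(spec.get("items", {}), item, f"{path}[{i}]")
    elif kind == "object":
        if not isinstance(value, dict):
            errors.append(f"{path}: expected object")
        else:
            for key, sub in spec.get("properties", {}).items():
                errors += validate(sub, value.get(key), f"{path}.{key}")
    return errors

vendor_spec = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "tax_id": {"type": "string"}},
}
status_spec = {"type": "string", "enum": ["pending", "approved", "rejected"]}

print(validate(vendor_spec, {"name": "Acme", "tax_id": 12345}))
# ['$.tax_id: expected string']
print(validate(status_spec, "cancelled"))
# ["$: 'cancelled' not in enum"]
```

The recursion mirrors the schema itself: arrays descend into `items`, objects descend into `properties`, and everything else is a leaf check.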

Complete Example

Here’s a comprehensive invoice extraction schema:

{
  "invoice_number": {
    "type": "string",
    "description": "Unique invoice identifier (e.g., INV-2024-001)"
  },
  "invoice_date": {
    "type": "string",
    "description": "Date the invoice was issued, in YYYY-MM-DD format"
  },
  "due_date": {
    "type": "string",
    "description": "Payment due date, in YYYY-MM-DD format"
  },
  "vendor": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Vendor company name"
      },
      "address": {
        "type": "string",
        "description": "Vendor street address, city, state, zip"
      },
      "phone": {
        "type": "string",
        "description": "Vendor phone number"
      },
      "email": {
        "type": "string",
        "description": "Vendor email address"
      }
    },
    "description": "Information about the vendor/seller"
  },
  "customer": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Customer/buyer name or company"
      },
      "address": {
        "type": "string",
        "description": "Billing address"
      }
    },
    "description": "Information about the customer/buyer"
  },
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": {
          "type": "string",
          "description": "Item or service description"
        },
        "quantity": {
          "type": "number",
          "description": "Number of units"
        },
        "unit_price": {
          "type": "number",
          "description": "Price per unit in dollars"
        },
        "total": {
          "type": "number",
          "description": "Line item total (quantity × unit_price)"
        }
      }
    },
    "description": "Itemized list of products or services"
  },
  "subtotal": {
    "type": "number",
    "description": "Sum of all line items before tax"
  },
  "tax_rate": {
    "type": "number",
    "description": "Tax percentage applied (e.g., 8.5 for 8.5%)"
  },
  "tax_amount": {
    "type": "number",
    "description": "Total tax amount in dollars"
  },
  "total_amount": {
    "type": "number",
    "description": "Final total including tax"
  },
  "payment_terms": {
    "type": "string",
    "enum": ["net_15", "net_30", "net_60", "due_on_receipt"],
    "description": "Payment terms"
  },
  "notes": {
    "type": "string",
    "description": "Any additional notes or comments on the invoice"
  }
}
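
The numeric fields in this schema are related by simple arithmetic, which makes a cheap sanity check possible after extraction. The Python sketch below is one way to do it. The `check_totals` helper and the sample invoice are hypothetical, and the sketch assumes all numeric fields were extracted (none are null), that tax is charged on the subtotal, and that there are no discounts or shipping charges, which real invoices may violate.

```python
import math

def check_totals(invoice: dict, tolerance: float = 0.01) -> list[str]:
    """Return a list of arithmetic inconsistencies in an extracted invoice."""
    problems = []

    def differs(actual: float, expected: float) -> bool:
        # Compare with an absolute tolerance of one cent by default.
        return not math.isclose(actual, expected, rel_tol=0.0, abs_tol=tolerance)

    items = invoice["line_items"]
    for i, item in enumerate(items):
        expected = item["quantity"] * item["unit_price"]
        if differs(item["total"], expected):
            problems.append(f"line_items[{i}].total should be {expected:.2f}")

    subtotal = sum(item["total"] for item in items)
    if differs(invoice["subtotal"], subtotal):
        problems.append(f"subtotal should be {subtotal:.2f}")

    # Assumption: tax applies to the subtotal; tax_rate is a percentage (8.5 = 8.5%).
    tax = invoice["subtotal"] * invoice["tax_rate"] / 100
    if differs(invoice["tax_amount"], tax):
        problems.append(f"tax_amount should be {tax:.2f}")

    total = invoice["subtotal"] + invoice["tax_amount"]
    if differs(invoice["total_amount"], total):
        problems.append(f"total_amount should be {total:.2f}")
    return problems

# Made-up sample data for illustration.
invoice = {
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 50.0, "total": 100.0},
        {"description": "Setup fee", "quantity": 1, "unit_price": 100.0, "total": 100.0},
    ],
    "subtotal": 200.0,
    "tax_rate": 8.5,
    "tax_amount": 17.0,
    "total_amount": 271.0,  # digits transposed; should be 217.0
}
print(check_totals(invoice))  # ['total_amount should be 217.00']
```

A failed check does not say which value was misread, only that the extracted numbers disagree, so it is best treated as a flag for manual review.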

Writing Effective Descriptions

Descriptions are crucial for accurate extraction. They guide the AI on:

  • What to look for
  • Where it might be found
  • How to format the value

Be Specific

❌ "Date of the document"

✅ "Invoice issue date, typically at the top of the document, in YYYY-MM-DD format"

Disambiguate Similar Fields

When a document has multiple similar values:

{
  "ship_date": {
    "description": "Date items were shipped, not the order date or delivery date"
  },
  "delivery_date": {
    "description": "Expected or actual delivery date, not the ship date"
  }
}

Specify Format

{
  "phone": {
    "description": "Phone number in format (XXX) XXX-XXXX"
  },
  "amount": {
    "description": "Dollar amount as a number without currency symbol"
  }
}

Handle Missing Values

{
  "po_number": {
    "description": "Purchase order number if present, otherwise null"
  }
}
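
Format instructions in a description are guidance for the AI, so it can still be worth verifying the result in your own code. The Python sketch below checks the two formats mentioned on this page, YYYY-MM-DD dates and (XXX) XXX-XXXX phone numbers. The helper names are hypothetical, and treating null as acceptable is an assumption that follows the "otherwise null" convention above.

```python
import re
from datetime import date

DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
PHONE_RE = re.compile(r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}")

def is_valid_date(value) -> bool:
    """True for null or a real calendar date written as YYYY-MM-DD."""
    if value is None:
        return True  # assumption: null means the field was absent
    if not isinstance(value, str) or not DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    return True

def is_valid_phone(value) -> bool:
    """True for null or a string shaped like (XXX) XXX-XXXX."""
    if value is None:
        return True
    return isinstance(value, str) and PHONE_RE.fullmatch(value) is not None

print(is_valid_date("2024-01-15"))       # True
print(is_valid_date("01/15/2024"))       # False
print(is_valid_date("2024-02-30"))       # False
print(is_valid_phone("(555) 123-4567"))  # True
print(is_valid_phone("555-123-4567"))    # False
```

The regular expression fixes the shape of the string, and `date.fromisoformat` then confirms that the date actually exists.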

Optional vs Required Fields

By default, all fields are optional. To make fields mandatory, list their names in the required property:

{
  "type": "object",
  "required": ["invoice_number", "total_amount"],
  "properties": {
    "invoice_number": { ... },
    "total_amount": { ... },
    "po_number": { ... }
  }
}
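
On the consuming side, a short check can confirm that every required field actually came back. This Python sketch is illustrative only: `missing_required` is a hypothetical helper, the property definitions are placeholders rather than the elided ones above, and counting a null value as missing is an assumption.

```python
def missing_required(schema: dict, record: dict) -> list[str]:
    """List required fields that are absent or null in an extracted record."""
    return [name for name in schema.get("required", []) if record.get(name) is None]

schema = {
    "type": "object",
    "required": ["invoice_number", "total_amount"],
    "properties": {
        # Placeholder definitions for illustration only.
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "po_number": {"type": "string"},
    },
}

# po_number is optional, so leaving it out is fine; total_amount is not.
record = {"invoice_number": "INV-2024-001", "total_amount": None}
print(missing_required(schema, record))  # ['total_amount']
```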

Best Practices

Start Simple

Begin with 5-10 core fields, validate their accuracy, then expand.

Use Consistent Naming

Use snake_case and descriptive names: total_amount, not amt.
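
A naming convention is easy to enforce mechanically. The Python sketch below flags field names that are not snake_case. The helper name and the sample field names are hypothetical, and a pattern like this cannot judge whether a name is descriptive, so amt would still pass.

```python
import re

# Lowercase words made of letters and digits, joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

def non_snake_case(names) -> list[str]:
    """Return the field names that do not follow snake_case."""
    return [name for name in names if not SNAKE_CASE.fullmatch(name)]

print(non_snake_case(["total_amount", "invoiceDate", "Vendor", "tax_rate"]))
# ['invoiceDate', 'Vendor']
```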

Group Related Fields

Use nested objects for related data (vendor info, customer info, etc.).

Test Edge Cases

Test with documents that are missing optional fields to ensure graceful handling.