Schemas
Overview
Schemas define exactly what data to extract from your documents and in what format. A well-defined schema is the key to accurate, consistent extractions.
Automat schemas follow JSON Schema conventions with some AI-specific extensions for better extraction guidance.
Basic Structure
Every field should include:
- type - The data type (string, number, boolean, array, object)
- description - A clear explanation that guides the AI
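As a minimal sketch of these two properties, assuming the standard JSON Schema layout of a top-level object with a `properties` map (the field name `invoice_number` is purely illustrative):

```json
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Unique invoice identifier, usually printed near the top of the document"
    }
  }
}
```

The type-specific examples in this section show only the entries that would sit inside `properties`.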
Field Types
String
For text values:
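A string field might look like the following sketch. The field name `vendor_name` is a hypothetical example, and only standard JSON Schema keywords are used:

```json
{
  "vendor_name": {
    "type": "string",
    "description": "Legal name of the company that issued the invoice"
  }
}
```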
Number
For numeric values (integers and decimals):
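A number field could be sketched as below, assuming standard JSON Schema keywords. The name `total_amount` is illustrative, and the description asks for a bare number so that currency symbols do not end up in the value:

```json
{
  "total_amount": {
    "type": "number",
    "description": "Total amount due including tax, as a plain number without currency symbols or thousands separators"
  }
}
```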
Boolean
For true/false values:
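A boolean field, sketched with the hypothetical name `is_paid`. The description spells out what counts as true so the answer is not left to interpretation:

```json
{
  "is_paid": {
    "type": "boolean",
    "description": "True if the invoice is explicitly marked as paid (for example a PAID stamp), otherwise false"
  }
}
```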
Enum
For a predefined set of values:
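In standard JSON Schema, an enum is a typed field with an added `enum` list of allowed values. The sketch below assumes Automat follows that convention. The field name and currency codes are illustrative:

```json
{
  "currency": {
    "type": "string",
    "enum": ["USD", "EUR", "GBP"],
    "description": "Currency of the invoice amounts, as a three-letter code"
  }
}
```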
Array
For lists of values:
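An array field describes its elements with `items`, following JSON Schema conventions. This sketch uses a hypothetical `po_numbers` field holding a list of strings:

```json
{
  "po_numbers": {
    "type": "array",
    "description": "All purchase order numbers referenced on the invoice",
    "items": {
      "type": "string",
      "description": "A single purchase order number"
    }
  }
}
```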
Object
For nested structures:
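A nested object carries its own `properties` map. The sketch below groups hypothetical vendor details under a single `vendor` field, assuming standard JSON Schema nesting:

```json
{
  "vendor": {
    "type": "object",
    "description": "Details of the company that issued the invoice",
    "properties": {
      "name": {
        "type": "string",
        "description": "Vendor company name"
      },
      "address": {
        "type": "string",
        "description": "Full vendor mailing address on a single line"
      }
    }
  }
}
```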
Complete Example
Here’s a comprehensive invoice extraction schema:
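The sketch below shows one possible shape for such a schema, combining the field types described above. All field names are illustrative assumptions, and only standard JSON Schema keywords are used. Any Automat-specific extensions are not shown:

```json
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Unique invoice identifier, usually printed near the top of the document"
    },
    "invoice_date": {
      "type": "string",
      "description": "Invoice issue date, typically at the top of the document, in YYYY-MM-DD format"
    },
    "vendor": {
      "type": "object",
      "description": "Details of the company that issued the invoice",
      "properties": {
        "name": { "type": "string", "description": "Vendor company name" },
        "address": { "type": "string", "description": "Full vendor mailing address" }
      }
    },
    "line_items": {
      "type": "array",
      "description": "One entry per billed line in the invoice table",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string", "description": "Item or service description" },
          "quantity": { "type": "number", "description": "Quantity billed" },
          "unit_price": { "type": "number", "description": "Price per unit, without currency symbol" }
        }
      }
    },
    "currency": {
      "type": "string",
      "enum": ["USD", "EUR", "GBP"],
      "description": "Currency of the invoice amounts"
    },
    "total_amount": {
      "type": "number",
      "description": "Final amount due including tax, usually the last figure in the totals block"
    }
  },
  "required": ["invoice_number", "total_amount"]
}
```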
Writing Effective Descriptions
Descriptions are crucial for accurate extraction. They guide the AI on:
- What to look for
- Where it might be found
- How to format the value
Be Specific
❌ "Date of the document"
✅ "Invoice issue date, typically at the top of the document, in YYYY-MM-DD format"
Disambiguate Similar Fields
When a document has multiple similar values:
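For instance, an invoice often carries both an issue date and a due date. In this sketch (field names are illustrative), each description says which date is meant and explicitly rules out the other:

```json
{
  "invoice_date": {
    "type": "string",
    "description": "Date the invoice was issued, often labeled 'Invoice Date' or 'Date'. Not the due date or the delivery date."
  },
  "due_date": {
    "type": "string",
    "description": "Date payment is due, often labeled 'Due Date' or 'Payment Due'. Not the invoice issue date."
  }
}
```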
Specify Format
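State the expected output format in the description so that values come back normalized. A sketch with two hypothetical fields:

```json
{
  "invoice_date": {
    "type": "string",
    "description": "Invoice issue date in YYYY-MM-DD format, for example 2024-03-15"
  },
  "tax_rate": {
    "type": "number",
    "description": "Tax rate as a percentage number, for example 8.5 for 8.5%, without the percent sign"
  }
}
```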
Handle Missing Values
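One way to handle values that may be absent is to say so in the description. The sketch below is an assumption about how such guidance could be phrased. The exact value returned for a missing field depends on the platform, so verify it against real output:

```json
{
  "po_number": {
    "type": "string",
    "description": "Purchase order number, if one is printed on the invoice. Leave empty if the document does not show a PO number; do not guess or reuse the invoice number."
  }
}
```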
Optional vs Required Fields
By default, all fields are optional. Use the required property for mandatory fields:
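In standard JSON Schema, `required` is an array of property names placed on the enclosing object, next to `properties`. The sketch below assumes Automat follows that form. Here `invoice_number` is mandatory and `po_number` stays optional:

```json
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Unique invoice identifier"
    },
    "po_number": {
      "type": "string",
      "description": "Purchase order number, if present"
    }
  },
  "required": ["invoice_number"]
}
```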
Best Practices
- Start simple: begin with 5-10 core fields, validate accuracy, then expand
- Use descriptive snake_case names: total_amount rather than amt
- Group related data in nested objects (vendor info, customer info, etc.)
- Test with documents that are missing optional fields to ensure graceful handling