Commit c2b8e93

docs and notebooks update for json schema
1 parent 0809e00 commit c2b8e93

6 files changed, +329 -222 lines changed

docs/assessment.md

Lines changed: 46 additions & 33 deletions
@@ -318,9 +318,9 @@ For basic single-value extractions like dates, amounts, or names:
 
 **Configuration:**
 ```yaml
-attributes:
-  - name: "StatementDate"
-    attributeType: "simple"
+properties:
+  StatementDate:
+    type: string
     description: "The date of the bank statement"
 ```
 
@@ -360,14 +360,16 @@ For nested object structures with multiple related fields:
 
 **Configuration:**
 ```yaml
-attributes:
-  - name: "AccountDetails"
-    attributeType: "group"
+properties:
+  AccountDetails:
+    type: object
     description: "Bank account information"
-    groupAttributes:
-      - name: "AccountNumber"
+    properties:
+      AccountNumber:
+        type: string
         description: "The account number"
-      - name: "RoutingNumber"
+      RoutingNumber:
+        type: string
         description: "The bank routing number"
 ```
 
@@ -413,18 +415,22 @@ For arrays of items, such as transactions in a bank statement:
 
 **Configuration:**
 ```yaml
-attributes:
-  - name: "Transactions"
-    attributeType: "list"
+properties:
+  Transactions:
+    type: array
     description: "List of all transactions on the statement"
-    listItemTemplate:
-      itemDescription: "Individual transaction entry"
-      itemAttributes:
-        - name: "Date"
+    x-aws-idp-list-item-description: "Individual transaction entry"
+    items:
+      type: object
+      properties:
+        Date:
+          type: string
           description: "Transaction date"
-        - name: "Description"
+        Description:
+          type: string
           description: "Transaction description"
-        - name: "Amount"
+        Amount:
+          type: string
           description: "Transaction amount"
 ```
 
@@ -979,27 +985,34 @@ attributes:
 Processes complex nested structures as single units:
 ```yaml
 # Each group becomes one focused task
-attributes:
-  - name: "AccountDetails"
-    attributeType: "group"
-    groupAttributes:
-      - name: "AccountNumber"
-      - name: "RoutingNumber"
-      - name: "AccountType"
+properties:
+  AccountDetails:
+    type: object
+    properties:
+      AccountNumber:
+        type: string
+      RoutingNumber:
+        type: string
+      AccountType:
+        type: string
 ```
 
 #### List Item Tasks
 Assesses each list item individually for maximum accuracy:
 ```yaml
 # 100 transactions = 100 individual assessment tasks
-attributes:
-  - name: "Transactions"
-    attributeType: "list"
-    listItemTemplate:
-      itemAttributes:
-        - name: "Date"
-        - name: "Description"
-        - name: "Amount"
+properties:
+  Transactions:
+    type: array
+    items:
+      type: object
+      properties:
+        Date:
+          type: string
+        Description:
+          type: string
+        Amount:
+          type: string
 ```
 
 ### Performance Tuning
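
Note on the docs/assessment.md hunks: each one replaces a legacy `attributes` list with a JSON Schema `properties` mapping. The correspondence can be sketched in Python as follows. This is only an illustration of the mapping visible in the diff, not code from the commit: `convert_attributes` is a hypothetical helper, and it assumes every leaf becomes `type: string`, as in the examples shown here.

```python
def convert_attributes(attributes):
    """Convert a legacy `attributes` list into a JSON Schema `properties` dict.

    Hypothetical helper for illustration; not part of the repository.
    """
    properties = {}
    for attr in attributes:
        # The legacy docs treat a missing attributeType as "simple".
        kind = attr.get("attributeType", "simple")
        if kind == "group":
            prop = {"type": "object"}
        elif kind == "list":
            prop = {"type": "array"}
        else:
            prop = {"type": "string"}  # assumption: all leaves are strings

        if "description" in attr:
            prop["description"] = attr["description"]

        if kind == "group":
            # groupAttributes -> nested properties
            prop["properties"] = convert_attributes(attr.get("groupAttributes", []))
        elif kind == "list":
            template = attr.get("listItemTemplate", {})
            if "itemDescription" in template:
                prop["x-aws-idp-list-item-description"] = template["itemDescription"]
            # itemAttributes -> items.properties
            prop["items"] = {
                "type": "object",
                "properties": convert_attributes(template.get("itemAttributes", [])),
            }

        properties[attr["name"]] = prop
    return properties


legacy = [
    {"name": "StatementDate", "attributeType": "simple",
     "description": "The date of the bank statement"},
    {"name": "Transactions", "attributeType": "list",
     "listItemTemplate": {
         "itemDescription": "Individual transaction entry",
         "itemAttributes": [{"name": "Date", "description": "Transaction date"}],
     }},
]
converted = convert_attributes(legacy)
```

The recursion mirrors the nesting in the hunks: `groupAttributes` becomes a nested `properties` block, and `listItemTemplate.itemAttributes` becomes `items.properties`.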

docs/classification.md

Lines changed: 26 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -602,13 +602,16 @@ When you want all pages of a document to be classified as the same class, you ca
 
 ```yaml
 classes:
-  - name: Payslip
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: Payslip
+    x-aws-idp-document-type: Payslip
+    type: object
     description: "Employee wage statement showing earnings and deductions"
     document_name_regex: "(?i).*(payslip|paystub|salary|wage).*"
-    attributes:
-      - name: EmployeeName
+    properties:
+      EmployeeName:
+        type: string
         description: "Name of the employee"
-        attributeType: simple
 ```
 
 **Benefits:**
@@ -632,24 +635,33 @@ classification:
   classificationMethod: multimodalPageLevelClassification
 
 classes:
-  - name: Invoice
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: Invoice
+    x-aws-idp-document-type: Invoice
+    type: object
     description: "Business invoice document"
     document_page_content_regex: "(?i)(invoice\\s+number|bill\\s+to|amount\\s+due)"
-    attributes:
-      - name: InvoiceNumber
+    properties:
+      InvoiceNumber:
+        type: string
         description: "Invoice number"
-        attributeType: simple
-  - name: Payslip
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: Payslip
+    x-aws-idp-document-type: Payslip
+    type: object
     description: "Employee wage statement"
     document_page_content_regex: "(?i)(gross\\s+pay|net\\s+pay|employee\\s+id)"
-    attributes:
-      - name: EmployeeName
+    properties:
+      EmployeeName:
+        type: string
         description: "Employee name"
-        attributeType: simple
-  - name: Other
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: Other
+    x-aws-idp-document-type: Other
+    type: object
     description: "Documents that don't match specific patterns"
     # No regex - will always use LLM
-    attributes: []
+    properties: {}
 ```
 
 **Benefits:**
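
Note on the docs/classification.md hunks: each `- name: X` class entry becomes a JSON Schema object carrying `$schema`, `$id`, `x-aws-idp-document-type`, and `type: object`, while `description` and the regex keys are carried over unchanged. A rough Python sketch of that envelope follows. `convert_class` is a hypothetical helper, not code from the commit. It handles simple attributes only, and it assumes `$id` is the class name with spaces removed, which matches the `BankStatement` / `"Bank Statement"` pair in docs/evaluation.md but is not confirmed as a rule.

```python
SCHEMA_URI = "https://json-schema.org/draft/2020-12/schema"
PASSTHROUGH_KEYS = ("description", "document_name_regex", "document_page_content_regex")


def convert_class(legacy_class):
    """Wrap one legacy class entry in the JSON Schema envelope (hypothetical helper)."""
    name = legacy_class["name"]
    schema = {
        "$schema": SCHEMA_URI,
        # Assumption: $id is the class name with spaces removed.
        "$id": name.replace(" ", ""),
        "x-aws-idp-document-type": name,
        "type": "object",
    }
    # description and the regex keys keep their names and values.
    for key in PASSTHROUGH_KEYS:
        if key in legacy_class:
            schema[key] = legacy_class[key]

    properties = {}
    for attr in legacy_class.get("attributes", []):
        prop = {"type": "string"}  # this sketch handles simple attributes only
        if "description" in attr:
            prop["description"] = attr["description"]
        properties[attr["name"]] = prop
    schema["properties"] = properties
    return schema


payslip = convert_class({
    "name": "Payslip",
    "description": "Employee wage statement",
    "document_page_content_regex": r"(?i)(gross\s+pay|net\s+pay|employee\s+id)",
    "attributes": [{"name": "EmployeeName", "description": "Employee name",
                    "attributeType": "simple"}],
})
other = convert_class({"name": "Other", "attributes": []})
```

An empty legacy `attributes: []` yields `properties: {}`, as in the `Other` class above.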

docs/evaluation.md

Lines changed: 58 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -121,19 +121,24 @@ Basic single-value extractions evaluated as individual fields:
 
 ```yaml
 classes:
-  - name: invoice
-    attributes:
-      - name: invoice_number
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: invoice
+    x-aws-idp-document-type: invoice
+    type: object
+    properties:
+      invoice_number:
+        type: string
         description: The unique identifier for the invoice
-        attributeType: simple # or omit for default
-        evaluation_method: EXACT # Use exact string matching
-      - name: amount_due
+        x-aws-idp-evaluation-method: EXACT # Use exact string matching
+      amount_due:
+        type: string
         description: The total amount to be paid
-        evaluation_method: NUMERIC_EXACT # Use numeric comparison
-      - name: vendor_name
+        x-aws-idp-evaluation-method: NUMERIC_EXACT # Use numeric comparison
+      vendor_name:
+        type: string
         description: Name of the vendor
-        evaluation_method: FUZZY # Use fuzzy matching
-        evaluation_threshold: 0.8 # Minimum similarity threshold
+        x-aws-idp-evaluation-method: FUZZY # Use fuzzy matching
+        x-aws-idp-confidence-threshold: 0.8 # Minimum similarity threshold
 ```
 
 ### Group Attributes
@@ -142,30 +147,38 @@ Nested object structures where each sub-attribute is evaluated individually:
 
 ```yaml
 classes:
-  - name: "Bank Statement"
-    attributes:
-      - name: "Account Holder Address"
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: BankStatement
+    x-aws-idp-document-type: "Bank Statement"
+    type: object
+    properties:
+      Account Holder Address:
+        type: object
         description: "Complete address information for the account holder"
-        attributeType: group
-        groupAttributes:
-          - name: "Street Number"
+        properties:
+          Street Number:
+            type: string
             description: "House or building number"
-            evaluation_method: FUZZY
-            evaluation_threshold: 0.9
-          - name: "Street Name"
+            x-aws-idp-evaluation-method: FUZZY
+            x-aws-idp-confidence-threshold: 0.9
+          Street Name:
+            type: string
             description: "Name of the street"
-            evaluation_method: FUZZY
-            evaluation_threshold: 0.8
-          - name: "City"
+            x-aws-idp-evaluation-method: FUZZY
+            x-aws-idp-confidence-threshold: 0.8
+          City:
+            type: string
             description: "City name"
-            evaluation_method: FUZZY
-            evaluation_threshold: 0.9
-          - name: "State"
+            x-aws-idp-evaluation-method: FUZZY
+            x-aws-idp-confidence-threshold: 0.9
+          State:
+            type: string
             description: "State abbreviation (e.g., CA, NY)"
-            evaluation_method: EXACT
-          - name: "ZIP Code"
+            x-aws-idp-evaluation-method: EXACT
+          ZIP Code:
+            type: string
             description: "5 or 9 digit postal code"
-            evaluation_method: EXACT
+            x-aws-idp-evaluation-method: EXACT
 ```
 
 ### List Attributes
@@ -174,19 +187,25 @@ Arrays of items where each item's attributes are evaluated individually across a
 
 ```yaml
 classes:
-  - name: "Bank Statement"
-    attributes:
-      - name: "Transactions"
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: BankStatement
+    x-aws-idp-document-type: "Bank Statement"
+    type: object
+    properties:
+      Transactions:
+        type: array
         description: "List of all transactions in the statement period"
-        attributeType: list
-        listItemTemplate:
-          itemDescription: "Individual transaction record"
-          itemAttributes:
-            - name: "Date"
+        x-aws-idp-list-item-description: "Individual transaction record"
+        items:
+          type: object
+          properties:
+            Date:
+              type: string
               description: "Transaction date (MM/DD/YYYY)"
-              evaluation_method: FUZZY
-              evaluation_threshold: 0.9
-            - name: "Description"
+              x-aws-idp-evaluation-method: FUZZY
+              x-aws-idp-confidence-threshold: 0.9
+            Description:
+              type: string
               description: "Transaction description or merchant name"
               evaluation_method: SEMANTIC
               evaluation_threshold: 0.7
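
Note on the docs/evaluation.md hunks: the changed lines rename `evaluation_method` to `x-aws-idp-evaluation-method` and `evaluation_threshold` to `x-aws-idp-confidence-threshold`. The visible part of the last hunk leaves both legacy keys in place on `Description`; whether a later hunk updates them is not shown. A minimal Python sketch of the rename, using a hypothetical `rename_evaluation_keys` helper that is not part of the commit:

```python
# Key mapping taken from the renamed lines in the hunks above.
EVALUATION_KEY_MAP = {
    "evaluation_method": "x-aws-idp-evaluation-method",
    "evaluation_threshold": "x-aws-idp-confidence-threshold",
}


def rename_evaluation_keys(attribute):
    """Return a copy of `attribute` with legacy evaluation keys renamed.

    Keys that are not in the map (for example `description`) pass through as-is.
    """
    return {EVALUATION_KEY_MAP.get(key, key): value for key, value in attribute.items()}


legacy = {
    "description": "Transaction description or merchant name",
    "evaluation_method": "SEMANTIC",
    "evaluation_threshold": 0.7,
}
renamed = rename_evaluation_keys(legacy)
```

The helper only renames keys on a single attribute mapping; walking nested `properties` and `items` would be a separate step.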
