| 1 | +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. |
| 2 | +# SPDX-License-Identifier: MIT-0 |
| 3 | +notes: Configuration for FCC invoice information extraction (no classification) |
| 4 | +ocr: |
| 5 | + backend: "textract" |
| 6 | + model_id: "us.anthropic.claude-3-7-sonnet-20250219-v1:0" |
| 7 | + system_prompt: "You are an expert OCR system. Extract all text from the provided image accurately, preserving layout where possible." |
| 8 | + task_prompt: "Extract all text from this document image. Preserve the layout, including paragraphs, tables, and formatting." |
| 9 | + features: |
| 10 | + - name: LAYOUT |
| 11 | + - name: TABLES |
| 12 | + image: |
| 13 | + target_width: "" |
| 14 | + target_height: "" |
| 15 | + |
| 16 | +classes: |
| 17 | + - name: FCC-Invoice |
| 18 | + description: >- |
| 19 | + Federal Communications Commission (FCC) political advertising invoice showing broadcast |
| 20 | + time purchases, including line items with descriptions, dates, rates, and totals for |
| 21 | + political advertising campaigns. |
| 22 | + attributes: |
| 23 | + - name: agency |
| 24 | + description: >- |
| 25 | + The advertising agency or media buyer handling the political advertising purchase. |
| 26 | + evaluation_method: EXACT |
| 27 | + attributeType: simple |
| 28 | + |
| 29 | + - name: advertiser |
| 30 | + description: >- |
| 31 | + The political advertiser or campaign purchasing the broadcast time. |
| 32 | + evaluation_method: EXACT |
| 33 | + attributeType: simple |
| 34 | + |
| 35 | + - name: gross_total |
| 36 | + description: >- |
| 37 | + The total gross amount for all line items before any discounts or adjustments. |
| 38 | + evaluation_method: NUMERIC_EXACT |
| 39 | + attributeType: simple |
| 40 | + |
| 41 | + - name: net_amount_due |
| 42 | + description: >- |
| 43 | + The final net amount due after any discounts or adjustments have been applied. |
| 44 | + evaluation_method: NUMERIC_EXACT |
| 45 | + attributeType: simple |
| 46 | + |
| 47 | + - name: line_items |
| 48 | + listItemTemplate: |
| 49 | + itemAttributes: |
| 50 | + - name: description |
| 51 | + description: >- |
| 52 | + The broadcast time slot description, typically showing days of week and time range |
| 53 | + (e.g., "M-F 11a-12p" for Monday through Friday 11am to 12pm). |
| 54 | + evaluation_method: EXACT |
| 55 | + |
| 56 | + - name: days |
| 57 | + description: >- |
| 58 | + The days of the week for this broadcast slot, often in a format like "MTWTF--" |
| 59 | + where each position represents a day (Monday, Tuesday, Wednesday, Thursday, Friday, |
| 60 | + Saturday, Sunday) with dashes for non-broadcast days. |
| 61 | + evaluation_method: EXACT |
| 62 | + |
| 63 | + - name: rate |
| 64 | + description: >- |
| 65 | + The rate or cost for this specific broadcast time slot; it may include commas |
| 66 | + as thousands separators. |
| 67 | + evaluation_method: NUMERIC_EXACT |
| 68 | + |
| 69 | + - name: start_date |
| 70 | + description: >- |
| 71 | + The start date for this line item's broadcast schedule, typically in MM/DD/YY format. |
| 72 | + evaluation_method: EXACT |
| 73 | + |
| 74 | + - name: end_date |
| 75 | + description: >- |
| 76 | + The end date for this line item's broadcast schedule, typically in MM/DD/YY format. |
| 77 | + evaluation_method: EXACT |
| 78 | + |
| 79 | + itemDescription: >- |
| 80 | + Each item represents a specific broadcast time slot purchase with its schedule, |
| 81 | + rate, and date range. |
| 82 | + |
| 83 | + description: >- |
| 84 | + List of line items detailing each broadcast time slot purchase, including the time |
| 85 | + description, days of week, rate, and date range for the advertising schedule. |
| 86 | + evaluation_method: LLM |
| 87 | + attributeType: list |
| 88 | + |
| 89 | +extraction: |
| 90 | + model_id: "us.anthropic.claude-3-7-sonnet-20250219-v1:0" |
| 91 | + temperature: 0.0 |
| 92 | + top_p: 0.9 |
| 93 | + max_tokens: 4096 |
| 94 | + system_prompt: | |
| 95 | + You are an expert at extracting structured information from FCC political advertising invoices. |
| 96 | + Extract all requested fields accurately, paying special attention to: |
| 97 | + - Line item details including time slots, days, rates, and date ranges |
| 98 | + - Monetary amounts (preserve formatting with commas and decimals) |
| 99 | + - Date formats (typically MM/DD/YY) |
| 100 | + - Agency and advertiser names |
| 101 | + |
| 102 | + For line items, ensure you capture all rows from any tables showing broadcast schedules. |
| 103 | + Days of week are often encoded as 7-character strings where each position represents a day. |
| 104 | + |
| 105 | + task_prompt: | |
| 106 | + Extract the following information from this FCC invoice: |
| 107 | + |
| 108 | + 1. Agency name |
| 109 | + 2. Advertiser name |
| 110 | + 3. Gross total amount |
| 111 | + 4. Net amount due |
| 112 | + 5. All line items with: |
| 113 | + - Description (time slot) |
| 114 | + - Days (day of week encoding) |
| 115 | + - Rate (cost) |
| 116 | + - Start date |
| 117 | + - End date |
| 118 | + |
| 119 | + Return the information in the specified JSON schema format. |
| 120 | + |
| 121 | +classification: |
| 122 | + enabled: false |
| 123 | + # No classification needed - all documents are FCC invoices |
| 124 | + |
| 125 | +evaluation: |
| 126 | + enabled: true |
| 127 | + methods: |
| 128 | + - EXACT |
| 129 | + - NUMERIC_EXACT |
| 130 | + - LLM |
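
The attribute descriptions in this config mention two conventions that are easy to get wrong downstream: the 7-character day encoding (e.g. "MTWTF--", positions Monday through Sunday) and monetary values that may carry thousands separators. The following is a minimal Python sketch of how a consumer of the extracted fields might handle both. The helper names `decode_days`, `parse_amount`, and `numeric_match` are hypothetical and are not part of this repository; in particular, `numeric_match` only illustrates the idea behind a numeric comparison and is not the actual `NUMERIC_EXACT` implementation.

```python
from decimal import Decimal, InvalidOperation

# Position order stated in the `days` attribute description (Monday first).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def decode_days(encoded: str) -> list[str]:
    """Map a 7-character string such as "MTWTF--" to day names by position.

    Hypothetical helper: any character other than "-" marks a broadcast day.
    """
    if len(encoded) != 7:
        raise ValueError(f"expected 7 characters, got {len(encoded)}: {encoded!r}")
    return [name for name, ch in zip(DAY_NAMES, encoded) if ch != "-"]


def parse_amount(text: str) -> Decimal:
    """Strip a dollar sign and thousands separators, then parse as Decimal."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned)


def numeric_match(expected: str, actual: str) -> bool:
    """Compare two monetary strings by value rather than by formatting.

    Assumption: unparseable input counts as a mismatch instead of raising.
    """
    try:
        return parse_amount(expected) == parse_amount(actual)
    except InvalidOperation:
        return False
```

Decoding by position rather than by letter avoids the ambiguity between Tuesday and Thursday (both "T") and between Saturday and Sunday (both "S"). Using `Decimal` instead of `float` keeps comparisons of currency values exact.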