
Commit 1715289

Merge branch 'upgrade_multimodal_classification' into 'develop'
refactor: Consolidate multimodal classification methods with BIO-like sequence segmentation based new classification approach

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!252
2 parents 9919273 + 4a1520c commit 1715289

9 files changed: +201 / -1477 lines

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -5,9 +5,17 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+## [0.3.12]
 
 ### Added
 
+- **Refactored Document Classification Service for Enhanced Boundary Detection**
+  - Consolidated `multimodalPageLevelClassification` and the experimental `multimodalPageBoundaryClassification` (from v0.3.11) into a single enhanced `multimodalPageLevelClassification` method
+  - Implemented BIO-like sequence segmentation with document boundary indicators: "start" (new document) and "continue" (same document)
+  - Automatically segments multi-document packets, even when they contain multiple documents of the same type
+  - **Benefits**: Simplified codebase with single multimodal classification method, improved handling of complex document packets, maintains backward compatibility
+  - **No Breaking Changes**: Existing configurations work unchanged, no configuration updates required
+
 - **Enhanced A2I Template and Workflow Management**
   - Enhanced A2I template with improved user interface and clearer instructions for reviewers
   - Added comprehensive instructions for reviewers in A2I template to guide the review process
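The changelog entry describes BIO-like segmentation: each page is tagged "start" (it opens a new document) or "continue" (it extends the previous one), which is what lets two same-type documents sitting back to back be split apart. The following is a minimal sketch of how ordered per-page results could be folded into document sections. `PageResult`, `Section`, and `segment_pages` are hypothetical names rather than the accelerator's API, the class names in the example are illustrative, and the two tie-breaking rules in the docstring are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageResult:
    """Hypothetical per-page result (not the accelerator's own type)."""
    page_id: int
    doc_class: str
    boundary: str  # "start" or "continue", as in the prompt's output format


@dataclass
class Section:
    """A run of consecutive pages treated as one document."""
    doc_class: str
    page_ids: List[int] = field(default_factory=list)


def segment_pages(pages: List[PageResult]) -> List[Section]:
    """Group ordered pages into sections using BIO-like boundary labels.

    Assumptions (illustrative only):
    - the first page always opens a section, even if labeled "continue";
    - a "continue" page whose class differs from the open section also
      opens a new section, so every section carries a single class.
    """
    sections: List[Section] = []
    for page in pages:
        boundary = page.boundary.strip().lower()
        if boundary not in ("start", "continue"):
            raise ValueError(
                f"unexpected document_boundary {page.boundary!r} on page {page.page_id}"
            )
        opens_new = (
            not sections
            or boundary == "start"
            or sections[-1].doc_class != page.doc_class
        )
        if opens_new:
            sections.append(Section(doc_class=page.doc_class))
        sections[-1].page_ids.append(page.page_id)
    return sections


if __name__ == "__main__":
    # Two payslips back to back: same class, split only by the "start" label.
    packet = [
        PageResult(1, "Payslip", "start"),
        PageResult(2, "Payslip", "continue"),
        PageResult(3, "Payslip", "start"),
        PageResult(4, "Bank-Statement", "start"),
        PageResult(5, "Bank-Statement", "continue"),
    ]
    for section in segment_pages(packet):
        print(section.doc_class, section.page_ids)
```

Under these assumptions the example prints `Payslip [1, 2]`, `Payslip [3]`, and `Bank-Statement [4, 5]`. A rule based only on class changes would have merged pages 1 to 3 into one payslip; the explicit "start" label is what separates them.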

config_library/pattern-2/lending-package-sample/config.yaml

Lines changed: 26 additions & 29 deletions
@@ -1,3 +1,9 @@
+# SPDX-License-Identifier: MIT-0
+
+notes: Boundary-aware classification example for pattern-2
+
+
+
 # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 # SPDX-License-Identifier: MIT-0
 
@@ -914,15 +920,26 @@ classes:
       evaluation_method: LLM
       attributeType: group
 classification:
+  classificationMethod: multimodalPageLevelClassification
   image:
     target_height: ''
     target_width: ''
+  model: us.amazon.nova-pro-v1:0
+  temperature: '0.0'
   top_p: '0.1'
   max_tokens: '4096'
   top_k: '5'
+  system_prompt: >-
+    You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
+
+    <variables>
+    <document-ocr-data>: OCR-extracted text content from the document page that provides textual information for classification
+    <document-image>: Visual representation of the document page that provides layout, formatting, and visual structure information
+    <document-types>: List of valid document types with their descriptions that the document must be classified into
+    </variables>
   task_prompt: >-
     <task-description>
-    Analyze the provided document using both its visual layout and textual content to determine its document type. You must classify it into exactly one of the predefined categories.
+    Analyze the provided document using both its visual layout and textual content to determine its document type and whether this page begins a new document or continues the previous one.
     </task-description>
 
     <document-types>
@@ -934,24 +951,16 @@ classification:
     1. Examine the visual layout: headers, logos, formatting, structure, and visual organization
     2. Analyze the textual content: key phrases, terminology, purpose, and information type
     3. Identify distinctive features that match the document type descriptions
-    4. Consider both visual and textual evidence together to determine the best match
-    5. CRITICAL: Only use document types explicitly listed in the <document-types> section
+    4. Decide if this page starts a new document (output "start") or continues the previous document (output "continue")
+    5. Consider both visual and textual evidence together to determine the best match
+    6. CRITICAL: Only use document types explicitly listed in the <document-types> section
     </classification-instructions>
 
-    <reasoning-guidelines>
-    When determining the document type:
-    - First identify the document's primary purpose and function
-    - Note specific visual elements (letterhead, forms, tables, signatures)
-    - Identify key textual indicators (terminology, phrases, structure)
-    - Consider the document's intended audience and use case
-    - Provide specific evidence from both visual and textual analysis
-    </reasoning-guidelines>
-
     <output-format>
-    Return your classification as valid JSON following this exact structure:
     {
     "classification_reason": "Detailed reasoning including specific visual and textual evidence that led to this classification",
-    "class": "exact_document_type_from_list"
+    "class": "exact_document_type_from_list",
+    "document_boundary": "start or continue"
     }
     </output-format>
 
@@ -968,22 +977,10 @@ classification:
     <final-instructions>
     Analyze the document above by:
     1. Applying the <classification-instructions> to examine both visual and textual features
-    2. Following the <reasoning-guidelines> to build your classification rationale
-    3. Selecting ONLY from document types in <document-types>
-    4. Providing clear reasoning with specific evidence before the classification
-    5. Outputting in the exact JSON format specified in <output-format>
+    2. Selecting ONLY from document types in <document-types>
+    3. Providing clear reasoning with specific evidence
+    4. Outputting in the exact JSON format specified in <output-format>
     </final-instructions>
-  temperature: '0.0'
-  model: us.amazon.nova-pro-v1:0
-  system_prompt: >-
-    You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
-
-    <variables>
-    DOCUMENT_TEXT: OCR-extracted text content from the document page that provides textual information for classification
-    DOCUMENT_IMAGE: Visual representation of the document page that provides layout, formatting, and visual structure information
-    CLASS_NAMES_AND_DESCRIPTIONS: List of valid document types with their descriptions that the document must be classified into
-    </variables>
-  classificationMethod: multimodalPageLevelClassification
 extraction:
   image:
     target_width: ''
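The updated `<output-format>` asks the model for three keys: `classification_reason`, `class`, and `document_boundary`. Below is a sketch of a defensive parser for that shape. It assumes the model may wrap the JSON object in stray prose. `parse_classification` is a hypothetical helper, not part of the accelerator, and rejecting unknown classes or boundary values instead of defaulting them is one design choice among several.

```python
import json
from typing import Iterable, Tuple

# Boundary labels requested by the updated <output-format> block.
VALID_BOUNDARIES = {"start", "continue"}


def parse_classification(raw: str, allowed_classes: Iterable[str]) -> Tuple[str, str, str]:
    """Return (class, document_boundary, classification_reason) for one page.

    Hypothetical helper: skips any text before the first "{", ignores any
    text after the JSON object, then validates both labels.
    """
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value from the start of the string and
    # reports where it stopped, so trailing prose is ignored. Malformed JSON
    # raises json.JSONDecodeError, a subclass of ValueError.
    obj, _ = json.JSONDecoder().raw_decode(raw[start:])

    doc_class = str(obj.get("class", "")).strip()
    boundary = str(obj.get("document_boundary", "")).strip().lower()
    reason = str(obj.get("classification_reason", ""))

    if doc_class not in set(allowed_classes):
        raise ValueError(f"class {doc_class!r} is not in <document-types>")
    if boundary not in VALID_BOUNDARIES:
        raise ValueError(f"document_boundary {boundary!r} is not 'start' or 'continue'")
    return doc_class, boundary, reason


if __name__ == "__main__":
    reply = (
        "Here is the result:\n"
        '{"classification_reason": "Employer header and pay-period table", '
        '"class": "Payslip", "document_boundary": "Start"}'
    )
    print(parse_classification(reply, ["Payslip", "Bank-Statement"]))
```

For the sample reply, the call returns `('Payslip', 'start', 'Employer header and pay-period table')`. The boundary is lower-cased so that a capitalized "Start" still validates. A production pipeline might prefer to fall back to a default label rather than raise, but that behavior is not described in this commit.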
