Merge branch 'upgrade_multimodal_classification' into 'develop'
refactor: Consolidate multimodal classification methods into a new classification approach based on BIO-like sequence segmentation
See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!252
CHANGELOG.md
Lines changed: 8 additions & 0 deletions
@@ -5,9 +5,17 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+## [0.3.12]
 
 ### Added
 
+- **Refactored Document Classification Service for Enhanced Boundary Detection**
+  - Consolidated `multimodalPageLevelClassification` and the experimental `multimodalPageBoundaryClassification` (from v0.3.11) into a single enhanced `multimodalPageLevelClassification` method
+  - Implemented BIO-like sequence segmentation with document boundary indicators: "start" (new document) and "continue" (same document)
+  - Automatically segments multi-document packets, even when they contain multiple documents of the same type
+  - **Benefits**: Simplified codebase with single multimodal classification method, improved handling of complex document packets, maintains backward compatibility
+  - **No Breaking Changes**: Existing configurations work unchanged, no configuration updates required
+
 - **Enhanced A2I Template and Workflow Management**
   - Enhanced A2I template with improved user interface and clearer instructions for reviewers
   - Added comprehensive instructions for reviewers in A2I template to guide the review process
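The "start"/"continue" scheme described in the changelog entry is BIO-like in the sense that "start" plays the role of a B (begin) tag and "continue" the role of an I (inside) tag, with no O tag. The sketch below shows how ordered per-page results of that shape could be grouped into documents. `PageResult`, `Segment`, and `segment_pages` are hypothetical names, not the accelerator's API, and the rule that a "continue" page with no open document (or with a different class than the open one) starts a new document is an assumption of this sketch, not documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageResult:
    """Hypothetical per-page classifier output."""
    page_id: int
    doc_class: str
    document_boundary: str  # "start" or "continue"


@dataclass
class Segment:
    """One logical document: a class plus its ordered page ids."""
    doc_class: str
    page_ids: List[int]


def segment_pages(pages: List[PageResult]) -> List[Segment]:
    """Group ordered pages into documents using start/continue labels."""
    segments: List[Segment] = []
    for page in pages:
        if page.document_boundary not in ("start", "continue"):
            raise ValueError(f"unexpected boundary: {page.document_boundary!r}")
        # Assumption: a "continue" page opens a new segment if there is no
        # open segment yet or if its class differs from the open segment's.
        starts_new = (
            page.document_boundary == "start"
            or not segments
            or segments[-1].doc_class != page.doc_class
        )
        if starts_new:
            segments.append(Segment(page.doc_class, [page.page_id]))
        else:
            segments[-1].page_ids.append(page.page_id)
    return segments


# Two invoices back to back are split by the "start" label on page 3,
# even though both pages 2 and 3 have the same class.
packet = [
    PageResult(1, "invoice", "start"),
    PageResult(2, "invoice", "continue"),
    PageResult(3, "invoice", "start"),
    PageResult(4, "bank_statement", "start"),
    PageResult(5, "bank_statement", "continue"),
]
for seg in segment_pages(packet):
    print(seg.doc_class, seg.page_ids)
# invoice [1, 2]
# invoice [3]
# bank_statement [4, 5]
```

Relying on the explicit boundary label, rather than only on class changes between consecutive pages, is what lets same-type documents in one packet be separated.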
You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
+
+<variables>
+<document-ocr-data>: OCR-extracted text content from the document page that provides textual information for classification
+<document-image>: Visual representation of the document page that provides layout, formatting, and visual structure information
+<document-types>: List of valid document types with their descriptions that the document must be classified into
+</variables>
 task_prompt: >-
 <task-description>
-Analyze the provided document using both its visual layout and textual content to determine its document type. You must classify it into exactly one of the predefined categories.
+Analyze the provided document using both its visual layout and textual content to determine its document type and whether this page begins a new document or continues the previous one.
 </task-description>
 
 <document-types>
@@ -934,24 +951,16 @@ classification:
 1. Examine the visual layout: headers, logos, formatting, structure, and visual organization
 2. Analyze the textual content: key phrases, terminology, purpose, and information type
 3. Identify distinctive features that match the document type descriptions
-4. Consider both visual and textual evidence together to determine the best match
-5. CRITICAL: Only use document types explicitly listed in the <document-types> section
+4. Decide if this page starts a new document (output "start") or continues the previous document (output "continue")
+5. Consider both visual and textual evidence together to determine the best match
+6. CRITICAL: Only use document types explicitly listed in the <document-types> section
 </classification-instructions>
 
-<reasoning-guidelines>
-When determining the document type:
-- First identify the document's primary purpose and function
-- Note specific visual elements (letterhead, forms, tables, signatures)
-- Consider the document's intended audience and use case
-- Provide specific evidence from both visual and textual analysis
-</reasoning-guidelines>
-
 <output-format>
-Return your classification as valid JSON following this exact structure:
 {
 "classification_reason": "Detailed reasoning including specific visual and textual evidence that led to this classification",
-"class": "exact_document_type_from_list"
+"class": "exact_document_type_from_list",
+"document_boundary": "start or continue"
 }
 </output-format>
 
@@ -968,22 +977,10 @@ classification:
 <final-instructions>
 Analyze the document above by:
 1. Applying the <classification-instructions> to examine both visual and textual features
-2. Following the <reasoning-guidelines> to build your classification rationale
-3. Selecting ONLY from document types in <document-types>
-4. Providing clear reasoning with specific evidence before the classification
-5. Outputting in the exact JSON format specified in <output-format>
+2. Selecting ONLY from document types in <document-types>
+3. Providing clear reasoning with specific evidence
+4. Outputting in the exact JSON format specified in <output-format>
 </final-instructions>
-temperature: '0.0'
-model: us.amazon.nova-pro-v1:0
-system_prompt: >-
-You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
-
-<variables>
-DOCUMENT_TEXT: OCR-extracted text content from the document page that provides textual information for classification
-DOCUMENT_IMAGE: Visual representation of the document page that provides layout, formatting, and visual structure information
-CLASS_NAMES_AND_DESCRIPTIONS: List of valid document types with their descriptions that the document must be classified into
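The updated `<output-format>` adds a `document_boundary` key next to `classification_reason` and `class`. As an illustration only, a caller could validate a reply of that shape as sketched below. `parse_classification` is a hypothetical helper, not part of the accelerator; it assumes the reply holds a single JSON object (possibly wrapped in stray text) and treats the boundary value case-insensitively, which the prompt does not specify.

```python
import json
from typing import Dict, Iterable

VALID_BOUNDARIES = {"start", "continue"}


def parse_classification(raw: str, document_types: Iterable[str]) -> Dict[str, str]:
    """Parse and validate a reply shaped like the updated <output-format>."""
    # Assumption: the reply contains exactly one JSON object, possibly with
    # stray text around it, so keep only the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(raw[start : end + 1])

    # The class must be one of the configured <document-types>.
    doc_class = data.get("class")
    if not isinstance(doc_class, str) or doc_class not in set(document_types):
        raise ValueError(f"class {doc_class!r} is not a listed document type")

    # The new field must be "start" or "continue" (normalized to lowercase).
    boundary = str(data.get("document_boundary", "")).strip().lower()
    if boundary not in VALID_BOUNDARIES:
        raise ValueError(f"document_boundary {boundary!r} is not 'start' or 'continue'")

    return {
        "classification_reason": str(data.get("classification_reason", "")),
        "class": doc_class,
        "document_boundary": boundary,
    }


response = '''Here is the result:
{
  "classification_reason": "Letterhead, line-item table and 'Invoice #' header",
  "class": "invoice",
  "document_boundary": "start"
}'''
print(parse_classification(response, ["invoice", "bank_statement"]))
```

Rejecting an unknown class here mirrors the prompt's "CRITICAL: Only use document types explicitly listed" instruction on the consuming side, so a model that ignores it fails loudly instead of producing an unlisted label.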