Commit 0723c2b

author
Taniya Mathur
committed
Move MaxPagesForClassification from template parameters to config
1 parent 2f93523 commit 0723c2b

10 files changed: +58 -60 lines changed
config_library/pattern-2/bank-statement-sample/config.yaml

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ classes:
         description: List of all transactions in the statement period
         attributeType: list
 classification:
+  maxPagesForClassification: "ALL"
   image:
     target_height: ''
     target_width: ''

config_library/pattern-2/lending-package-sample/config.yaml

Lines changed: 1 addition & 0 deletions
@@ -914,6 +914,7 @@ classes:
         attributeType: group
 classification:
   classificationMethod: multimodalPageLevelClassification
+  maxPagesForClassification: "ALL"
   image:
     target_height: ''
     target_width: ''

config_library/pattern-2/rvl-cdip-package-sample-with-few-shot-examples/config.yaml

Lines changed: 1 addition & 0 deletions
@@ -647,6 +647,7 @@ classes:
         imagePath: config_library/pattern-2/few_shot_example_with_multimodal_page_classification/example-images/bank-statement-pages/

 classification:
+  maxPagesForClassification: "ALL"
   image:
     target_height: ''
     target_width: ''

config_library/pattern-2/rvl-cdip-package-sample/config.yaml

Lines changed: 1 addition & 0 deletions
@@ -307,6 +307,7 @@ classes:
       - name: comments
         description: Additional notes or remarks about the document. Look for sections labeled 'notes', 'remarks', or 'comments'.
 classification:
+  maxPagesForClassification: "ALL"
   image:
     target_height: ''
     target_width: ''

docs/classification.md

Lines changed: 16 additions & 0 deletions
@@ -181,6 +181,22 @@ When deciding between Text-Based Holistic Classification and MultiModal Page-Lev

 ## Customizing Classification in Pattern 2

+### Configuration Settings
+
+#### Page Limit Configuration
+
+Control how many pages are used for classification:
+
+```yaml
+classification:
+  maxPagesForClassification: "ALL"  # Default: use all pages
+  # Or: "1", "2", "3", etc. - use only first N pages
+```
+
+**Important**: When set to a number (e.g., `"3"`), only the first N pages are classified, but the result is applied to ALL pages in the document. This forces the entire document to be assigned a single class with one section.
+
+### Prompt Components
+
 In Pattern 2, you can customize classification behavior through various prompt components:

 ### System Prompts
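
The page-limit behavior described in the documentation hunk above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `idp_common`: the helper names `parse_page_limit`, `select_pages`, and `apply_single_class` are hypothetical, page IDs are assumed to be numeric strings, and the tie-break rule (the class on the earliest page wins) is an assumption, since the diff does not show how ties are resolved.

```python
from collections import Counter
from typing import Dict, List, Optional


def parse_page_limit(value: str) -> Optional[int]:
    """Return None for "ALL" (no limit), otherwise the positive page count."""
    text = str(value).strip()
    if text.upper() == "ALL":
        return None
    limit = int(text)  # raises ValueError for non-numeric input
    if limit < 1:
        raise ValueError(f"page limit must be >= 1, got {limit}")
    return limit


def select_pages(page_ids: List[str], value: str) -> List[str]:
    """Pick the pages that would be sent to the classifier."""
    ordered = sorted(page_ids, key=int)  # assumes numeric string IDs
    limit = parse_page_limit(value)
    return ordered if limit is None else ordered[:limit]


def apply_single_class(page_ids: List[str], classified: Dict[str, str]) -> Dict[str, str]:
    """Assign the most common class among the classified pages to every page.

    Assumed tie-break: the class that appears on the earliest page wins.
    Returns an empty mapping when nothing was classified.
    """
    if not classified:
        return {}
    ordered_labels = [classified[p] for p in sorted(classified, key=int)]
    counts = Counter(ordered_labels)
    best = max(counts.values())
    winner = next(label for label in ordered_labels if counts[label] == best)
    return {page_id: winner for page_id in page_ids}
```

With `"2"` as the setting, a five-page document would have only pages 1 and 2 classified, and every page would then carry the winning class, which is why the whole document ends up as a single section.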

lib/idp_common_pkg/tests/test_max_pages_classification.py

Lines changed: 35 additions & 30 deletions
@@ -8,32 +8,33 @@
 from idp_common.models import Document, Page


+@pytest.mark.unit
 class TestMaxPagesForClassification:
     @pytest.fixture
     def mock_config(self):
         return {
             "classification": {
                 "maxPagesForClassification": "ALL",
                 "classificationMethod": "multimodalPageLevelClassification",
+                "model": "us.amazon.nova-pro-v1:0",
+                "system_prompt": "Test system prompt",
+                "task_prompt": "Test task prompt",
             }
         }

     @pytest.fixture
     def classification_service(self, mock_config):
-        with patch(
-            "idp_common.classification.service.get_config", return_value=mock_config
-        ):
-            return ClassificationService(backend="bedrock")
+        return ClassificationService(backend="bedrock", config=mock_config)

     @pytest.fixture
     def sample_document(self):
         doc = Document(id="test-doc")
         doc.pages = {
-            "1": Page(page_id="1", page_number=1),
-            "2": Page(page_id="2", page_number=2),
-            "3": Page(page_id="3", page_number=3),
-            "4": Page(page_id="4", page_number=4),
-            "5": Page(page_id="5", page_number=5),
+            "1": Page(page_id="1"),
+            "2": Page(page_id="2"),
+            "3": Page(page_id="3"),
+            "4": Page(page_id="4"),
+            "5": Page(page_id="5"),
         }
         return doc

@@ -84,16 +85,16 @@ def test_apply_limited_classification_single_type(self, classification_service):
         # Original document with 3 pages
         original_doc = Document(id="original")
         original_doc.pages = {
-            "1": Page(page_id="1", page_number=1),
-            "2": Page(page_id="2", page_number=2),
-            "3": Page(page_id="3", page_number=3),
+            "1": Page(page_id="1"),
+            "2": Page(page_id="2"),
+            "3": Page(page_id="3"),
         }

         # Classified document with 2 pages, both classified as "invoice"
         classified_doc = Document(id="classified")
         classified_doc.pages = {
-            "1": Page(page_id="1", page_number=1),
-            "2": Page(page_id="2", page_number=2),
+            "1": Page(page_id="1"),
+            "2": Page(page_id="2"),
         }
         classified_doc.pages["1"].classification = "invoice"
         classified_doc.pages["2"].classification = "invoice"
@@ -122,17 +123,17 @@ def test_apply_limited_classification_tie_breaker(self, classification_service):
         # Original document with 4 pages
         original_doc = Document(id="original")
         original_doc.pages = {
-            "1": Page(page_id="1", page_number=1),
-            "2": Page(page_id="2", page_number=2),
-            "3": Page(page_id="3", page_number=3),
-            "4": Page(page_id="4", page_number=4),
+            "1": Page(page_id="1"),
+            "2": Page(page_id="2"),
+            "3": Page(page_id="3"),
+            "4": Page(page_id="4"),
         }

         # Classified document with 2 pages, different classifications
         classified_doc = Document(id="classified")
         classified_doc.pages = {
-            "1": Page(page_id="1", page_number=1),
-            "2": Page(page_id="2", page_number=2),
+            "1": Page(page_id="1"),
+            "2": Page(page_id="2"),
         }
         classified_doc.pages["1"].classification = "payslip"
         classified_doc.pages["2"].classification = "drivers_license"
@@ -164,7 +165,7 @@ def test_apply_limited_classification_tie_breaker(self, classification_service):
     def test_apply_limited_classification_empty_sections(self, classification_service):
         """Test handling of empty sections"""
         original_doc = Document(id="original")
-        original_doc.pages = {"1": Page(page_id="1", page_number=1)}
+        original_doc.pages = {"1": Page(page_id="1")}

         classified_doc = Document(id="classified")
         classified_doc.sections = []
@@ -176,27 +177,31 @@ def test_apply_limited_classification_empty_sections(self, classification_servic
         # Should return original document unchanged
         assert result == original_doc

-    @patch("idp_common.classification.service.get_config")
-    def test_config_integration(self, mock_get_config):
+    def test_config_integration(self):
         """Test that maxPagesForClassification is read from config"""
-        mock_get_config.return_value = {
+        mock_config = {
             "classification": {
                 "maxPagesForClassification": "2",
                 "classificationMethod": "multimodalPageLevelClassification",
+                "model": "us.amazon.nova-pro-v1:0",
+                "system_prompt": "Test system prompt",
+                "task_prompt": "Test task prompt",
             }
         }

-        service = ClassificationService(backend="bedrock")
+        service = ClassificationService(backend="bedrock", config=mock_config)
         assert service.max_pages_for_classification == "2"

-    @patch("idp_common.classification.service.get_config")
-    def test_config_default_value(self, mock_get_config):
+    def test_config_default_value(self):
         """Test default value when maxPagesForClassification not in config"""
-        mock_get_config.return_value = {
+        mock_config = {
             "classification": {
-                "classificationMethod": "multimodalPageLevelClassification"
+                "classificationMethod": "multimodalPageLevelClassification",
+                "model": "us.amazon.nova-pro-v1:0",
+                "system_prompt": "Test system prompt",
+                "task_prompt": "Test task prompt",
             }
         }

-        service = ClassificationService(backend="bedrock")
+        service = ClassificationService(backend="bedrock", config=mock_config)
         assert service.max_pages_for_classification == "ALL"
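
The updated tests pass a config dictionary straight to the constructor instead of patching `get_config`. A minimal sketch of that injection pattern follows. `SketchClassificationService` is a hypothetical stand-in and does not reproduce the real `ClassificationService`; only the `maxPagesForClassification` lookup and its `"ALL"` default are taken from the tests above.

```python
from typing import Any, Dict, Optional


class SketchClassificationService:
    """Hypothetical stand-in, not the real idp_common ClassificationService."""

    def __init__(self, backend: str = "bedrock", config: Optional[Dict[str, Any]] = None):
        self.backend = backend
        # The config is injected by the caller, so tests need no patching.
        self.config = config or {}
        classification = self.config.get("classification", {})
        # Fall back to "ALL" when the key is absent, matching the tested default.
        self.max_pages_for_classification = str(
            classification.get("maxPagesForClassification", "ALL")
        )
```

Passing the config explicitly keeps each test self-contained: the values a test depends on are visible in the test body rather than hidden behind a patched module-level function.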

patterns/pattern-2/template.yaml

Lines changed: 1 addition & 5 deletions
@@ -91,10 +91,7 @@ Parameters:
       - "false"
     Description: "Enable Human In The Loop (A2I) for document review"

-  MaxPagesForClassification:
-    Type: String
-    Default: "ALL"
-    Description: "Number of pages to use for document classification"
+

   SageMakerA2IReviewPortalURL:
     Type: String
@@ -976,7 +973,6 @@ Resources:
       ServiceToken: !Ref UpdateConfigurationFunctionArn
       Default: !Ref ConfigurationDefaultS3Uri
       ConfigLibraryHash: !Ref ConfigLibraryHash
-      MaxPagesForClassification: !Ref MaxPagesForClassification
       CustomClassificationModelARN: !Ref CustomClassificationModelARN
       CustomExtractionModelARN: !Ref CustomExtractionModelARN

patterns/pattern-3/template.yaml

Lines changed: 1 addition & 5 deletions
@@ -105,10 +105,7 @@ Parameters:
       - "false"
     Description: "Enable Human In The Loop (A2I) for document review"

-  MaxPagesForClassification:
-    Type: String
-    Default: "ALL"
-    Description: "Number of pages to use for document classification"
+

   SageMakerA2IReviewPortalURL:
     Type: String
@@ -878,7 +875,6 @@ Resources:
       ServiceToken: !Ref UpdateConfigurationFunctionArn
       Default: !Ref ConfigurationDefaultS3Uri
       ConfigLibraryHash: !Ref ConfigLibraryHash
-      MaxPagesForClassification: !Ref MaxPagesForClassification


   OCRFunction:

src/lambda/update_configuration/index.py

Lines changed: 0 additions & 7 deletions
@@ -157,13 +157,6 @@ def handler(event: Dict[str, Any], context: Any) -> None:
                     if 'extraction' in resolved_default:
                         resolved_default['extraction']['model'] = properties['CustomExtractionModelARN']
                         logger.info(f"Updated extraction model to: {properties['CustomExtractionModelARN']}")
-
-                # Add MaxPagesForClassification if provided
-                if 'MaxPagesForClassification' in properties:
-                    if 'classification' not in resolved_default:
-                        resolved_default['classification'] = {}
-                    resolved_default['classification']['maxPagesForClassification'] = properties['MaxPagesForClassification']
-                    logger.info(f"Updated maxPagesForClassification to: {properties['MaxPagesForClassification']}")

             update_configuration('Default', resolved_default)

template.yaml

Lines changed: 1 addition & 13 deletions
@@ -114,17 +114,7 @@ Parameters:
     Description: >-
       Select the configuration preset for Pattern 1. Each configuration contains pre-tuned settings for specific document processing scenarios - see https://github.com/aws-samples/sample-genai-idp/blob/main/config_library/README.md. Note: This selected configuration will be replaced by the Custom Configuration Path if specified.

-  MaxPagesForClassification:
-    Type: String
-    Default: "ALL"
-    AllowedValues:
-      - "ALL"
-      - "1"
-      - "2"
-      - "3"
-      - "5"
-      - "10"
-    Description: "Number of pages to use for document classification. 'ALL' uses all pages, or specify a number (1-10) to limit classification to first N pages and apply result to all pages. Only applies to Pattern 2 and Pattern 3."
+

   # Pattern 2 Parameters

@@ -918,7 +908,6 @@ Resources:
           - "s3://${ConfigurationBucket}/config_library/pattern-2/${ConfigPath}/config.yaml"
           - ConfigPath: !FindInMap [Pattern2ConfigurationMap, !Ref Pattern2Configuration, ConfigPath]
         ConfigLibraryHash: "<CONFIG_LIBRARY_HASH_TOKEN>"
-        MaxPagesForClassification: !Ref MaxPagesForClassification
         EnableHITL: !Ref EnableHITL
         SageMakerA2IReviewPortalURL: !If
           - IsHITLEnabled
@@ -960,7 +949,6 @@ Resources:
           - "s3://${ConfigurationBucket}/config_library/pattern-3/${ConfigPath}/config.yaml"
           - ConfigPath: !FindInMap [Pattern3ConfigurationMap, !Ref Pattern3Configuration, ConfigPath]
         ConfigLibraryHash: "<CONFIG_LIBRARY_HASH_TOKEN>"
-        MaxPagesForClassification: !Ref MaxPagesForClassification
         EnableHITL: !Ref EnableHITL
         SageMakerA2IReviewPortalURL: !If
           - IsHITLEnabled
