Commit 6624cf8
Author: Bob Strahan
refactor: rename Document utility methods for clarity and consistency
1 parent 4bb81b8, commit 6624cf8

18 files changed, +48 -48 lines changed

CHANGELOG.md
Lines changed: 7 additions & 7 deletions

@@ -13,26 +13,26 @@ SPDX-License-Identifier: MIT-0
 - Transparent handling of both compressed and uncompressed documents in Lambda functions
 - Temporary S3 storage for compressed document state with automatic cleanup via lifecycle policies
 - **New Utility Methods**:
-  - `Document.handle_input_document()`: Automatically detects and decompresses document input from Lambda events
-  - `Document.prepare_output()`: Automatically compresses large documents for Lambda responses
+  - `Document.load_document()`: Automatically detects and decompresses document input from Lambda events
+  - `Document.serialize_document()`: Automatically compresses large documents for Lambda responses
   - `Document.compress()` and `Document.decompress()`: Compression/decompression methods
-- **Lambda Function Integration**: All Pattern-2 and Pattern-3 Lambda functions updated to use compression utilities
+- **Lambda Function Integration**: All relevant Lambda functions updated to use compression utilities
 - **Resolves Step Functions Errors**: Eliminates "result with a size exceeding the maximum number of bytes service limit" errors for large multi-page documents
-- **Multi-Backend OCR Support**
+- **Multi-Backend OCR Support - Pattern 2 and 3**
   - Textract Backend (default): Existing AWS Textract functionality
   - Bedrock Backend: New LLM-based OCR using Claude/Nova models
   - None Backend: Image-only processing without OCR
-- **Bedrock OCR Integration**
+- **Bedrock OCR Integration - Pattern 2 and 3**
   - Customizable system and task prompts for OCR optimization
   - Better handling of complex documents, tables, and forms
   - Layout preservation capabilities
-- **Image Preprocessing**
+- **Image Preprocessing - Pattern 2 and 3**
   - Adaptive Binarization: Improves OCR accuracy on documents with:
     - Uneven lighting or shadows
     - Low contrast text
     - Background noise or gradients
   - Optional feature with configurable enable/disable
-- **YAML Parsing Support for LLM Responses**
+- **YAML Parsing Support for LLM Responses - Pattern 2 and 3**
   - Added comprehensive YAML parsing capabilities to complement existing JSON parsing functionality
   - New `extract_yaml_from_text()` function with robust multi-strategy YAML extraction:
     - YAML in ```yaml and ```yml code blocks
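
The multi-strategy extraction the changelog describes could look roughly like the sketch below. The strategies and signature are assumptions inferred from the bullet list, not the actual `extract_yaml_from_text()` implementation in this repository:

```python
import re

def extract_yaml_from_text(text):
    """Return the first YAML payload found in an LLM response, or None.

    Sketch of a multi-strategy extractor (assumed behavior, not the real code):
    try fenced ```yaml/```yml blocks first, then fall back to the whole text.
    """
    # Strategy 1: YAML inside ```yaml or ```yml code fences.
    match = re.search(r"```ya?ml\s*\n(.*?)```", text, re.DOTALL)
    if match:
        return match.group(1).strip()
    # Strategy 2: fall back to treating the entire response as YAML.
    stripped = text.strip()
    return stripped or None
```

A real implementation would likely validate each candidate with a YAML parser before returning it; this sketch only shows the candidate-selection order.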

lib/idp_common_pkg/README.md
Lines changed: 3 additions & 3 deletions

@@ -53,7 +53,7 @@ The Document model includes automatic compression support to handle large documents
 - **Automatic Compression**: Documents exceeding configurable size thresholds are automatically compressed to S3
 - **Transparent Handling**: Lambda functions seamlessly handle both compressed and uncompressed documents
 - **Section Preservation**: Section IDs are preserved in compressed payloads for Step Functions Map operations
-- **Utility Methods**: Simplified input/output handling with `handle_input_document()` and `prepare_output()`
+- **Utility Methods**: Simplified input/output handling with `load_document()` and `serialize_document()`
 
 ### Usage in Lambda Functions
 
@@ -65,7 +65,7 @@ def lambda_handler(event, context):
     working_bucket = os.environ.get('WORKING_BUCKET')
 
     # Handle input - automatically detects and decompresses if needed
-    document = Document.handle_input_document(
+    document = Document.load_document(
         event_data=event["document"],
         working_bucket=working_bucket,
         logger=logger
@@ -76,7 +76,7 @@ def lambda_handler(event, context):
 
     # Prepare output - automatically compresses if document is large
     response = {
-        "document": document.prepare_output(
+        "document": document.serialize_document(
             working_bucket=working_bucket,
             step_name="classification",
             logger=logger

lib/idp_common_pkg/idp_common/README.md
Lines changed: 4 additions & 4 deletions

@@ -181,14 +181,14 @@ document = Document.from_compressed_or_dict(data, working_bucket)
 
 ```python
 # Handle input - automatically detects and decompresses if needed
-document = Document.handle_input_document(
+document = Document.load_document(
     event_data=event["document"],
     working_bucket=working_bucket,
     logger=logger
 )
 
 # Prepare output - automatically compresses if document is large
-response_data = document.prepare_output(
+response_data = document.serialize_document(
     working_bucket=working_bucket,
     step_name="classification",
     logger=logger,
@@ -214,7 +214,7 @@ def lambda_handler(event, context):
     working_bucket = os.environ.get('WORKING_BUCKET')
 
     # Input handling - works with both compressed and uncompressed documents
-    document = Document.handle_input_document(
+    document = Document.load_document(
         event["document"], working_bucket, logger
     )
 
@@ -223,7 +223,7 @@ def lambda_handler(event, context):
 
     # Output handling - automatically compresses if needed
     return {
-        "document": document.prepare_output(working_bucket, "step_name", logger)
+        "document": document.serialize_document(working_bucket, "step_name", logger)
     }
 ```

lib/idp_common_pkg/idp_common/models.py
Lines changed: 6 additions & 6 deletions

@@ -152,15 +152,15 @@ class Document:
     - from_compressed_or_dict(): Handle both compressed and regular document data
 
     Utility Methods:
-    - handle_input_document(): Process document input from Lambda events
-    - prepare_output(): Prepare document output with automatic compression
+    - load_document(): Process document input from Lambda events
+    - serialize_document(): Prepare document output with automatic compression
 
     Usage Examples:
     # Handle input in Lambda functions
-    document = Document.handle_input_document(event_data, working_bucket, logger)
+    document = Document.load_document(event_data, working_bucket, logger)
 
     # Prepare output with automatic compression
-    response = {"document": document.prepare_output(working_bucket, "step_name", logger)}
+    response = {"document": document.serialize_document(working_bucket, "step_name", logger)}
 
     # Manual compression/decompression
     compressed_data = document.compress(working_bucket, "processing")
@@ -650,7 +650,7 @@ def from_compressed_or_dict(cls, data, bucket=None):
         return cls.from_dict(data)
 
     @classmethod
-    def handle_input_document(cls, event_data, working_bucket, logger=None):
+    def load_document(cls, event_data, working_bucket, logger=None):
         """
         Utility method to handle document input from Lambda events.
         Automatically handles both compressed and uncompressed documents.
@@ -672,7 +672,7 @@ def handle_input_document(cls, event_data, working_bucket, logger=None):
             logger.info("Loaded uncompressed document")
         return cls.from_dict(event_data)
 
-    def prepare_output(
+    def serialize_document(
         self, working_bucket, step_name, logger=None, size_threshold_kb=0
     ):
         """
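
For context, the dispatch that the renamed `load_document()` performs can be sketched as below. The compressed-payload shape and the inline decompression are assumptions for illustration only: the real method defers to `from_compressed_or_dict()`, reads the compressed state from S3, and returns a `Document` instance rather than a plain dict.

```python
import gzip
import json

def load_document(event_data, working_bucket, logger=None):
    """Sketch: return the document payload, decompressing if needed.

    Assumes a hypothetical "compressed" key carrying gzipped JSON bytes;
    the actual marker and S3 indirection in models.py may differ.
    """
    if isinstance(event_data, dict) and "compressed" in event_data:
        # Compressed path: inline bytes here for illustration, where the
        # real implementation would fetch the blob from working_bucket.
        raw = gzip.decompress(event_data["compressed"])
        if logger:
            logger.info("Loaded compressed document")
        return json.loads(raw)
    # Uncompressed path: the event already contains the document dict.
    if logger:
        logger.info("Loaded uncompressed document")
    return event_data
```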

patterns/pattern-2/src/assessment_function/index.py
Lines changed: 2 additions & 2 deletions

@@ -41,7 +41,7 @@ def handler(event, context):
 
     # Convert document data to Document object - handle compression
     working_bucket = os.environ.get('WORKING_BUCKET')
-    document = Document.handle_input_document(document_data, working_bucket, logger)
+    document = Document.load_document(document_data, working_bucket, logger)
    logger.info(f"Processing assessment for document {document.id}, section {section_id}")
 
     # Update document status to ASSESSING
@@ -72,7 +72,7 @@ def handler(event, context):
 
     # Prepare output with automatic compression if needed
     result = {
-        'document': updated_document.prepare_output(working_bucket, f"assessment_{section_id}", logger),
+        'document': updated_document.serialize_document(working_bucket, f"assessment_{section_id}", logger),
         'section_id': section_id
     }

patterns/pattern-2/src/classification_function/index.py
Lines changed: 2 additions & 2 deletions

@@ -34,7 +34,7 @@ def handler(event, context):
 
     # Extract document from the OCR result - handle both compressed and uncompressed
     working_bucket = os.environ.get('WORKING_BUCKET')
-    document = Document.handle_input_document(event["OCRResult"]["document"], working_bucket, logger)
+    document = Document.load_document(event["OCRResult"]["document"], working_bucket, logger)
 
     # Update document status to CLASSIFYING
     document.status = Status.CLASSIFYING
@@ -103,7 +103,7 @@ def handler(event, context):
 
     # Prepare output with automatic compression if needed
     response = {
-        "document": document.prepare_output(working_bucket, "classification", logger)
+        "document": document.serialize_document(working_bucket, "classification", logger)
     }
 
     logger.info(f"Response: {json.dumps(response, default=str)}")

patterns/pattern-2/src/extraction_function/index.py
Lines changed: 2 additions & 2 deletions

@@ -33,7 +33,7 @@ def handler(event, context):
     # For Map state, we get just one section from the document
     # Extract the document and section from the event - handle both compressed and uncompressed
     working_bucket = os.environ.get('WORKING_BUCKET')
-    full_document = Document.handle_input_document(event.get("document", {}), working_bucket, logger)
+    full_document = Document.load_document(event.get("document", {}), working_bucket, logger)
 
     # Get the section ID from the Map state input
     section_input = event.get("section", {})
@@ -97,7 +97,7 @@ def handler(event, context):
     # Prepare output with automatic compression if needed
     response = {
         "section_id": section_id,
-        "document": section_document.prepare_output(working_bucket, f"extraction_{section_id}", logger)
+        "document": section_document.serialize_document(working_bucket, f"extraction_{section_id}", logger)
     }
 
     logger.info(f"Response: {json.dumps(response, default=str)}")

patterns/pattern-2/src/ocr_function/index.py
Lines changed: 1 addition & 1 deletion

@@ -125,7 +125,7 @@ def handler(event, context):
     # Prepare output with automatic compression if needed
     working_bucket = os.environ.get('WORKING_BUCKET')
     response = {
-        "document": document.prepare_output(working_bucket, "ocr", logger)
+        "document": document.serialize_document(working_bucket, "ocr", logger)
     }
 
     logger.info(f"Response: {json.dumps(response, default=str)}")

patterns/pattern-2/src/processresults_function/index.py
Lines changed: 4 additions & 4 deletions

@@ -32,7 +32,7 @@ def handler(event, context):
     # Get the base document from the original classification result - handle both compressed and uncompressed
     working_bucket = os.environ.get('WORKING_BUCKET')
     classification_document_data = event.get("ClassificationResult", {}).get("document", {})
-    document = Document.handle_input_document(classification_document_data, working_bucket, logger)
+    document = Document.load_document(classification_document_data, working_bucket, logger)
 
     extraction_results = event.get("ExtractionResults", [])
 
@@ -51,11 +51,11 @@ def handler(event, context):
        # or extraction result if assessment is disabled
        assessment_document_data = result.get("AssessmentResult", {}).get("document", {})
        if assessment_document_data:
-            section_document = Document.handle_input_document(assessment_document_data, working_bucket, logger)
+            section_document = Document.load_document(assessment_document_data, working_bucket, logger)
        else:
            # No assessment result, try extraction result
            extraction_document_data = result.get("document", {})
-            section_document = Document.handle_input_document(extraction_document_data, working_bucket, logger)
+            section_document = Document.load_document(extraction_document_data, working_bucket, logger)
        if section_document:
            # Add section to document if present
            if section_document.sections:
@@ -80,7 +80,7 @@ def handler(event, context):
 
     # Return the completed document with compression
     response = {
-        "document": document.prepare_output(working_bucket, "processresults", logger)
+        "document": document.serialize_document(working_bucket, "processresults", logger)
     }
 
     logger.info(f"Response: {json.dumps(response, default=str)}")

patterns/pattern-2/src/summarization_function/index.py
Lines changed: 2 additions & 2 deletions

@@ -43,7 +43,7 @@ def handler(event, context):
 
     # Convert data to Document object - handle compression
     working_bucket = os.environ.get('WORKING_BUCKET')
-    document = Document.handle_input_document(document_data, working_bucket, logger)
+    document = Document.load_document(document_data, working_bucket, logger)
 
     # Update document status to SUMMARIZING
     document.status = Status.SUMMARIZING
@@ -74,7 +74,7 @@ def handler(event, context):
 
     # Prepare output with automatic compression if needed
     return {
-        'document': processed_document.prepare_output(working_bucket, "summarization", logger),
+        'document': processed_document.serialize_document(working_bucket, "summarization", logger),
     }
 
     except Exception as e:
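
Since this commit applies the same rename mechanically across 18 files, any downstream callers outside this repository need the identical rewrite. A hypothetical migration helper is sketched below; the rename mapping comes straight from the diff, while the helper's name and directory walk are assumptions:

```python
import re
from pathlib import Path

# Old-to-new method names, taken from this commit's diff.
RENAMES = {
    "handle_input_document": "load_document",
    "prepare_output": "serialize_document",
}

def migrate_source(text):
    """Apply the method renames, matching whole identifiers only."""
    for old, new in RENAMES.items():
        text = re.sub(rf"\b{old}\b", new, text)
    return text

def migrate_tree(root):
    """Rewrite all .py files under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.py"):
        original = path.read_text()
        updated = migrate_source(original)
        if updated != original:
            path.write_text(updated)
            changed += 1
    return changed
```

The word-boundary match avoids mangling identifiers that merely contain the old names; callers that built method names dynamically would still need manual review.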
