# CHANGELOG.md
## [Unreleased]

## [0.3.6]

### Fixed

- Update Athena/Glue table configuration to use Parquet format instead of JSON #20
- CloudFormation Error when Changing Evaluation Bucket Name #19

### Added

- **Extended Document Format Support in OCR Service**
  - Added support for processing additional document formats beyond PDF and images:
    - Plain text (.txt) files with automatic pagination for large documents
    - CSV (.csv) files with table visualization and structured output
    - Excel workbooks (.xlsx, .xls) with multi-sheet support (each sheet as a page)
    - Word documents (.docx, .doc) with text extraction and visual representation
  - **Key Features**:
    - Consistent processing model across all document formats
    - Standard page image generation for all formats
    - Structured text output in formats compatible with existing extraction pipelines
    - Confidence metrics for all document types
    - Automatic format detection from file content and extension
  - **Implementation Details**:
    - Format-specific processing strategies for optimal results
    - Enhanced text rendering for plain text documents
    - Table visualization for CSV and Excel data
    - Word document paragraph extraction with formatting preservation
    - S3 storage integration matching existing PDF processing workflow
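The automatic format detection mentioned above can be illustrated with a minimal extension-based sketch. The helper name and strategy labels here are hypothetical, not the accelerator's actual API (the real service also inspects file content, which is omitted):

```python
from pathlib import Path

# Hypothetical mapping of extensions to processing strategies;
# illustrative only, not the accelerator's actual code.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".txt": "text",
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".docx": "word",
    ".doc": "word",
}

def detect_format(filename: str, default: str = "image") -> str:
    """Map a file extension to a processing-strategy name."""
    return SUPPORTED_FORMATS.get(Path(filename).suffix.lower(), default)

print(detect_format("report.xlsx"))  # excel
print(detect_format("scan.png"))     # image
```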
## [0.3.5]
### Added

- **Human-in-the-Loop (HITL) Support - Pattern 1**
  - Added comprehensive Human-in-the-Loop review capabilities using Amazon SageMaker Augmented AI (A2I)
  - **Key Features**:
    - Automatic triggering when extraction confidence falls below a configurable threshold
    - Integration with the SageMaker A2I Review Portal for human validation and correction
    - Configurable confidence threshold through the Web UI Portal Configuration tab (0.0-1.0 range)
    - Seamless result integration, with human-verified data automatically updating source results
  - **Workflow Integration**:
    - HITL tasks created automatically when confidence thresholds are not met
    - Reviewers can validate correct extractions or make necessary corrections through the Review Portal
    - Document processing continues with human-verified data after review completion
  - **Configuration Management**:
    - `EnableHITL` parameter for feature toggle
    - Confidence threshold configurable via Web UI without stack redeployment
    - Support for existing private workforce work teams via input parameter
  - **CloudFormation Output**: Added `SageMakerA2IReviewPortalURL` for easy access to the review portal
  - **Known Limitations**: The current A2I version cannot provide direct hyperlinks to specific document tasks; template updates require resource recreation
- **Document Compression for Large Documents - all patterns**
  - Added automatic compression support to handle large documents and avoid exceeding the Step Functions payload limit (256KB)
  - **Key Features**:
    - Automatic compression (default trigger threshold of 0KB enables compression by default)
    - Transparent handling of both compressed and uncompressed documents in Lambda functions
    - Temporary S3 storage for compressed document state, with automatic cleanup via lifecycle policies
  - **New Utility Methods**:
    - `Document.load_document()`: Automatically detects and decompresses document input from Lambda events
    - `Document.serialize_document()`: Automatically compresses large documents for Lambda responses
    - `Document.compress()` and `Document.decompress()`: Compression/decompression methods
  - **Lambda Function Integration**: All relevant Lambda functions updated to use compression utilities
  - **Resolves Step Functions Errors**: Eliminates "result with a size exceeding the maximum number of bytes service limit" errors for large multi-page documents
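A minimal sketch of the compress-when-large pattern these utilities implement, under stated assumptions: the real methods offload compressed state to S3 (omitted here), and everything beyond the method names listed above is illustrative:

```python
import gzip
import json

def serialize_document(doc: dict, threshold_kb: int = 0):
    """Compress the payload when it exceeds the threshold.

    A 0KB threshold means any non-empty payload is compressed,
    matching the "enabled by default" behavior described above.
    """
    payload = json.dumps(doc)
    if len(payload.encode()) > threshold_kb * 1024:
        return gzip.compress(payload.encode())
    return payload

def load_document(data) -> dict:
    """Transparently handle compressed and uncompressed input."""
    if isinstance(data, bytes):
        data = gzip.decompress(data).decode()
    return json.loads(data)

doc = {"pages": ["page text"] * 1000}
assert load_document(serialize_document(doc)) == doc
```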
- **Example Notebook**: Added `notebooks/examples/step3_extraction_using_yaml.ipynb` demonstrating YAML-based extraction with automatic format detection and token efficiency benefits

### Fixed

- **Enhanced JSON Extraction from LLM Responses (Issue #16)**
  - Modularized duplicate `_extract_json()` functions across classification, extraction, summarization, and assessment services into a common `extract_json_from_text()` utility function
  - Improved multi-line JSON handling with literal newlines in string values that previously caused parsing failures
  - Added robust JSON validation and multiple fallback strategies for better extraction reliability
  - Enhanced string parsing with proper escape sequence handling for quotes and newlines
  - Added comprehensive unit tests covering various JSON formats, including multi-line scenarios
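In the spirit of the consolidated `extract_json_from_text()` utility, here is a sketch of multi-strategy JSON extraction; the specific fallback order and regexes are illustrative, not the actual implementation:

```python
import json
import re

def extract_json_from_text(text: str):
    """Return the first parseable JSON object found in an LLM response."""
    # Strategy 1: the whole response is valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strategy 2: JSON inside a fenced code block.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # Strategy 3: widest {...} span anywhere in the text.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError("no JSON object found in response")

resp = 'Here is the result:\n```json\n{"name": "Acme",\n "total": 42}\n```'
print(extract_json_from_text(resp))  # {'name': 'Acme', 'total': 42}
```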
## [0.3.4]
### Added

- **Configurable Image Processing and Enhanced Resizing Logic**
  - **Improved Image Resizing Algorithm**: Enhanced aspect-ratio preserving scaling that only downsizes when necessary (scale factor < 1.0) to prevent image distortion
  - **Configurable Image Dimensions**: All processing services (Assessment, Classification, Extraction, OCR) now support configurable image dimensions through configuration, with a default resolution of 951×1268
  - **Service-Specific Image Optimization**: Each service can use optimal image dimensions for performance and quality tuning
  - **Enhanced OCR Service**: Added configurable DPI for PDF-to-image conversion and optional image resizing with a dual-image strategy (stores original high-DPI images while using resized images for processing)
  - **Runtime Configuration**: No code changes needed to adjust image processing; all configurable through service configuration
  - **Backward Compatibility**: Default values maintain existing behavior, with no immediate action required for existing deployments
- **Enhanced Configuration Management**
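The downscale-only resizing rule above can be sketched as a pure size computation, assuming the 951×1268 default; this is not the actual service code, which applies the result with an imaging library:

```python
def target_size(width: int, height: int,
                max_w: int = 951, max_h: int = 1268) -> tuple:
    """Aspect-ratio preserving size that only shrinks, never enlarges."""
    scale = min(max_w / width, max_h / height)
    if scale >= 1.0:
        # Scale factor >= 1.0 would upscale; keep the original
        # dimensions to avoid distortion and blur.
        return width, height
    return int(width * scale), int(height * scale)

print(target_size(1902, 2536))  # (951, 1268)
print(target_size(800, 600))    # (800, 600) -- already small, unchanged
```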
[…]

The `idp_common_pkg` introduces a unified Document model approach for consistent document handling across all processing steps.

- **Section**: Represents logical document sections with classification and extraction results

#### Service Classes

- **OcrService**: Processes documents with AWS Textract or Amazon Bedrock and updates the Document with OCR results
- **ClassificationService**: Classifies document pages/sections using Bedrock or SageMaker backends
- **ExtractionService**: Extracts structured information from document sections using Bedrock
# Makefile

commit: lint test
	export COMMIT_MESSAGE="$(shell q chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only"| tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')"&&\
# config_library/pattern-2/README.md

This directory contains configurations for Pattern 2 of the GenAI IDP Accelerator.

Pattern 2 implements an intelligent document processing workflow that uses Amazon Bedrock with Nova or Claude models for both page classification/grouping and information extraction.

Key components of Pattern 2:

- **OCR processing** with multiple backend options (Textract, Bedrock LLM, or image-only)
- **Document classification** using Claude via Amazon Bedrock (with two available methods):
  - Page-level classification: Classifies individual pages and groups them
[…]

- **Assessment Impact**: ❌ Assessment disabled - no OCR text available
- **Text Confidence Data**: Empty
- **Cost**: No OCR costs

> ⚠️ **Assessment Recommendation**: Use the Textract backend (default) when assessment functionality is required. The Bedrock and None backends eliminate assessment capability due to the lack of confidence data.
## Text Confidence Data and Assessment Integration

Pattern 2's assessment feature relies on text confidence data generated during the OCR phase to evaluate extraction quality and provide confidence scores for each extracted attribute.

### How Text Confidence Data Enables Assessment

1. **OCR Phase**: Textract generates confidence scores for each text block during document processing
2. **Condensed Format**: The OCR service creates optimized `textConfidence.json` files with 80-90% token reduction
3. **Assessment Phase**: The LLM analyzes extraction results against OCR confidence data to provide accurate confidence evaluation
4. **UI Integration**: Assessment results appear in the web interface with color-coded confidence indicators