Commit 8d57630

Author: Bob Strahan
Merge branch 'develop' v0.3.15
2 parents 71c9013 + e167bb5, commit 8d57630

84 files changed, +12413 -1173 lines changed

.gitattributes

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+* text=auto eol=lf
+*.py text eol=lf
+*.sh text eol=lf
+*.yaml text eol=lf
+*.yml text eol=lf

.gitlab-ci.yml

Lines changed: 5 additions & 4 deletions
@@ -56,14 +56,15 @@ integration_tests:
 # AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION}
 # IDP_ACCOUNT_ID: ${IDP_ACCOUNT_ID}

-# Add rules to only run on develop branch
+# Add rules to only run on develop branch
+# Add rules to only run on develop branch
 rules:
   - if: $CI_COMMIT_BRANCH == "develop"
-    when: manual # always # When idp-accelerator CICD is reconfigured
+    when: always # always # When idp-accelerator CICD is reconfigured
   - if: $CI_COMMIT_BRANCH =~ /^feature\/.*/
-    when: manual
+    when: always
   - if: $CI_COMMIT_BRANCH =~ /^fix\/.*/
-    when: manual
+    when: always
   - if: $CI_COMMIT_BRANCH =~ /^hotfix\/.*/
     when: manual
   - if: $CI_COMMIT_BRANCH =~ /^release\/.*/

CHANGELOG.md

Lines changed: 56 additions & 0 deletions
@@ -5,6 +5,62 @@ SPDX-License-Identifier: MIT-0

 ## [Unreleased]

+## [0.3.15]
+
+### Added
+
+- **Intelligent Document Discovery Module for Automated Configuration Generation**
+  - Added Discovery module that automatically analyzes document samples to identify structure, field types, and organizational patterns
+  - **Pattern-Neutral Design**: Works across all processing patterns (1, 2, 3) with unified discovery process and pattern-specific implementations
+  - **Dual Discovery Methods**: Discovery without ground truth (exploratory analysis) and with ground truth (optimization using labeled data)
+  - **Automated Blueprint Creation**: Pattern 1 includes zero-touch BDA blueprint generation with intelligent change detection and version management
+  - **Web UI Integration**: Real-time discovery job monitoring, interactive results review, and seamless configuration integration
+  - **Advanced Features**: Multi-model support (Nova, Claude), customizable prompts, configurable parameters, ground truth processing, schema conversion, and lifecycle management
+  - **Key Benefits**: Rapid new document type onboarding, reduced time-to-production, configuration optimization, and automated workflow bootstrapping
+  - **Use Cases**: New document exploration, configuration improvement, rapid prototyping, and document understanding
+  - **Documentation**: Guide in `docs/discovery.md` with architecture details, best practices, and troubleshooting
+
+- **Optional Pattern-2 Regex-Based Classification for Enhanced Performance**
+  - Added support for optional regex patterns in document class definitions for performance optimization
+  - **Document Name Regex**: Match against document ID/name to classify all pages without LLM processing when all pages should be the same class
+  - **Document Page Content Regex**: Match against page text content during multi-modal page-level classification for fast page classification
+  - **Key Benefits**: Significant performance improvements and cost savings by bypassing LLM calls for pattern-matched documents, deterministic classification results for known document patterns, seamless fallback to existing LLM classification when regex patterns don't match
+  - **Configuration**: Optional `document_name_regex` and `document_page_content_regex` fields in class definitions with automatic regex compilation and validation
+  - **Logging**: Comprehensive info-level logging when regex patterns match for observability and debugging
+  - **CloudFormation Integration**: Updated Pattern-2 schema to support regex configuration through the Web UI
+  - **Demonstration**: New `step2_classification_with_regex.ipynb` notebook showcasing regex configuration and performance comparisons
+  - **Documentation**: Enhanced classification module README and main documentation with regex usage examples and best practices
+
+- **Windows WSL Development Environment Setup Guide**
+  - Added WSL-based development environment setup guide for Windows developers in `docs/setup-development-env-WSL.md`
+  - **Key Features**: Automated setup script (`wsl_setup.sh`) for quick installation of Git, Python, Node.js, AWS CLI, and SAM CLI
+  - **Integrated Workflow**: Development setup combining Windows tools (VS Code, browsers) with native Linux environment
+  - **Target Use Cases**: Windows developers needing Linux compatibility without Docker Desktop or VM overhead
+
+### Fixed
+- **Throttling Error Detection and Retry Logic for Assessment Functions** - [GitHub Issue #45](https://github.com/aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws/issues/45)
+  - **Assessment Function**: Enhanced throttling detection to check for throttling errors returned in `document.errors` field in addition to thrown exceptions, raising `ThrottlingException` to trigger Step Functions retry when throttling is detected
+  - **Granular Assessment Task Caching**: Fixed caching logic to properly cache successful assessment tasks when there are ANY failed tasks (both exception-based and result-based failures), enabling efficient retry optimization by only reprocessing failed tasks while preserving successful results
+  - **Impact**: Improved resilience for throttling scenarios, reduced redundant processing during retries, and better Step Functions retry behavior
+
+- **Security Vulnerability Mitigation - Package Updates**
+
+- **GovCloud Compatibility - Hardcoded Service Domain References**
+  - Fixed hardcoded `amazonaws.com` references in CloudFormation templates that prevented GovCloud deployment
+  - Updated all service principals and endpoints to use dynamic `${AWS::URLSuffix}` expressions for automatic region-based resolution
+  - **Templates Updated**: `template.yaml` (main template), `patterns/pattern-3/sagemaker_classifier_endpoint.yaml`
+  - **Services Fixed**: EventBridge, Cognito, SageMaker, ECR, CloudFront, CodeBuild, AppSync, Lambda, DynamoDB, CloudWatch Logs, Glue
+  - Resolves GitHub Issue #50 - templates now deploy correctly in both standard AWS and GovCloud regions
+
+- **Bug Fixes and Code Improvements**
+  - Fixed HITL processing errors in both Pattern-1 (DynamoDB validation with empty strings) and Pattern-2 (string indices error in A2I output processing)
+  - Fixed Step Function UI issues including auto-refresh button auto-disable and fetch failures for failed executions with datetime serialization errors
+  - Cleaned up unused Step Function subscription infrastructure and removed duplicate code in Pattern-2 HITL function
+  - Expanded UI Visual Editor bounding box size with padding for better visibility and user interaction
+  - Fixed bug in list of models supporting cache points - previously claude 4 sonnet and opus had been excluded.
+  - Validations added at the assessment step for checking valid json response. The validation fails after extraction/assessment is complete if json parsing issues are encountered.
+
+
 ## [0.3.14]

 ### Added
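
The regex-based classification entry in the changelog hunk above can be illustrated with a minimal Python sketch. Only the field names `document_name_regex` and `document_page_content_regex` come from the changelog. The class definitions, the function names, the use of `re.search`, and the `llm_fallback` callable are assumptions made for illustration and are not the project's actual implementation.

```python
import re
from typing import Callable, List, Optional

# Hypothetical class definitions; only the two regex field names are from the changelog.
CLASSES = [
    {
        "name": "invoice",
        "document_name_regex": r"(?i)invoice",
        "document_page_content_regex": r"(?i)\binvoice\s+number\b",
    },
    {"name": "w2", "document_page_content_regex": r"(?i)wage and tax statement"},
]

REGEX_FIELDS = ("document_name_regex", "document_page_content_regex")


def compile_patterns(classes: List[dict]) -> List[dict]:
    """Compile the optional regex fields once; an invalid pattern raises ValueError."""
    compiled = []
    for cls in classes:
        entry = {"name": cls["name"]}
        for field in REGEX_FIELDS:
            pattern = cls.get(field)
            if pattern:
                try:
                    entry[field] = re.compile(pattern)
                except re.error as exc:
                    raise ValueError(f"{cls['name']}.{field}: {exc}") from exc
        compiled.append(entry)
    return compiled


def classify_by_name(document_id: str, compiled: List[dict]) -> Optional[str]:
    """Return a class for the whole document if its name matches, else None."""
    for entry in compiled:
        regex = entry.get("document_name_regex")
        if regex and regex.search(document_id):
            return entry["name"]
    return None


def classify_page(page_text: str, compiled: List[dict],
                  llm_fallback: Callable[[str], str]) -> str:
    """Try the page-content regexes first; call the LLM only when none match."""
    for entry in compiled:
        regex = entry.get("document_page_content_regex")
        if regex and regex.search(page_text):
            return entry["name"]
    return llm_fallback(page_text)
```

A caller would first try `classify_by_name` on the document ID and apply the result to every page; only when it returns `None` would it classify page by page, so the LLM is reached only for pages that no pattern matches.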

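The throttling fix described in the changelog hunk above has two parts: treating throttling messages found in `document.errors` like thrown exceptions, and caching successful assessment tasks even when other tasks failed. A rough Python sketch of both ideas follows. The marker substrings, the `success` flag, and both function names are hypothetical, and the `ThrottlingException` class here is a local stand-in rather than the project's own type.

```python
from typing import Dict, Iterable, List, Tuple


class ThrottlingException(Exception):
    """Stand-in for the exception the changelog says triggers a Step Functions retry."""


# Assumed marker substrings; the project's real detection rules are not shown in the diff.
THROTTLING_MARKERS = ("throttl", "too many requests", "rate exceeded")


def raise_if_throttled(errors: Iterable[str]) -> None:
    """Raise when any returned error message looks like throttling."""
    for message in errors:
        lowered = message.lower()
        if any(marker in lowered for marker in THROTTLING_MARKERS):
            raise ThrottlingException(message)


def partition_tasks(results: Dict[str, dict]) -> Tuple[Dict[str, dict], List[str]]:
    """Split task results into ones worth caching and task IDs to reprocess."""
    cacheable = {tid: res for tid, res in results.items() if res.get("success")}
    to_retry = [tid for tid, res in results.items() if not res.get("success")]
    return cacheable, to_retry
```

On a retry, only the IDs in `to_retry` would be reprocessed, while the cached results are reused.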
Makefile

Lines changed: 15 additions & 7 deletions
@@ -40,26 +40,34 @@ lint-cicd:
 	fi
 	@echo -e "$(GREEN)All code quality checks passed!$(NC)"

-# Check CloudFormation templates for hardcoded AWS partition ARNs
+# Check CloudFormation templates for hardcoded AWS partition ARNs and service principals
 check-arn-partitions:
-	@echo "Checking CloudFormation templates for hardcoded ARN partitions..."
+	@echo "Checking CloudFormation templates for hardcoded ARN partitions and service principals..."
 	@FOUND_ISSUES=0; \
 	for template in template.yaml patterns/*/template.yaml patterns/*/sagemaker_classifier_endpoint.yaml options/*/template.yaml; do \
 		if [ -f "$$template" ]; then \
 			echo "Checking $$template..."; \
-			MATCHES=$$(grep -n "arn:aws:" "$$template" | grep -v "arn:\$${AWS::Partition}:" || true); \
-			if [ -n "$$MATCHES" ]; then \
+			ARN_MATCHES=$$(grep -n "arn:aws:" "$$template" | grep -v "arn:\$${AWS::Partition}:" || true); \
+			if [ -n "$$ARN_MATCHES" ]; then \
 				echo -e "$(RED)ERROR: Found hardcoded 'arn:aws:' references in $$template:$(NC)"; \
-				echo "$$MATCHES" | sed 's/^/ /'; \
+				echo "$$ARN_MATCHES" | sed 's/^/ /'; \
 				echo -e "$(YELLOW) These should use 'arn:\$${AWS::Partition}:' instead for GovCloud compatibility$(NC)"; \
 				FOUND_ISSUES=1; \
 			fi; \
+			SERVICE_MATCHES=$$(grep -n "\.amazonaws\.com" "$$template" | grep -v "\$${AWS::URLSuffix}" | grep -v "^[[:space:]]*#" | grep -v "Description:" | grep -v "Comment:" | grep -v "cognito" | grep -v "ContentSecurityPolicy" || true); \
+			if [ -n "$$SERVICE_MATCHES" ]; then \
+				echo -e "$(RED)ERROR: Found hardcoded service principal references in $$template:$(NC)"; \
+				echo "$$SERVICE_MATCHES" | sed 's/^/ /'; \
+				echo -e "$(YELLOW) These should use '\$${AWS::URLSuffix}' instead of 'amazonaws.com' for GovCloud compatibility$(NC)"; \
+				echo -e "$(YELLOW) Example: 'lambda.amazonaws.com' should be 'lambda.\$${AWS::URLSuffix}'$(NC)"; \
+				FOUND_ISSUES=1; \
+			fi; \
 		fi; \
 	done; \
 	if [ $$FOUND_ISSUES -eq 0 ]; then \
-		echo -e "$(GREEN)✅ No hardcoded ARN partition references found!$(NC)"; \
+		echo -e "$(GREEN)✅ No hardcoded ARN partition or service principal references found!$(NC)"; \
 	else \
-		echo -e "$(RED)❌ Found hardcoded ARN partition references that need to be fixed$(NC)"; \
+		echo -e "$(RED)❌ Found hardcoded references that need to be fixed for GovCloud compatibility$(NC)"; \
 		exit 1; \
 	fi

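
For readers who find the chained `grep` filters in the `check-arn-partitions` target hard to follow, the same two checks can be sketched in Python. This is an illustrative equivalent under stated assumptions, not code from the repository: `find_hardcoded_references` and `EXCLUDE_TOKENS` are hypothetical names. One difference is deliberate. In the shell pipeline the comment filter `grep -v "^[[:space:]]*#"` runs on `grep -n` output, where each line is prefixed with its line number, so the sketch applies the comment test to the raw line instead.

```python
from typing import List, Tuple

# Substrings that exempt a line from the service-principal check,
# mirroring the `grep -v` filters in the Makefile target.
EXCLUDE_TOKENS = (
    "${AWS::URLSuffix}",
    "Description:",
    "Comment:",
    "cognito",
    "ContentSecurityPolicy",
)

Hit = Tuple[int, str]


def find_hardcoded_references(template_text: str) -> Tuple[List[Hit], List[Hit]]:
    """Return (arn_hits, service_hits) as lists of (line_number, stripped_line)."""
    arn_hits: List[Hit] = []
    service_hits: List[Hit] = []
    for lineno, line in enumerate(template_text.splitlines(), start=1):
        # Check 1: a literal 'arn:aws:' on a line with no partition-aware ARN.
        if "arn:aws:" in line and "arn:${AWS::Partition}:" not in line:
            arn_hits.append((lineno, line.strip()))
        # Check 2: a literal '.amazonaws.com' outside comments and exempt lines.
        if (
            ".amazonaws.com" in line
            and not line.lstrip().startswith("#")
            and not any(token in line for token in EXCLUDE_TOKENS)
        ):
            service_hits.append((lineno, line.strip()))
    return arn_hits, service_hits
```

A wrapper could read each template file, print the hits, and exit non-zero when either list is non-empty, which is what the Makefile target does with `FOUND_ISSUES`.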

README.md

Lines changed: 1 addition & 0 deletions
@@ -128,6 +128,7 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./d
 - [Agent Analysis](./docs/agent-analysis.md) - Natural language analytics and data visualization feature
 - [Custom MCP Agent](./docs/custom-MCP-agent.md) - Integrating external MCP servers for custom tools and capabilities
 - [Configuration](./docs/configuration.md) - Configuration and customization options
+- [Discovery](./docs/discovery.md) - Pattern-neutral discovery process and BDA blueprint automation
 - [Classification](./docs/classification.md) - Customizing document classification
 - [Extraction](./docs/extraction.md) - Customizing information extraction
 - [Human-in-the-Loop Review](./docs/human-review.md) - Human review workflows with Amazon A2I

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.3.14
+0.3.15

docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ This folder contains detailed documentation on various aspects of the GenAI Inte
 - [Web UI](./web-ui.md) - Web interface features and usage
 - [Agent Analysis](./agent-analysis.md) - Natural language analytics and data visualization feature
 - [Knowledge Base](./knowledge-base.md) - Document knowledge base query feature
+- [Post-Processing Lambda Hook](./post-processing-lambda-hook.md) - Custom downstream processing integration
 - [Evaluation Framework](./evaluation.md) - Accuracy assessment system
 - [Assessment Feature](./assessment.md) - Extraction confidence evaluation using LLMs
 - [Configuration](./configuration.md) - Configuration and customization options

docs/architecture.md

Lines changed: 2 additions & 0 deletions
@@ -277,6 +277,8 @@ The solution supports an optional post-processing Lambda hook integration:
 - Custom notification systems
 - Receives the document processing details and output location

+For comprehensive implementation guidance, use cases, and code examples, see [post-processing-lambda-hook.md](./post-processing-lambda-hook.md).
+
 ## Additional Documentation

 - [classification.md](./classification.md) - Details on document classification capabilities
