aws-solutions-library-samples
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 42 additions & 5 deletions
diff --git a/Makefile b/Makefile
Lines changed: 2 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 44 additions & 13 deletions
diff --git a/VERSION b/VERSION
Lines changed: 1 addition & 1 deletion
diff --git a/config_library/pattern-2/lending-package-sample/config.yaml b/config_library/pattern-2/lending-package-sample/config.yaml
Lines changed: 32 additions & 20 deletions
@@ -9,17 +9,54 @@ SPDX-License-Identifier: MIT-0
 
 ### Added
 
-- **Agentic extraction preview with Strands agents** delivering structured field validation, configurable review flows, and sample notebooks/assets for Pattern-2 lending documents.
-  - Agentic extraction utilises Strands Agent framework to produce structured outputs in an iterative and self reviewing agent loop. It utilises tools for interacting with the output which will be extended in the future.
-  - Through the library this already allows users to utilise the `structured_output` function from the `lib/idp_common_pkg/idp_common/extraction/agentic_idp.py` to request extractions using Pydantic Models with custom validators defined. We intend to enable deeper validation customizations through the UI as well in the future.
-- **Containerized Pattern-2 deployment pipeline** that builds and pushes all Lambda images via CodeBuild using the new Dockerfile, plus automated ECR cleanup and tests.
+- **Agentic extraction preview with Strands agents (experimental)** introducing intelligent, self-correcting document extraction with improved schema compliance and accuracy over traditional methods.
+  - Leverages the Strands Agent framework with iterative validation loops and automatic error correction to deliver schema compliance
+  - Provides structured output through Pydantic models with built-in validators, automatic retry handling, and superior handling of complex nested structures and date standardization
+  - Includes sample notebooks and configuration assets demonstrating agentic extraction for Pattern-2 lending documents
+  - Programmatic access is available via the `structured_output` function in `lib/idp_common_pkg/idp_common/extraction/agentic_idp.py`
+  - This feature is currently experimental. Planned extensions include UI-based validation customization, code generation, and Model Context Protocol (MCP) integration for external data enrichment during extraction
+
+- **IDP CLI - Command Line Interface for Batch Document Processing**
+  - Added CLI tool (`idp_cli/`) for programmatic batch document processing and stack management
+  - **Key Features**: Deploy/update/delete CloudFormation stacks, process and reprocess documents from local directories or S3 URIs, live progress monitoring with rich terminal UI, download processing results locally, validate manifests before processing, generate manifests from directories with automatic baseline matching
+  - **Selective Reprocessing**: New `rerun-inference` command to reprocess documents from specific pipeline steps (classification or extraction) while leveraging existing OCR data for cost/time optimization
+  - **Evaluation Framework**: Workflow for accuracy testing including initial processing, manual validation, baseline creation, and automated evaluation with detailed metrics
+  - **Analytics Integration**: Query aggregated results via Athena SQL or use Agent Analytics in Web UI for visual analysis
+  - **Use Cases**: Rapid configuration iteration, large-scale batch processing, CI/CD integration, automated accuracy testing, automated environment cleanup, prompt engineering experiments
+  - **Documentation**: README with Quick Start, Commands Reference, Evaluation Workflow, and troubleshooting guides
+
+- **Extraction Results Integration in Summarization Service**
+  - Integrates extraction results from the extraction service into the summarization module for context-aware summaries
+  - **Features**: Fully backward compatible (works with or without extraction results), automatic section handling, error resilient with graceful continuation, comprehensive logging
+  - **Configuration**: Enable by adding `{EXTRACTION_RESULTS}` placeholder to `task_prompt` in config.yaml
+  - **Benefits**: Context-aware summaries referencing extracted values, improved accuracy and quality, better extraction-summary alignment
 
 ### Changed
 
-- Updated Pattern-2 templates, `publish.py`, and documentation to detect container builds, surface new prerequisites (Docker/ECR), and detail enabling agentic extraction.
+- **Containerized Pattern-2 deployment pipeline** that builds and pushes all Lambda images via CodeBuild using the new Dockerfile, plus automated ECR cleanup and tests.
   - Lambda docker image deployments have a 10 GB image size limit compared to the 250 MB zip limit of regular deployment. This however doesn't allow for viewing the code in the AWS console.
     The change was introduced to accommodate the increased package size of introducing Strands into the package dependencies.
 
+### Fixed
+- **Discovery function times out when processing large documents.**
+  - Increased the discovery processor Lambda timeout to 900s
+- **Corrected baseline directory structure documentation in evaluation.md**
+  - Fixed incorrect baseline structure showing flat `.json` files instead of proper directory hierarchy
+  - Updated to correct structure: `<document-name>/sections/1/result.json`
+  - Reorganized document for better logical flow and user experience
+- **GovCloud Template Generation - Removed GraphQLApi References** - #82
+  - Fixed invalid GovCloud template generation where ProcessChanges AppSync resources were not being removed, causing "Fn::GetAtt references undefined resource GraphQLApi" errors
+  - Updated `scripts/generate_govcloud_template.py` to remove all ProcessChanges-related resources and extend AppSync parameter cleanup to all pattern stacks
+  - Fixed InvalidClientTokenId validation error by ensuring CloudFormation client uses the correct region when validating templates (commercial vs GovCloud)
+- **Enhanced Processing Flow Visualization for Disabled Steps**
+  - Fixed UX issue where disabled processing steps (when `summarization.enabled: false` or `assessment.enabled: false` in configuration) appeared visually identical to active steps in the "View Processing Flow" display
+  - **Key Benefit**: Users can now immediately see which steps are actually processing data vs. steps that execute but skip processing based on configuration settings, preventing confusion about whether summarization or assessment ran
+  - Limitation: the new visual indicators are driven by the current config, which may have changed since the document was processed. We will address this in a later release. See Issue #86.
+
+### Known Issues
+- **GovCloud deployments fail due to lack of ARM support for CodeBuild. A fix is targeted for the next release.**
+
+
 ## [0.3.19]
 
 ### Added
 
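For reference, the corrected baseline layout named in the CHANGELOG hunk above (`<document-name>/sections/1/result.json`) could be expressed as a small path helper. This is a minimal sketch, not code from the repository: `baseline_result_path` is a hypothetical name, and treating `1` as a section number that generalizes to other sections is an assumption.

```python
from pathlib import PurePosixPath


def baseline_result_path(document_name: str, section: int = 1) -> PurePosixPath:
    """Build <document-name>/sections/<section>/result.json (hypothetical helper)."""
    # Assumption: sections are numbered directories starting at 1,
    # generalizing the single example given in the changelog entry.
    if section < 1:
        raise ValueError("section numbers are assumed to start at 1")
    return PurePosixPath(document_name) / "sections" / str(section) / "result.json"


print(baseline_result_path("lending_package.pdf"))
```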
@@ -9,9 +9,10 @@ NC := \033[0m  # No Color
 # Default target - run both lint and test
 all: lint test
 
-# Run tests in idp_common_pkg directory
+# Run tests in idp_common_pkg and idp_cli directories
 test:
 	$(MAKE) -C lib/idp_common_pkg test
+	cd idp_cli && python -m pytest -v
 
 # Run both linting and formatting in one command
 lint: ruff-lint format check-arn-partitions
 
@@ -3,6 +3,8 @@
 Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 SPDX-License-Identifier: MIT-0
 
+**Questions?** [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws)
+
 ## Table of Contents
 
 - [Introduction](#introduction)
@@ -31,6 +33,7 @@ White-glove customization, deployment, and integration support for production us
 
 - **Serverless Architecture**: Built entirely on AWS serverless technologies including Lambda, Step Functions, SQS, and DynamoDB
 - **Modular, pluggable patterns**: Pre-built processing patterns using state-of-the-art models and AWS services
+- **Command Line Interface**: Programmatic batch processing with evaluation framework and analytics integration
 - **Advanced Classification**: Support for page-level and holistic document packet classification
 - **Few Shot Example Support**: Improve accuracy through example-based prompting
 - **Custom Business Logic Integration**: Inject custom prompt generation logic via Lambda functions for specialized document processing
@@ -73,23 +76,50 @@ To quickly deploy the GenAI-IDP solution in your AWS account:
 
 ### Processing Your First Document
 
-After deployment, you can quickly process a document and view results:
+After deployment, choose the processing method that fits your use case:
 
-1. **Upload a Document**:
-   - **Via Web UI**: Open the Web UI URL from the CloudFormation stack's Outputs tab, log in, and click "Upload Document"
-   - **Via S3**: Upload directly to the S3 input bucket (find the bucket URL in CloudFormation stack Outputs)
+#### Method 1: Web UI (Interactive)
 
-2. **Use Sample Documents**:
-   - For Patterns 1 (BDA) and Pattern 2: Use [samples/lending_package.pdf](./samples/lending_package.pdf)
-   - For Pattern 3 (UDOP): Use [samples/rvl_cdip_package.pdf](./samples/rvl_cdip_package.pdf)
+1. Open the Web UI URL from CloudFormation stack Outputs
+2. Log in and click "Upload Document"
+3. Upload a sample document:
+   - For Patterns 1 & 2: [samples/lending_package.pdf](./samples/lending_package.pdf)
+   - For Pattern 3: [samples/rvl_cdip_package.pdf](./samples/rvl_cdip_package.pdf)
+4. Monitor processing and view results in the dashboard
+
+#### Method 2: Direct S3 Upload (Simple)
+
+1. Upload a document to the InputBucket (the bucket URL is listed in the CloudFormation stack Outputs)
+2. Monitor progress in the Step Functions console
+3. Results appear in the OutputBucket automatically
+
+#### Method 3: IDP CLI (Batch/Programmatic)
+
+For batch processing, automation, or evaluation workflows:
+
+```bash
+# Install CLI
+cd idp_cli && pip install -e .
+
+# Process documents
+idp-cli run-inference \
+    --stack-name <your-stack-name> \
+    --dir ./samples/ \
+    --monitor
 
-3. **Monitor Processing**:
-   - **Via Web UI**: Track document status on the dashboard
-   - **Via Step Functions**: Open the StateMachine URL from CloudFormation stack Outputs to observe workflow execution
+# Download results
+idp-cli download-results \
+    --stack-name <your-stack-name> \
+    --batch-id <batch-id> \
+    --output-dir ./results/
+```
 
-4. **View Results**:
-   - **Via Web UI**: Access processing results through the document details page
-   - **Via S3**: Check the output bucket for structured JSON files with extracted data
+**See [IDP CLI Documentation](./idp_cli/README.md)** for:
+- CLI-based stack deployment and updates
+- Batch document processing
+- Complete evaluation workflows with baselines
+- Athena and Agent Analytics integration
+- CI/CD integration examples
 
 See the [Deployment Guide](./docs/deployment.md#testing-the-solution) for more detailed testing instructions.
 
@@ -124,6 +154,7 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./d
 
 - [Architecture](./docs/architecture.md) - Detailed component architecture and data flow
 - [Deployment](./docs/deployment.md) - Build, publish, deploy, and test instructions
+- [IDP CLI](./idp_cli/README.md) - Command line interface for batch processing and evaluation workflows
 - [Web UI](./docs/web-ui.md) - Web interface features and usage
 - [Agent Analysis](./docs/agent-analysis.md) - Natural language analytics and data visualization feature
 - [Custom MCP Agent](./docs/custom-MCP-agent.md) - Integrating external MCP servers for custom tools and capabilities
 
@@ -1 +1 @@
-0.3.19
+0.3.20
@@ -1082,12 +1082,14 @@ summarization:
   top_k: "5"
   task_prompt: >-
     <document-text>
-
     {DOCUMENT_TEXT}
-
     </document-text>
 
-    Analyze the provided document (<document-text>) and create a comprehensive summary.
+    <extracted-attributes>
+    {EXTRACTION_RESULTS}
+    </extracted-attributes>
+
+    Analyze the provided document (<document-text>) along with the extracted attributes (<extracted-attributes>) to create a comprehensive and accurate summary.
 
     CRITICAL INSTRUCTION: You MUST return your response as valid JSON with the
     EXACT structure shown at the end of these instructions. Do not include any
@@ -1096,29 +1098,39 @@ summarization:
     Create a summary that captures the essential information from the document.
     Your summary should:
 
-    1. Extract key information, main points, and important details
+    1. **Integrate Extracted Attributes**: Begin with a "Key Information" section that highlights the most important extracted attributes in a structured format (use tables or lists as appropriate)
+
+    2. **Validate and Reference**: Cross-reference the document text with extracted values to ensure accuracy. When mentioning specific values, prefer the extracted attributes when they are available
+
+    3. **Maintain Document Structure**: Preserve the original document's organizational structure where appropriate, using the extracted attributes to enhance each section
 
-    2. Maintain the original document's organizational structure where
-    appropriate
+    4. **Highlight Critical Data**: Emphasize important extracted values such as:
+       - Names, addresses, and identification numbers
+       - Dates and time periods
+       - Monetary amounts and financial figures
+       - Status indicators and classifications
+       - Any calculated or derived values
 
-    3. Preserve important facts, figures, dates, and entities
+    5. **Use Markdown Formatting**: Apply markdown for better readability:
+       - Use headers (##, ###) for sections
+       - Create tables for structured data from extracted attributes
+       - Use **bold** for important values and *italics* for emphasis
+       - Create lists (bullet or numbered) for multiple items
 
-    4. Reduce the length while retaining all critical information
+    6. **Provide Context**: For each extracted value mentioned, provide brief context from the document text to explain its significance
 
-    5. Use markdown formatting for better readability (headings, lists,
-    emphasis, etc.)
+    7. **Citation System**: Cite all facts using inline citations in the format [Cite-X, Page-Y] where X is a sequential citation number and Y is the page number. Format citations as markdown links: [[Cite-1, Page-3]](#cite-1-page-3)
 
-    6. Cite all relevant facts from the source document using inline citations
-    in the format [Cite-X, Page-Y] where X is a sequential citation number and Y
-    is the page number
+    8. **Data Completeness**: If extracted attributes are missing or empty, note this in the summary and rely more heavily on the document text
 
-    7. Format citations as markdown links that reference the full citation list
-    at the bottom of the summary
-      Example: [[Cite-1, Page-3]](#cite-1-page-3)
+    9. **References Section**: At the end, include a "References" section listing all citations with their exact text from the source document
 
-    8. At the end of the summary, include a "References" section that lists all
-    citations with their exact text from the source document in the format:
-      [Cite-X, Page-Y]: Exact text from the document
+    Structure your summary as follows:
+    - **Key Information** (from extracted attributes)
+    - **Document Overview**
+    - **Detailed Sections** (based on document structure)
+    - **Summary and Conclusions**
+    - **References**
 
     Output Format:
 
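The citation convention in the prompt above can be illustrated with a short sketch. The helper names `citation_link` and `find_citations` are hypothetical, and deriving the anchor by lowercasing the label and collapsing punctuation into hyphens is an assumption chosen to reproduce the prompt's own example, `[[Cite-1, Page-3]](#cite-1-page-3)`.

```python
import re


def citation_link(cite: int, page: int) -> str:
    """Format one inline citation as a markdown link (hypothetical helper)."""
    label = f"Cite-{cite}, Page-{page}"
    # Assumption: the anchor is the lowercased label with every run of
    # non-alphanumeric characters collapsed into a single hyphen.
    anchor = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"[[{label}]](#{anchor})"


def find_citations(summary: str) -> list[tuple[int, int]]:
    """Return (citation number, page number) pairs found in a summary."""
    pairs = re.findall(r"\[\[Cite-(\d+), Page-(\d+)\]\]", summary)
    return [(int(cite), int(page)) for cite, page in pairs]


text = f"The loan amount is stated on the first form {citation_link(1, 3)}."
print(text)
print(find_citations(text))
```

A check like `find_citations` could be used to confirm that every inline citation in a generated summary also has an entry in its References section.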
@@ -1127,7 +1139,7 @@ summarization:
 
     ```json
     {
-      "summary": "A comprehensive summary in markdown format with inline citations linked to a references section at the bottom"
+      "summary": "A comprehensive summary in markdown format that integrates extracted attributes with document text and includes inline citations linked to a references section at the bottom"
     }
     ```
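As a rough illustration of how a prompt template like the one above might be filled in, the sketch below substitutes both placeholders in a single pass. It is not the summarization service's implementation: `build_task_prompt` is a hypothetical helper, rendering the extraction results as indented JSON is an assumption, and an empty string stands in when no results are available (mirroring the backward-compatible behavior described in the CHANGELOG).

```python
import json
import re

# Only the two placeholders named in the config are recognized;
# any other braces (such as the literal JSON in the prompt) are left untouched.
_PLACEHOLDER = re.compile(r"\{(DOCUMENT_TEXT|EXTRACTION_RESULTS)\}")


def build_task_prompt(template, document_text, extraction_results=None):
    """Fill a task_prompt template (hypothetical helper, not the service code)."""
    values = {
        "DOCUMENT_TEXT": document_text,
        # Assumption: results are rendered as indented JSON; missing or empty
        # results become an empty string so the prompt still works without them.
        "EXTRACTION_RESULTS": (
            json.dumps(extraction_results, indent=2) if extraction_results else ""
        ),
    }
    # A function replacement avoids backslash-escape processing of the values.
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


template = (
    "<document-text>\n{DOCUMENT_TEXT}\n</document-text>\n"
    "<extracted-attributes>\n{EXTRACTION_RESULTS}\n</extracted-attributes>"
)
print(build_task_prompt(template, "Loan application ...", {"borrower_name": "Jane Doe"}))
```

A single-pass `re.sub` is used here rather than `str.format` because the template itself contains literal JSON braces. It also keeps placeholder-like text inside the document from being substituted a second time.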