Commit 97d8001

Merge branch 'feature/evaluation-inside-workflow' into 'develop'

Feature/evaluation inside workflow

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!378
2 parents f092098 + eab0087 commit 97d8001

24 files changed: +256 -123 lines changed

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -29,6 +29,15 @@ SPDX-License-Identifier: MIT-0
 
 ### Changed
 
+- **Migrated Evaluation from EventBridge Trigger to Step Functions Workflow**
+  - Moved evaluation processing from external EventBridge-triggered Lambda to integrated Step Functions workflow step
+  - **Race Condition Eliminated**: Evaluation now runs inside state machine before WorkflowTracker marks documents COMPLETE, preventing premature completion status when evaluation is still running
+  - **Config-Driven Control**: Evaluation now controlled by `evaluation.enabled` configuration setting instead of CloudFormation stack parameter, enabling runtime control without stack redeployment
+  - **Enhanced Status Tracking**: Added EVALUATING status to document processing pipeline for better visibility of evaluation progress
+  - **UI Improvements**: Added support for displaying EVALUATING status in processing flow viewer and "NOT ENABLED" badge when evaluation is disabled in configuration
+  - **Consistent Pattern**: Aligns evaluation with summarization and assessment patterns for unified feature control approach
+
+
 - **Migrated UI Build System from Create React App to Vite**
   - Upgraded to Vite 7 for faster build times
   - Updated to React 18, AWS Amplify v6, react-router-dom v6, and Cloudscape Design System
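
The gating behavior described in this changelog entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the accelerator's actual Lambda code: the function names `is_evaluation_enabled` and `run_evaluation_step`, the `evaluate` callback, and the shape of the `document` dict are all hypothetical. Only the `evaluation.enabled` key and the EVALUATING status name come from the commit.

```python
from typing import Any, Callable, Dict


def is_evaluation_enabled(config: Dict[str, Any]) -> bool:
    """Read `evaluation.enabled` from an already-loaded config dict.

    Assumptions: a missing key counts as enabled, and quoted values such as
    "false" are accepted because the sample configs quote many scalars.
    """
    evaluation = config.get("evaluation") or {}
    value = evaluation.get("enabled", True)
    if isinstance(value, str):
        return value.strip().lower() not in ("false", "0", "no", "off")
    return bool(value)


def run_evaluation_step(
    document: Dict[str, Any],
    config: Dict[str, Any],
    evaluate: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical workflow step: evaluate when enabled, else pass through."""
    if not is_evaluation_enabled(config):
        # The step still executes, but no evaluation (and no LLM call) happens.
        return {**document, "evaluation": None}
    # Mark the document as EVALUATING before the evaluation work starts.
    in_progress = {**document, "status": "EVALUATING"}
    report = evaluate(in_progress)
    return {**in_progress, "evaluation": report}
```

Because the step returns before any later state can mark the document COMPLETE, the ordering is enforced by the workflow itself rather than by timing between independent triggers.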

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.3.21-rc1
+0.3.21-rc2

config_library/pattern-1/lending-package-sample/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -67,6 +67,7 @@ summarization:
   system_prompt: >-
     You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
 evaluation:
+  enabled: true
   llm_method:
     top_p: '0.1'
     max_tokens: '4096'
@@ -520,4 +521,3 @@ pricing:
     units:
       - name: gb_seconds
         price: '1.66667E-5' # $0.0000166667 per GB-second ($16.67 per 1M GB-seconds)
-

config_library/pattern-2/bank-statement-sample/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -529,6 +529,7 @@ assessment:
     </extraction-results>
 
 evaluation:
+  enabled: true
   llm_method:
     top_p: '0.1'
     max_tokens: '4096'
@@ -997,4 +998,3 @@ pricing:
     units:
       - name: gb_seconds
         price: '1.66667E-5' # $0.0000166667 per GB-second ($16.67 per 1M GB-seconds)
-

config_library/pattern-2/lending-package-sample/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1307,6 +1307,7 @@ assessment:
     {EXTRACTION_RESULTS}
     </extraction-results>
 evaluation:
+  enabled: true
   llm_method:
     top_p: "0.1"
     max_tokens: "4096"
@@ -1776,4 +1777,3 @@ pricing:
     units:
       - name: gb_seconds
         price: "1.66667E-5" # $0.0000166667 per GB-second ($16.67 per 1M GB-seconds)
-

config_library/pattern-2/rvl-cdip-package-sample-with-few-shot-examples/config.yaml

Lines changed: 2 additions & 1 deletion
@@ -957,6 +957,7 @@ assessment:
     {EXTRACTION_RESULTS}
     </extraction-results>
 evaluation:
+  enabled: true
   llm_method:
     top_p: '0.1'
     max_tokens: '4096'
@@ -1509,4 +1510,4 @@ pricing:
   - name: lambda/duration
     units:
       - name: gb_seconds
-        price: '1.66667E-5' # $0.0000166667 per GB-second ($16.67 per 1M GB-seconds)
+        price: '1.66667E-5' # $0.0000166667 per GB-second ($16.67 per 1M GB-seconds)

config_library/pattern-2/rvl-cdip-package-sample/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -766,6 +766,7 @@ assessment:
     {EXTRACTION_RESULTS}
     </extraction-results>
 evaluation:
+  enabled: true
   llm_method:
     top_p: '0.1'
     max_tokens: '4096'
@@ -1235,4 +1236,3 @@ pricing:
     units:
       - name: gb_seconds
         price: '1.66667E-5' # $0.0000166667 per GB-second ($16.67 per 1M GB-seconds)
-

config_library/pattern-3/rvl-cdip-package-sample/config.yaml

Lines changed: 1 addition & 0 deletions
@@ -625,6 +625,7 @@ assessment:
     {EXTRACTION_RESULTS}
     </extraction-results>
 evaluation:
+  enabled: true
   llm_method:
     top_p: '0.1'
     max_tokens: '4096'

docs/configuration.md

Lines changed: 0 additions & 1 deletion
@@ -155,7 +155,6 @@ Key parameters that can be configured during CloudFormation deployment:
 
 ### Optional Features
 - `EvaluationBaselineBucketName`: Optional existing bucket for ground truth data
-- `EvaluationAutoEnabled`: Enable automatic accuracy evaluation (default: true)
 - `DocumentKnowledgeBase`: Enable document knowledge base functionality
 - `KnowledgeBaseModelId`: Bedrock model for knowledge base queries
 - `PostProcessingLambdaHookFunctionArn`: Optional Lambda ARN for custom post-processing (see [post-processing-lambda-hook.md](post-processing-lambda-hook.md) for detailed implementation guidance)
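
Since this hunk removes the `EvaluationAutoEnabled` stack parameter from the documentation, existing deployments may want to carry its old value over to the new `evaluation.enabled` key. The sketch below shows one way that could look. It is an assumption-laden illustration: `migrate_evaluation_flag` is a hypothetical helper that is not part of the commit, and it assumes stack parameters arrive as a plain string-to-string mapping and the configuration as a nested dict.

```python
import copy
from typing import Any, Dict


def migrate_evaluation_flag(
    stack_params: Dict[str, str], config: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper: copy a legacy EvaluationAutoEnabled value into
    `evaluation.enabled` without overriding an explicit config setting."""
    migrated = copy.deepcopy(config)  # never mutate the caller's config
    evaluation = migrated.get("evaluation") or {}
    migrated["evaluation"] = evaluation
    if "enabled" not in evaluation:
        # The removed parameter was documented with a default of true.
        legacy = stack_params.get("EvaluationAutoEnabled", "true")
        evaluation["enabled"] = legacy.strip().lower() == "true"
    return migrated
```

An explicit `evaluation.enabled` in the configuration wins over the legacy parameter here, which matches the commit's intent of making the configuration file the single source of control.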

docs/evaluation.md

Lines changed: 31 additions & 12 deletions
@@ -16,9 +16,11 @@ The GenAIIDP solution includes a built-in evaluation framework to assess the acc
    - Use an existing bucket or let the solution create one
    - Can use outputs from another GenAIIDP stack to compare different patterns/prompts
 
-2. **Automatic Evaluation**
-   - When enabled, automatically evaluates each processed document
-   - Compares against baseline data if available
+2. **Integrated Evaluation Step**
+   - Evaluation runs as the final step in the Step Functions workflow (after summarization)
+   - Executes **before** the workflow marks documents as COMPLETE, eliminating race conditions
+   - When `evaluation.enabled: true` in configuration, evaluates against baseline data if available
+   - When `evaluation.enabled: false` in configuration, step executes but skips processing
    - Generates detailed markdown reports using AI analysis
 
 3. **Evaluation Reports**
@@ -79,21 +81,38 @@ The confidence integration is fully backward compatible:
 
 ## Configuration
 
-Set the following parameters during stack deployment:
+### Stack Deployment Parameters
+
+Set the following parameter during stack deployment:
 
 ```yaml
 EvaluationBaselineBucketName:
   Description: Existing bucket with baseline data, or leave empty to create new bucket
-
-EvaluationAutoEnabled:
-  Default: true
-  Description: Automatically evaluate each document (if baseline exists)
-
-EvaluationModelId:
-  Default: "anthropic.claude-3-sonnet-20240229-v1:0"
-  Description: Model to use for evaluation reports (e.g., "us.anthropic.claude-3-7-sonnet-20250219-v1:0")
 ```
 
+### Runtime Configuration
+
+Control evaluation behavior through the configuration file (no stack redeployment needed):
+
+```yaml
+evaluation:
+  enabled: true # Set to false to disable evaluation processing
+  llm_method:
+    model: "us.anthropic.claude-3-haiku-20240307-v1:0" # Model for evaluation reports
+    temperature: "0.0"
+    top_p: "0.1"
+    max_tokens: "4096"
+    # Additional model parameters...
+```
+
+**Benefits of Configuration-Based Control:**
+- Enable/disable evaluation without stack redeployment
+- Runtime control similar to summarization and assessment features
+- Zero LLM costs when disabled (step executes but skips processing)
+- Consistent feature control pattern across the solution
+
+### Attribute-Specific Evaluation Methods
+
 You can also configure evaluation methods for specific document classes and attributes through the solution's configuration. The framework supports three types of attributes with different evaluation approaches:
 
 ### Simple Attributes
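
The `llm_method` values in the runtime configuration shown in this hunk are quoted strings, so whatever consumes them has to convert them to numbers before calling a model. The sketch below illustrates one way to do that. The `LlmMethod` dataclass and `parse_llm_method` function are hypothetical names, and the fallback defaults are simply copied from the documented example rather than taken from the solution's code.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class LlmMethod:
    """Typed view of the `evaluation.llm_method` block (hypothetical)."""
    model: str
    temperature: float
    top_p: float
    max_tokens: int


def parse_llm_method(evaluation_cfg: Dict[str, Any]) -> LlmMethod:
    """Convert the quoted string values under `llm_method` to numeric types.

    Assumption: `model` is required (a KeyError is raised if it is missing),
    while the other fields fall back to the values shown in the docs example.
    """
    raw = evaluation_cfg.get("llm_method") or {}
    return LlmMethod(
        model=str(raw["model"]),
        temperature=float(raw.get("temperature", "0.0")),
        top_p=float(raw.get("top_p", "0.1")),
        max_tokens=int(raw.get("max_tokens", "4096")),
    )
```

Parsing once at the edge keeps the rest of the evaluation code free of string-to-number conversions and surfaces malformed values (for example `max_tokens: "many"`) as an immediate `ValueError`.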
