
Commit 6fb00c8

Bob Strahan committed
docs: Update evaluation framework to use configuration-based control
1 parent 375913e commit 6fb00c8


4 files changed (+35 -33 lines)


docs/configuration.md

Lines changed: 0 additions & 1 deletion
````diff
@@ -155,7 +155,6 @@ Key parameters that can be configured during CloudFormation deployment:
 
 ### Optional Features
 - `EvaluationBaselineBucketName`: Optional existing bucket for ground truth data
-- `EvaluationAutoEnabled`: Enable automatic accuracy evaluation (default: true)
 - `DocumentKnowledgeBase`: Enable document knowledge base functionality
 - `KnowledgeBaseModelId`: Bedrock model for knowledge base queries
 - `PostProcessingLambdaHookFunctionArn`: Optional Lambda ARN for custom post-processing (see [post-processing-lambda-hook.md](post-processing-lambda-hook.md) for detailed implementation guidance)
````

docs/evaluation.md

Lines changed: 31 additions & 12 deletions
````diff
@@ -16,9 +16,11 @@ The GenAIIDP solution includes a built-in evaluation framework to assess the acc
    - Use an existing bucket or let the solution create one
    - Can use outputs from another GenAIIDP stack to compare different patterns/prompts
 
-2. **Automatic Evaluation**
-   - When enabled, automatically evaluates each processed document
-   - Compares against baseline data if available
+2. **Integrated Evaluation Step**
+   - Evaluation runs as the final step in the Step Functions workflow (after summarization)
+   - Executes **before** the workflow marks documents as COMPLETE, eliminating race conditions
+   - When `evaluation.enabled: true` in configuration, evaluates against baseline data if available
+   - When `evaluation.enabled: false` in configuration, step executes but skips processing
    - Generates detailed markdown reports using AI analysis
 
 3. **Evaluation Reports**
@@ -79,21 +81,38 @@ The confidence integration is fully backward compatible:
 
 ## Configuration
 
-Set the following parameters during stack deployment:
+### Stack Deployment Parameters
+
+Set the following parameter during stack deployment:
 
 ```yaml
 EvaluationBaselineBucketName:
   Description: Existing bucket with baseline data, or leave empty to create new bucket
-
-EvaluationAutoEnabled:
-  Default: true
-  Description: Automatically evaluate each document (if baseline exists)
-
-EvaluationModelId:
-  Default: "anthropic.claude-3-sonnet-20240229-v1:0"
-  Description: Model to use for evaluation reports (e.g., "us.anthropic.claude-3-7-sonnet-20250219-v1:0")
 ```
 
+### Runtime Configuration
+
+Control evaluation behavior through the configuration file (no stack redeployment needed):
+
+```yaml
+evaluation:
+  enabled: true # Set to false to disable evaluation processing
+  llm_method:
+    model: "us.anthropic.claude-3-haiku-20240307-v1:0" # Model for evaluation reports
+    temperature: "0.0"
+    top_p: "0.1"
+    max_tokens: "4096"
+    # Additional model parameters...
+```
+
+**Benefits of Configuration-Based Control:**
+- Enable/disable evaluation without stack redeployment
+- Runtime control similar to summarization and assessment features
+- Zero LLM costs when disabled (step executes but skips processing)
+- Consistent feature control pattern across the solution
+
+### Attribute-Specific Evaluation Methods
+
 You can also configure evaluation methods for specific document classes and attributes through the solution's configuration. The framework supports three types of attributes with different evaluation approaches:
 
 ### Simple Attributes
````
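The evaluation.md changes describe two behaviors: evaluation is the last workflow step before a document is marked COMPLETE, and the step still runs when `evaluation.enabled` is false but does no work. The Python sketch below models that control flow only; it is not the solution's Step Functions definition or Lambda code. The names `run_workflow`, `summarize`, and `evaluate`, the document dict layout, the `SKIPPED`/`NO_BASELINE`/`MATCH`/`MISMATCH` labels, and treating a missing flag as disabled are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical document record; the solution's real data model is not shown in this commit.
Document = Dict[str, Any]
Config = Dict[str, Any]
Step = Callable[[Document, Config], Document]


def summarize(doc: Document, config: Config) -> Document:
    # Placeholder for the summarization step that precedes evaluation.
    doc["summary"] = f"summary of {doc['id']}"
    return doc


def evaluate(doc: Document, config: Config) -> Document:
    # The step always executes; when evaluation.enabled is false it returns
    # immediately, so no model call (and no LLM cost) happens.
    # Treating a missing flag as disabled is an assumption of this sketch.
    if not config.get("evaluation", {}).get("enabled", False):
        doc["evaluation"] = "SKIPPED"
        return doc
    if doc.get("baseline") is None:
        # No baseline data to compare against.
        doc["evaluation"] = "NO_BASELINE"
        return doc
    # Stand-in for the real comparison: exact match of extracted vs. baseline.
    doc["evaluation"] = "MATCH" if doc.get("extracted") == doc["baseline"] else "MISMATCH"
    return doc


def run_workflow(doc: Document, config: Config, steps: List[Step]) -> Document:
    doc["status"] = "IN_PROGRESS"
    for step in steps:
        doc = step(doc, config)
    # COMPLETE is set only after the final step (evaluation) has returned,
    # so anyone who observes COMPLETE never races an unfinished evaluation.
    doc["status"] = "COMPLETE"
    return doc


config = {"evaluation": {"enabled": True}}
doc = run_workflow(
    {"id": "doc-1", "extracted": {"total": 42}, "baseline": {"total": 42}},
    config,
    [summarize, evaluate],
)
print(doc["status"], doc["evaluation"])  # prints: COMPLETE MATCH
```

Flipping `enabled` to `False` in `config` leaves the step list unchanged; the document still reaches COMPLETE, with evaluation marked as skipped, which mirrors the "step executes but skips processing" wording above.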

docs/idp-cli.md

Lines changed: 2 additions & 12 deletions
````diff
@@ -709,20 +709,10 @@ idp-cli run-inference \
 
 Download the evaluation results to analyze accuracy:
 
-**⏱️ Important Timing Note:** Evaluation processing runs as a separate step after the main document processing completes. This takes an additional 2-3 minutes per document. If you download results immediately after the batch shows "Complete", the evaluation data may not be ready yet.
-
-**Best practice:**
-1. Wait 5-10 minutes after batch completion before downloading evaluation results
-2. Check that the downloaded files include the `evaluation/` directory
-3. If evaluation data is missing, wait a few more minutes and download again
+**✓ Synchronous Evaluation:** Evaluation runs as the final step in the workflow before completion. When a document shows status "COMPLETE", all processing including evaluation is finished - results are immediately available for download.
 
 ```bash
-# Wait for evaluation to complete (check status)
-idp-cli status \
-  --stack-name eval-testing \
-  --batch-id eval-run-001
-
-# Download evaluation results
+# Download evaluation results (no waiting needed)
 idp-cli download-results \
   --stack-name eval-testing \
   --batch-id eval-run-001 \
````
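Because a document now reaches COMPLETE only after evaluation finishes, a downloaded result set can be checked right away for evaluation output instead of being re-downloaded later. The sketch below assumes, following the removed best-practice text, that evaluation data lands in directories named `evaluation/` somewhere under the local download root; the actual layout written by `idp-cli download-results` and the helper names `find_evaluation_dirs` and `check_evaluation_present` are assumptions, not part of the CLI.

```python
from pathlib import Path
from typing import List


def find_evaluation_dirs(results_root: str) -> List[str]:
    # Hypothetical helper: collect every directory named "evaluation" at any
    # depth under the download root, as paths relative to that root.
    # The "evaluation/" naming is assumed from the earlier doc guidance.
    root = Path(results_root)
    return sorted(
        str(p.relative_to(root)) for p in root.rglob("evaluation") if p.is_dir()
    )


def check_evaluation_present(results_root: str) -> bool:
    # With evaluation finishing before COMPLETE, this should already hold
    # on the first download of a completed batch.
    return len(find_evaluation_dirs(results_root)) > 0
```

A `False` result right after download would then point to evaluation being disabled in configuration or to a missing baseline, rather than to a timing gap.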

docs/idp-configuration-best-practices.md

Lines changed: 2 additions & 8 deletions
````diff
@@ -1362,16 +1362,10 @@ Set the following parameters during stack deployment:
 ```yaml
 EvaluationBaselineBucketName:
   Description: Existing bucket with baseline data, or leave empty to create new bucket
-
-EvaluationAutoEnabled:
-  Default: true
-  Description: Automatically evaluate each document (if baseline exists)
-
-EvaluationModelId:
-  Default: "anthropic.claude-3-sonnet-20240229-v1:0"
-  Description: Model to use for evaluation reports
 ```
 
+**Note:** Evaluation is now controlled via configuration file (`evaluation.enabled: true/false`) rather than stack parameters. See the [evaluation.md](./evaluation.md) documentation for details.
+
 ### Evaluation Methods Configuration
 
 Configure evaluation methods for specific document classes and attributes:
````
