Commit 4cb7aaf

Merge branch 'feature/folder-prefix-match-test' into 'develop'
Test Studio - Comprehensive Test Management Interface

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!398
2 parents 59f3738 + d265594 commit 4cb7aaf

64 files changed, +9436 -7 lines changed

docs/test-studio.md

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
+# Test Studio
+
+The Test Studio provides a comprehensive interface for managing test sets, running tests, and analyzing results directly from the web UI.
+
+## Overview
+
+The Test Studio consists of two main tabs:
+1. **Test Sets**: Create and manage reusable collections of test documents
+2. **Test Executions**: Execute tests, view results, and compare test runs
+
+## Architecture
+
+### Backend Components
+
+#### TestSetResolver Lambda
+- **Location**: `src/lambda/test_set_resolver/index.py`
+- **Purpose**: Handles GraphQL operations for test set management
+- **Features**: Creates test sets, scans TestSetBucket for direct uploads, validates file matching, manages test set status
+
+#### TestSetZipExtractor Lambda
+- **Location**: `src/lambda/test_set_zip_extractor/index.py`
+- **Purpose**: Extracts and validates uploaded zip files
+- **Features**: S3 event triggered extraction, file validation, status updates
+
+#### TestRunner Lambda
+- **Location**: `src/lambda/test_runner/index.py`
+- **Purpose**: Initiates test runs and queues file processing jobs
+- **Features**: Test validation, SQS message queuing, fast response optimization
+
+#### TestFileCopier Lambda
+- **Location**: `src/lambda/test_file_copier/index.py`
+- **Purpose**: Handles asynchronous file copying and processing initiation
+- **Features**: SQS message processing, file copying, status management
+
+#### TestResultsResolver Lambda
+- **Location**: `src/lambda/test_results_resolver/index.py`
+- **Purpose**: Handles GraphQL queries for test results and comparisons, plus asynchronous cache updates
+- **Features**:
+  - Result retrieval with cached metrics
+  - Comparison logic and metrics aggregation
+  - Dual event handling (GraphQL + SQS)
+  - Asynchronous cache update processing
+  - Progress-aware status updates
+
+#### TestResultCacheUpdateQueue
+- **Type**: AWS SQS Queue
+- **Purpose**: Decouples heavy metric calculations from synchronous API calls
+- **Features**:
+  - Encrypted message storage
+  - 15-minute visibility timeout for long-running calculations
+  - Automatic retry handling
+
+### GraphQL Schema
+- **Location**: `src/api/schema.graphql`
+- **Operations**: `getTestSets`, `addTestSet`, `addTestSetFromUpload`, `deleteTestSets`, `getTestRuns`, `startTestRun`, `compareTestRuns`
+
+### Frontend Components
+
+#### TestStudioLayout
+- **Location**: `src/ui/src/components/test-studio/TestStudioLayout.jsx`
+- **Purpose**: Main container with two-tab navigation and global state management
+
+#### TestSets
+- **Location**: `src/ui/src/components/test-studio/TestSets.jsx`
+- **Purpose**: Manage test set collections
+- **Features**: Pattern-based creation, zip upload, direct upload detection, dual polling (3s active, 30s discovery)
+
+#### TestExecutions
+- **Location**: `src/ui/src/components/test-studio/TestExecutions.jsx`
+- **Purpose**: Unified interface combining TestRunner and TestResultsList
+- **Features**: Test execution, results viewing, comparison, export, delete operations
+
+## Component Structure
+
+```
+components/
+└── test-studio/
+    ├── TestStudioLayout.jsx
+    ├── TestSets.jsx
+    ├── TestExecutions.jsx
+    ├── TestRunner.jsx
+    ├── TestResultsList.jsx
+    ├── TestResults.jsx
+    ├── TestComparison.jsx
+    ├── TestRunnerStatus.jsx
+    ├── DeleteTestModal.jsx
+    └── index.js
+```
+
+## Test Sets
+
+### Creating Test Sets
+1. **Pattern-based**: Define file patterns (e.g., `*.pdf`) with bucket type selection
+   - **Input Bucket**: Scan main processing bucket for matching files
+   - **Test Set Bucket**: Scan dedicated test set bucket for matching files
+2. **Zip Upload**: Upload zip containing `input/` and `baseline/` folders
+3. **Direct Upload**: Files uploaded directly to TestSetBucket are auto-detected
+
+### File Structure Requirements
+```
+my-test-set/
+├── input/
+│   ├── document1.pdf
+│   └── document2.pdf
+└── baseline/
+    ├── document1.pdf/
+    │   └── [ground truth files]
+    └── document2.pdf/
+        └── [ground truth files]
+```
+
+### Validation Rules
+- Each input file must have corresponding baseline folder
+- Baseline folder name must match input filename exactly
+- Status: COMPLETED (valid), FAILED (validation errors), PROCESSING (uploading)
+
+### Upload Methods
+1. **UI Zip Upload**: S3 event → Lambda extraction → Validation → Status update
+2. **Direct S3 Upload**: Detected via refresh button or automatic polling
+
+## Test Executions
+
+### Running Tests
+1. Select test set from dropdown
+2. Click "Run Test" (single test execution only)
+3. Monitor progress via TestRunnerStatus
+4. View results in integrated listing
+
+### Test States
+- **QUEUED**: File copying jobs queued in SQS
+- **RUNNING**: Files being copied and processed
+- **COMPLETED**: Test finished successfully
+- **FAILED**: Errors during processing
+
+### Results Management
+- Filter and paginate test runs
+- Multi-select for comparison
+- Navigate to detailed results view
+- Delete and export functionality
+
+## Key Features
+
+### Test Set Management
+- Reusable collections with file patterns across multiple buckets
+- Dual bucket support (Input Bucket and Test Set Bucket)
+- Zip upload with automatic extraction
+- Direct upload detection via dual polling
+- File structure validation with error reporting
+
+### Test Execution
+- Single test concurrency prevention
+- Real-time status monitoring
+- Global state persistence across navigation
+- SQS-based asynchronous processing
+
+### Results Analysis
+- Comprehensive metrics display
+- Side-by-side test comparison
+- Export capabilities
+- Integrated delete operations
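
The validation rules in `docs/test-studio.md` (every file under `input/` needs a baseline folder with exactly the same name) can be illustrated with a small sketch. This is not the code from `test_set_zip_extractor` or `test_set_resolver`, which this page does not show; the function name `find_missing_baselines` and the ground-truth file name `sections/1/result.json` are assumptions made for the example.

```python
from typing import Iterable, List


def find_missing_baselines(keys: Iterable[str], test_set: str) -> List[str]:
    """Return input file names that lack a baseline folder of the same name."""
    input_prefix = f"{test_set}/input/"
    baseline_prefix = f"{test_set}/baseline/"

    input_files = set()
    baseline_folders = set()
    for key in keys:
        if key.startswith(input_prefix):
            name = key[len(input_prefix):]
            if name and "/" not in name:  # skip folder markers and nested keys
                input_files.add(name)
        elif key.startswith(baseline_prefix):
            rest = key[len(baseline_prefix):]
            folder, sep, _ = rest.partition("/")
            if folder and sep:  # only count "<folder>/..." entries
                baseline_folders.add(folder)

    return sorted(input_files - baseline_folders)


# Hypothetical object keys: document2.pdf has no baseline folder.
keys = [
    "my-test-set/input/document1.pdf",
    "my-test-set/input/document2.pdf",
    "my-test-set/baseline/document1.pdf/sections/1/result.json",
]
missing = find_missing_baselines(keys, "my-test-set")
status = "COMPLETED" if not missing else "FAILED"
print(status, missing)  # FAILED ['document2.pdf']
```

Comparing names as exact strings mirrors the "must match input filename exactly" rule, so a baseline folder named `Document2.pdf` would not satisfy `document2.pdf`.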

lib/idp_common_pkg/idp_common/dynamodb/client.py

Lines changed: 91 additions & 3 deletions
@@ -18,7 +18,7 @@
 class DynamoDBError(Exception):
     """Custom exception for DynamoDB errors"""
 
-    def __init__(self, message: str, error_code: str = None):
+    def __init__(self, message: str, error_code: str | None = None):
         super().__init__(message)
         self.error_code = error_code
 
@@ -54,7 +54,7 @@ def __init__(self, table_name: Optional[str] = None, region: Optional[str] = Non
 
         try:
             self.dynamodb = boto3.resource("dynamodb", region_name=self.region)
-            self.table = self.dynamodb.Table(self.table_name)
+            self.table = self.dynamodb.Table(self.table_name)  # type: ignore[attr-defined]
         except Exception as e:
             logger.error(f"Failed to initialize DynamoDB client: {str(e)}")
             raise DynamoDBError(f"Failed to initialize DynamoDB client: {str(e)}")
@@ -134,6 +134,32 @@ def update_item(
             logger.error(f"BotoCore error during update_item: {str(e)}")
             raise DynamoDBError(f"BotoCore error: {str(e)}")
 
+    def delete_item(self, key: Dict[str, Any]) -> Dict[str, Any]:
+        """
+        Delete an item from the DynamoDB table.
+
+        Args:
+            key: The primary key of the item to delete
+
+        Returns:
+            Dict containing the delete response
+
+        Raises:
+            DynamoDBError: If the DynamoDB operation fails
+        """
+        try:
+            response = self.table.delete_item(Key=key)
+            logger.debug(f"Successfully deleted item with key: {key}")
+            return response
+        except ClientError as e:
+            error_code = e.response["Error"]["Code"]
+            error_message = e.response["Error"]["Message"]
+            logger.error(f"DynamoDB delete_item failed: {error_code} - {error_message}")
+            raise DynamoDBError(f"Delete failed: {error_message}", error_code)
+        except BotoCoreError as e:
+            logger.error(f"BotoCore error during delete_item: {str(e)}")
+            raise DynamoDBError(f"BotoCore error: {str(e)}")
+
     def get_item(self, key: Dict[str, Any]) -> Optional[Dict[str, Any]]:
         """
         Get an item from the DynamoDB table.
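
Because `delete_item` wraps `ClientError` in `DynamoDBError` and passes the service error code along, a caller can branch on `error_code` instead of parsing messages. The sketch below shows one way that might look. `StubClient`, `delete_quietly`, and the `PK` key name are hypothetical and exist only so the example runs without AWS; the `DynamoDBError` class is re-declared here with the same shape as in the diff.

```python
from typing import Any, Dict, List, Optional


class DynamoDBError(Exception):
    """Same shape as the exception in the diff: a message plus an optional code."""

    def __init__(self, message: str, error_code: Optional[str] = None):
        super().__init__(message)
        self.error_code = error_code


class StubClient:
    """Hypothetical stand-in for the real client; no AWS calls are made."""

    def __init__(self, fail_with: Optional[str] = None):
        self.fail_with = fail_with
        self.deleted: List[Dict[str, Any]] = []

    def delete_item(self, key: Dict[str, Any]) -> Dict[str, Any]:
        if self.fail_with:
            raise DynamoDBError("Delete failed: simulated", self.fail_with)
        self.deleted.append(key)
        return {}


def delete_quietly(client: Any, key: Dict[str, Any]) -> bool:
    """Return True on success, False on throttling; re-raise anything else."""
    try:
        client.delete_item(key)
        return True
    except DynamoDBError as err:
        if err.error_code == "ProvisionedThroughputExceededException":
            return False  # the caller may retry later
        raise
```

Note that the `BotoCoreError` branch in the diff raises `DynamoDBError` without a code, so `error_code` is `None` in that case and the sketch re-raises it.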
@@ -266,6 +292,68 @@ def scan(
             logger.error(f"BotoCore error during scan: {str(e)}")
             raise DynamoDBError(f"BotoCore error: {str(e)}")
 
+    def scan_all(
+        self,
+        filter_expression: Optional[str] = None,
+        expression_attribute_names: Optional[Dict[str, str]] = None,
+        expression_attribute_values: Optional[Dict[str, Any]] = None,
+    ) -> List[Dict[str, Any]]:
+        """
+        Scan the DynamoDB table with automatic pagination to retrieve all items.
+
+        Args:
+            filter_expression: Optional filter expression
+            expression_attribute_names: Optional attribute name mappings
+            expression_attribute_values: Optional attribute value mappings
+
+        Returns:
+            List of all items matching the filter
+
+        Raises:
+            DynamoDBError: If the DynamoDB operation fails
+        """
+        items = []
+        last_evaluated_key = None
+
+        while True:
+            scan_params = {}
+
+            if filter_expression:
+                scan_params["FilterExpression"] = filter_expression
+
+            if expression_attribute_names:
+                scan_params["ExpressionAttributeNames"] = expression_attribute_names
+
+            if expression_attribute_values:
+                scan_params["ExpressionAttributeValues"] = expression_attribute_values
+
+            if last_evaluated_key:
+                scan_params["ExclusiveStartKey"] = last_evaluated_key
+
+            try:
+                response = self.table.scan(**scan_params)
+                items.extend(response.get("Items", []))
+
+                last_evaluated_key = response.get("LastEvaluatedKey")
+                if not last_evaluated_key:
+                    break
+
+            except ClientError as e:
+                error_code = e.response["Error"]["Code"]
+                error_message = e.response["Error"]["Message"]
+                logger.error(
+                    f"DynamoDB scan_all failed: {error_code} - {error_message}"
+                )
+                raise DynamoDBError(f"Scan failed: {error_message}", error_code)
+            except BotoCoreError as e:
+                logger.error(f"BotoCore error during scan_all: {str(e)}")
+                raise DynamoDBError(f"BotoCore error: {str(e)}")
+
+        logger.debug(
+            f"Successfully scanned all items, returned {len(items)} total items"
+        )
+        return items
+
     def query(
         self,
         key_condition_expression: str,
@@ -291,7 +379,7 @@ def query(
             DynamoDBError: If the DynamoDB operation fails
         """
         try:
-            query_params = {
+            query_params: Dict[str, Any] = {
                 "KeyConditionExpression": key_condition_expression,
             }
 
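
The pagination loop in `scan_all` can be exercised without AWS by pointing it at an in-memory stand-in. The sketch below is only an illustration under assumptions: `FakeTable` is hypothetical, and its `LastEvaluatedKey` is a simple offset, whereas a real DynamoDB table returns the primary key of the last item it read. The loop itself follows the same shape as the diff, with the filter parameters and error handling left out.

```python
from typing import Any, Dict, List, Optional


class FakeTable:
    """Hypothetical in-memory table returning at most `page_size` items per scan."""

    def __init__(self, items: List[Dict[str, Any]], page_size: int = 2):
        self.items = items
        self.page_size = page_size
        self.calls = 0

    def scan(self, ExclusiveStartKey: Optional[Dict[str, int]] = None) -> Dict[str, Any]:
        self.calls += 1
        start = ExclusiveStartKey["offset"] if ExclusiveStartKey else 0
        end = start + self.page_size
        response: Dict[str, Any] = {"Items": self.items[start:end]}
        if end < len(self.items):
            # More data remains, so hand back a continuation token.
            response["LastEvaluatedKey"] = {"offset": end}
        return response


def scan_all(table: Any) -> List[Dict[str, Any]]:
    """Follow LastEvaluatedKey until the table reports no further pages."""
    items: List[Dict[str, Any]] = []
    last_evaluated_key = None
    while True:
        params: Dict[str, Any] = {}
        if last_evaluated_key:
            params["ExclusiveStartKey"] = last_evaluated_key
        response = table.scan(**params)
        items.extend(response.get("Items", []))
        last_evaluated_key = response.get("LastEvaluatedKey")
        if not last_evaluated_key:
            return items


table = FakeTable([{"id": i} for i in range(5)], page_size=2)
all_items = scan_all(table)
print(len(all_items), table.calls)  # 5 3
```

Five items at two per page take three scan calls. The same applies to a real table: `scan_all` issues one request per page and holds every item in memory, so it suits tables of modest size.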
lib/idp_common_pkg/idp_common/s3/__init__.py

Lines changed: 37 additions & 0 deletions
@@ -242,3 +242,40 @@ def _list_local_images(directory_path: str, image_extensions: set) -> List[str]:
     except Exception as e:
         logger.error(f"Error listing images from local directory {directory_path}: {e}")
         raise
+
+def find_matching_files(bucket: str, pattern: str) -> List[str]:
+    """
+    Find files in S3 bucket that match a given pattern.
+
+    Args:
+        bucket: S3 bucket name
+        pattern: File pattern with wildcards (* and ?) - case sensitive, * doesn't match /
+
+    Returns:
+        List of matching file keys
+    """
+    import re
+
+    try:
+        s3 = get_s3_client()
+        paginator = s3.get_paginator('list_objects_v2')
+
+        # Convert pattern: * matches anything except /, ? matches single char except /
+        regex_pattern = pattern.replace('*', '[^/]*').replace('?', '[^/]')
+        regex = re.compile(f'^{regex_pattern}$')
+
+        matching_files = []
+
+        for page in paginator.paginate(Bucket=bucket):
+            if 'Contents' in page:
+                for obj in page['Contents']:
+                    key = obj['Key']
+                    if regex.match(key):
+                        matching_files.append(key)
+
+        logger.info(f"Found {len(matching_files)} files matching pattern '{pattern}' in bucket '{bucket}'")
+        return sorted(matching_files)
+
+    except Exception as e:
+        logger.error(f"Error finding matching files in bucket {bucket} with pattern {pattern}: {e}")
+        raise
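
One detail of the wildcard conversion in `find_matching_files` is worth a closer look: only `*` and `?` are rewritten, so other regex metacharacters in the pattern reach `re.compile` unchanged. With `docs/*.pdf`, the `.` then matches any character, and a key such as `docs/apdf` would be accepted. The sketch below is a possible variant, not the committed implementation: the helper name `wildcard_to_regex` is an assumption, and it escapes every literal character with `re.escape` while keeping the rule that neither wildcard crosses a `/`.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate * and ? so neither crosses a '/'; escape every other character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")


# Hypothetical object keys used only for illustration.
keys = ["a.pdf", "docs/a.pdf", "docs/b.pdf", "docs/sub/c.pdf", "docs/apdf"]
regex = wildcard_to_regex("docs/*.pdf")
matches = [k for k in keys if regex.match(k)]
print(matches)  # ['docs/a.pdf', 'docs/b.pdf']
```

`docs/sub/c.pdf` is excluded because `*` stops at `/`, and `docs/apdf` is excluded because the dot is now literal.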
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# SPDX-License-Identifier: MIT-0
+
+import pytest
+
+
+@pytest.mark.unit
+def test_list_to_dict_conversion():
+    """Test conversion of list to dict for comparison functions"""
+    # Simulate the data structure conversion used in comparison functions
+    results_list = [
+        {"testRunId": "test1", "overallAccuracy": 85, "totalCost": 1.50},
+        {"testRunId": "test2", "overallAccuracy": 90, "totalCost": 2.00},
+    ]
+
+    # Convert list to dict with testRunId as key
+    results_dict = {result["testRunId"]: result for result in results_list}
+
+    assert len(results_dict) == 2
+    assert "test1" in results_dict
+    assert "test2" in results_dict
+    assert results_dict["test1"]["overallAccuracy"] == 85
+    assert results_dict["test2"]["totalCost"] == 2.00
+
+
+@pytest.mark.unit
+def test_metrics_comparison_structure():
+    """Test metrics comparison data structure"""
+    results_dict = {
+        "test1": {"overallAccuracy": 85, "averageConfidence": 75, "totalCost": 1.50},
+        "test2": {"overallAccuracy": 90, "averageConfidence": 80, "totalCost": 2.00},
+    }
+
+    # Simulate _build_metrics_comparison logic
+    metrics = [
+        {
+            "metric": "Overall Accuracy",
+            "values": {
+                k: f"{v.get('overallAccuracy', 0)}%" for k, v in results_dict.items()
+            },
+        },
+        {
+            "metric": "Average Confidence",
+            "values": {
+                k: f"{v.get('averageConfidence', 0)}%" for k, v in results_dict.items()
+            },
+        },
+        {
+            "metric": "Total Cost",
+            "values": {k: f"${v.get('totalCost', 0)}" for k, v in results_dict.items()},
+        },
+    ]
+
+    assert len(metrics) == 3
+    assert metrics[0]["metric"] == "Overall Accuracy"
+    assert metrics[0]["values"]["test1"] == "85%"
+    assert metrics[2]["values"]["test2"] == "$2.0"
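
The two tests above build the comparison structure inline rather than calling the resolver. As a rough sketch of how that logic could be factored into one function, the helper below produces one row per metric, keyed by test run ID. The name `build_metrics_comparison` and its table-driven layout are assumptions; the real `_build_metrics_comparison` is not shown on this page.

```python
from typing import Any, Dict, List


def build_metrics_comparison(results: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build one row per metric, keyed by test run ID (hypothetical helper)."""
    rows = [
        ("Overall Accuracy", "overallAccuracy", "{}%"),
        ("Average Confidence", "averageConfidence", "{}%"),
        ("Total Cost", "totalCost", "${}"),
    ]
    return [
        {
            "metric": label,
            "values": {
                run_id: template.format(run.get(field, 0))
                for run_id, run in results.items()
            },
        }
        for label, field, template in rows
    ]


results_list = [
    {"testRunId": "test1", "overallAccuracy": 85, "totalCost": 1.50},
    {"testRunId": "test2", "overallAccuracy": 90, "totalCost": 2.00},
]
# Same list-to-dict conversion as in test_list_to_dict_conversion.
results = {r["testRunId"]: r for r in results_list}
metrics = build_metrics_comparison(results)
print(metrics[2]["values"])  # {'test1': '$1.5', 'test2': '$2.0'}
```

The cost strings come out as `$1.5` and `$2.0` because a float is formatted with its default representation, which is also why the test expects `"$2.0"`. A format spec such as `{:.2f}` would give `$2.00` if two decimals are wanted.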
