aws-solutions-library-samples
diff --git a/docs/test-studio.md b/docs/test-studio.md
Lines changed: 160 additions & 0 deletions
diff --git a/lib/idp_common_pkg/idp_common/dynamodb/client.py b/lib/idp_common_pkg/idp_common/dynamodb/client.py
Lines changed: 91 additions & 3 deletions
diff --git a/lib/idp_common_pkg/idp_common/s3/__init__.py b/lib/idp_common_pkg/idp_common/s3/__init__.py
Lines changed: 37 additions & 0 deletions
diff --git a/lib/idp_common_pkg/tests/unit/test_comparison_data_conversion.py b/lib/idp_common_pkg/tests/unit/test_comparison_data_conversion.py
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,160 @@
+# Test Studio
+
+The Test Studio provides a comprehensive interface for managing test sets, running tests, and analyzing results directly from the web UI.
+
+## Overview
+
+The Test Studio consists of two main tabs:
+1. **Test Sets**: Create and manage reusable collections of test documents
+2. **Test Executions**: Execute tests, view results, and compare test runs
+
+## Architecture
+
+### Backend Components
+
+#### TestSetResolver Lambda
+- **Location**: `src/lambda/test_set_resolver/index.py`
+- **Purpose**: Handles GraphQL operations for test set management
+- **Features**: Creates test sets, scans TestSetBucket for direct uploads, validates file matching, manages test set status
+
+#### TestSetZipExtractor Lambda
+- **Location**: `src/lambda/test_set_zip_extractor/index.py`
+- **Purpose**: Extracts and validates uploaded zip files
+- **Features**: S3 event triggered extraction, file validation, status updates
+
+#### TestRunner Lambda
+- **Location**: `src/lambda/test_runner/index.py`
+- **Purpose**: Initiates test runs and queues file processing jobs
+- **Features**: Test validation, SQS message queuing, fast response optimization
+
+#### TestFileCopier Lambda
+- **Location**: `src/lambda/test_file_copier/index.py`
+- **Purpose**: Handles asynchronous file copying and processing initiation
+- **Features**: SQS message processing, file copying, status management
+
+#### TestResultsResolver Lambda
+- **Location**: `src/lambda/test_results_resolver/index.py`
+- **Purpose**: Handles GraphQL queries for test results and comparisons, plus asynchronous cache updates
+- **Features**: 
+  - Result retrieval with cached metrics
+  - Comparison logic and metrics aggregation
+  - Dual event handling (GraphQL + SQS)
+  - Asynchronous cache update processing
+  - Progress-aware status updates
+
+#### TestResultCacheUpdateQueue
+- **Type**: AWS SQS Queue
+- **Purpose**: Decouples heavy metric calculations from synchronous API calls
+- **Features**: 
+  - Encrypted message storage
+  - 15-minute visibility timeout for long-running calculations
+  - Automatic retry handling
+
+### GraphQL Schema
+- **Location**: `src/api/schema.graphql`
+- **Operations**: `getTestSets`, `addTestSet`, `addTestSetFromUpload`, `deleteTestSets`, `getTestRuns`, `startTestRun`, `compareTestRuns`
+
+### Frontend Components
+
+#### TestStudioLayout
+- **Location**: `src/ui/src/components/test-studio/TestStudioLayout.jsx`
+- **Purpose**: Main container with two-tab navigation and global state management
+
+#### TestSets
+- **Location**: `src/ui/src/components/test-studio/TestSets.jsx`
+- **Purpose**: Manage test set collections
+- **Features**: Pattern-based creation, zip upload, direct upload detection, dual polling (3s active, 30s discovery)
+
+#### TestExecutions
+- **Location**: `src/ui/src/components/test-studio/TestExecutions.jsx`
+- **Purpose**: Unified interface combining TestRunner and TestResultsList
+- **Features**: Test execution, results viewing, comparison, export, delete operations
+
+## Component Structure
+
+```
+components/
+└── test-studio/
+    ├── TestStudioLayout.jsx
+    ├── TestSets.jsx
+    ├── TestExecutions.jsx
+    ├── TestRunner.jsx
+    ├── TestResultsList.jsx
+    ├── TestResults.jsx
+    ├── TestComparison.jsx
+    ├── TestRunnerStatus.jsx
+    ├── DeleteTestModal.jsx
+    └── index.js
+```
+
+## Test Sets
+
+### Creating Test Sets
+1. **Pattern-based**: Define file patterns (e.g., `*.pdf`) with bucket type selection
+   - **Input Bucket**: Scan main processing bucket for matching files
+   - **Test Set Bucket**: Scan dedicated test set bucket for matching files
+2. **Zip Upload**: Upload zip containing `input/` and `baseline/` folders
+3. **Direct Upload**: Files uploaded directly to TestSetBucket are auto-detected
+
+### File Structure Requirements
+```
+my-test-set/
+├── input/
+│   ├── document1.pdf
+│   └── document2.pdf
+└── baseline/
+    ├── document1.pdf/
+    │   └── [ground truth files]
+    └── document2.pdf/
+        └── [ground truth files]
+```
+
+### Validation Rules
+- Each input file must have a corresponding baseline folder
+- The baseline folder name must match the input filename exactly (for example, `baseline/document1.pdf/` for `input/document1.pdf`)
+- Status values: COMPLETED (valid), FAILED (validation errors), PROCESSING (uploading)
+
+### Upload Methods
+1. **UI Zip Upload**: S3 event → Lambda extraction → Validation → Status update
+2. **Direct S3 Upload**: Detected via refresh button or automatic polling
+
+## Test Executions
+
+### Running Tests
+1. Select test set from dropdown
+2. Click "Run Test" (only one test run can execute at a time)
+3. Monitor progress via TestRunnerStatus
+4. View results in integrated listing
+
+### Test States
+- **QUEUED**: File copying jobs queued in SQS
+- **RUNNING**: Files being copied and processed
+- **COMPLETED**: Test finished successfully
+- **FAILED**: Errors during processing
+
+### Results Management
+- Filter and paginate test runs
+- Multi-select for comparison
+- Navigate to detailed results view
+- Delete and export functionality
+
+## Key Features
+
+### Test Set Management
+- Reusable collections with file patterns across multiple buckets
+- Dual bucket support (Input Bucket and Test Set Bucket)
+- Zip upload with automatic extraction
+- Direct upload detection via dual polling
+- File structure validation with error reporting
+
+### Test Execution
+- Concurrency guard: only one test run executes at a time
+- Real-time status monitoring
+- Global state persistence across navigation
+- SQS-based asynchronous processing
+
+### Results Analysis
+- Comprehensive metrics display
+- Side-by-side test comparison
+- Export capabilities
+- Integrated delete operations
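The "File Structure Requirements" and "Validation Rules" sections of `docs/test-studio.md` can be illustrated with a small, self-contained check. This is only a sketch under stated assumptions, not the logic in the TestSetResolver or TestSetZipExtractor Lambdas: the function names `validate_test_set` and `status_for`, the error messages, and the idea of validating a flat list of object keys relative to the test set root are all hypothetical.

```python
from typing import List, Set


def validate_test_set(keys: List[str]) -> List[str]:
    """Return validation errors for object keys relative to the test set root.

    Example keys: "input/document1.pdf", "baseline/document1.pdf/result.json"
    (the ground-truth file name is a made-up placeholder).
    """
    input_files: Set[str] = set()
    baseline_folders: Set[str] = set()

    for key in keys:
        parts = key.split("/")
        if parts[0] == "input" and len(parts) == 2 and parts[1]:
            # input/<filename>
            input_files.add(parts[1])
        elif parts[0] == "baseline" and len(parts) >= 3 and parts[1]:
            # baseline/<input filename>/<ground truth files>
            baseline_folders.add(parts[1])

    errors: List[str] = []
    if not input_files:
        errors.append("No files found under input/")
    # The folder name must match the input filename exactly (case sensitive).
    for name in sorted(input_files - baseline_folders):
        errors.append(f"Missing baseline folder for input file: {name}")
    return errors


def status_for(errors: List[str]) -> str:
    """Map validation errors to the documented terminal statuses."""
    return "FAILED" if errors else "COMPLETED"
```

A set difference keeps the check linear in the number of keys and makes the exact-match rule explicit: `baseline/Document1.pdf/` would not satisfy `input/document1.pdf`.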
@@ -18,7 +18,7 @@
 class DynamoDBError(Exception):
     """Custom exception for DynamoDB errors"""
 
-    def __init__(self, message: str, error_code: str = None):
+    def __init__(self, message: str, error_code: str | None = None):
         super().__init__(message)
         self.error_code = error_code
 
@@ -54,7 +54,7 @@ def __init__(self, table_name: Optional[str] = None, region: Optional[str] = Non
 
         try:
             self.dynamodb = boto3.resource("dynamodb", region_name=self.region)
-            self.table = self.dynamodb.Table(self.table_name)
+            self.table = self.dynamodb.Table(self.table_name)  # type: ignore[attr-defined]
         except Exception as e:
             logger.error(f"Failed to initialize DynamoDB client: {str(e)}")
             raise DynamoDBError(f"Failed to initialize DynamoDB client: {str(e)}")
@@ -134,6 +134,32 @@ def update_item(
             logger.error(f"BotoCore error during update_item: {str(e)}")
             raise DynamoDBError(f"BotoCore error: {str(e)}")
 
+    def delete_item(self, key: Dict[str, Any]) -> Dict[str, Any]:
+        """
+        Delete an item from the DynamoDB table.
+
+        Args:
+            key: The primary key of the item to delete
+
+        Returns:
+            Dict containing the delete response
+
+        Raises:
+            DynamoDBError: If the DynamoDB operation fails
+        """
+        try:
+            response = self.table.delete_item(Key=key)
+            logger.debug(f"Successfully deleted item with key: {key}")
+            return response
+        except ClientError as e:
+            error_code = e.response["Error"]["Code"]
+            error_message = e.response["Error"]["Message"]
+            logger.error(f"DynamoDB delete_item failed: {error_code} - {error_message}")
+            raise DynamoDBError(f"Delete failed: {error_message}", error_code)
+        except BotoCoreError as e:
+            logger.error(f"BotoCore error during delete_item: {str(e)}")
+            raise DynamoDBError(f"BotoCore error: {str(e)}")
+
     def get_item(self, key: Dict[str, Any]) -> Optional[Dict[str, Any]]:
         """
         Get an item from the DynamoDB table.
@@ -266,6 +292,68 @@ def scan(
             logger.error(f"BotoCore error during scan: {str(e)}")
             raise DynamoDBError(f"BotoCore error: {str(e)}")
 
+    def scan_all(
+        self,
+        filter_expression: Optional[str] = None,
+        expression_attribute_names: Optional[Dict[str, str]] = None,
+        expression_attribute_values: Optional[Dict[str, Any]] = None,
+    ) -> List[Dict[str, Any]]:
+        """
+        Scan the DynamoDB table with automatic pagination to retrieve all items.
+
+        Args:
+            filter_expression: Optional filter expression
+            expression_attribute_names: Optional attribute name mappings
+            expression_attribute_values: Optional attribute value mappings
+
+        Returns:
+            List of all items matching the filter
+
+        Raises:
+            DynamoDBError: If the DynamoDB operation fails
+        """
+        items: List[Dict[str, Any]] = []
+        last_evaluated_key = None
+
+        while True:
+            scan_params: Dict[str, Any] = {}
+
+            if filter_expression:
+                scan_params["FilterExpression"] = filter_expression
+
+            if expression_attribute_names:
+                scan_params["ExpressionAttributeNames"] = expression_attribute_names
+
+            if expression_attribute_values:
+                scan_params["ExpressionAttributeValues"] = expression_attribute_values
+
+            if last_evaluated_key:
+                scan_params["ExclusiveStartKey"] = last_evaluated_key
+
+            try:
+                response = self.table.scan(**scan_params)
+                items.extend(response.get("Items", []))
+
+                last_evaluated_key = response.get("LastEvaluatedKey")
+                if not last_evaluated_key:
+                    break
+
+            except ClientError as e:
+                error_code = e.response["Error"]["Code"]
+                error_message = e.response["Error"]["Message"]
+                logger.error(
+                    f"DynamoDB scan_all failed: {error_code} - {error_message}"
+                )
+                raise DynamoDBError(f"Scan failed: {error_message}", error_code)
+            except BotoCoreError as e:
+                logger.error(f"BotoCore error during scan_all: {str(e)}")
+                raise DynamoDBError(f"BotoCore error: {str(e)}")
+
+        logger.debug(
+            f"Successfully scanned all items, returned {len(items)} total items"
+        )
+        return items
+
     def query(
         self,
         key_condition_expression: str,
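The pagination loop in `scan_all` follows the usual DynamoDB pattern: pass the previous response's `LastEvaluatedKey` as `ExclusiveStartKey` until no key is returned. The sketch below reproduces that loop against an in-memory stand-in so it can run without AWS. `FakeTable` and `scan_all_pages` are hypothetical names, and the `{"index": n}` key shape is an assumption made purely for the stub; real keys contain the table's primary key attributes.

```python
from typing import Any, Dict, List, Optional


class FakeTable:
    """Hypothetical stand-in for a boto3 Table that returns fixed-size pages."""

    def __init__(self, items: List[Dict[str, Any]], page_size: int) -> None:
        self._items = items
        self._page_size = page_size

    def scan(self, **params: Any) -> Dict[str, Any]:
        start = params.get("ExclusiveStartKey", {}).get("index", 0)
        end = start + self._page_size
        response: Dict[str, Any] = {"Items": self._items[start:end]}
        if end < len(self._items):
            # More items remain, so tell the caller where to resume.
            response["LastEvaluatedKey"] = {"index": end}
        return response


def scan_all_pages(table: Any) -> List[Dict[str, Any]]:
    """Collect every page by following LastEvaluatedKey until it is absent."""
    items: List[Dict[str, Any]] = []
    last_evaluated_key: Optional[Dict[str, Any]] = None
    while True:
        params: Dict[str, Any] = {}
        if last_evaluated_key:
            params["ExclusiveStartKey"] = last_evaluated_key
        response = table.scan(**params)
        items.extend(response.get("Items", []))
        last_evaluated_key = response.get("LastEvaluatedKey")
        if not last_evaluated_key:
            return items
```

Because every page is accumulated in memory, this pattern suits small tables such as test-run metadata; for large tables a generator that yields pages would bound memory use.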
@@ -291,7 +379,7 @@ def query(
             DynamoDBError: If the DynamoDB operation fails
         """
         try:
-            query_params = {
+            query_params: Dict[str, Any] = {
                 "KeyConditionExpression": key_condition_expression,
             }
 
 
@@ -242,3 +242,40 @@ def _list_local_images(directory_path: str, image_extensions: set) -> List[str]:
     except Exception as e:
         logger.error(f"Error listing images from local directory {directory_path}: {e}")
         raise
+
+def find_matching_files(bucket: str, pattern: str) -> List[str]:
+    """
+    Find files in S3 bucket that match a given pattern.
+    
+    Args:
+        bucket: S3 bucket name
+        pattern: File pattern with wildcards (* and ?) - case sensitive, * doesn't match /
+        
+    Returns:
+        List of matching file keys
+    """
+    import re
+    
+    try:
+        s3 = get_s3_client()
+        paginator = s3.get_paginator('list_objects_v2')
+        
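+        # Escape regex metacharacters first so that e.g. "." in "*.pdf" is matched literally,
+        # then expand wildcards: * matches anything except /, ? matches single char except /
+        regex_pattern = re.escape(pattern).replace(r'\*', '[^/]*').replace(r'\?', '[^/]')
+        regex = re.compile(f'^{regex_pattern}$')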
+        
+        matching_files = []
+        
+        for page in paginator.paginate(Bucket=bucket):
+            if 'Contents' in page:
+                for obj in page['Contents']:
+                    key = obj['Key']
+                    if regex.match(key):
+                        matching_files.append(key)
+        
+        logger.info(f"Found {len(matching_files)} files matching pattern '{pattern}' in bucket '{bucket}'")
+        return sorted(matching_files)
+        
+    except Exception as e:
+        logger.error(f"Error finding matching files in bucket {bucket} with pattern {pattern}: {e}")
+        raise
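The wildcard semantics documented for `find_matching_files` (case sensitive, `*` and `?` never cross a `/`) can be exercised without S3. The following is a standalone illustration rather than the function in this diff: `wildcard_to_regex` and `filter_keys` are hypothetical helper names, and it assumes that every other character in the pattern, such as the `.` in `*.pdf`, should match literally, which is why it calls `re.escape` before expanding the wildcards.

```python
import re
from typing import List


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a wildcard pattern where * and ? do not match '/'."""
    # re.escape turns "*" into "\*" and "?" into "\?", so swap those
    # escaped forms for character classes that exclude the separator.
    escaped = re.escape(pattern)
    regex = escaped.replace(r"\*", "[^/]*").replace(r"\?", "[^/]")
    return re.compile(regex)


def filter_keys(keys: List[str], pattern: str) -> List[str]:
    """Return the sorted keys that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return sorted(key for key in keys if regex.fullmatch(key))
```

With these assumptions, `input/*.pdf` selects `input/a.pdf` but neither `input/sub/c.pdf` (the `*` stops at `/`) nor `input/axpdf` (the dot is literal).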
@@ -0,0 +1,57 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# SPDX-License-Identifier: MIT-0
+
+import pytest
+
+
+@pytest.mark.unit
+def test_list_to_dict_conversion():
+    """Test conversion of list to dict for comparison functions"""
+    # Simulate the data structure conversion used in comparison functions
+    results_list = [
+        {"testRunId": "test1", "overallAccuracy": 85, "totalCost": 1.50},
+        {"testRunId": "test2", "overallAccuracy": 90, "totalCost": 2.00},
+    ]
+
+    # Convert list to dict with testRunId as key
+    results_dict = {result["testRunId"]: result for result in results_list}
+
+    assert len(results_dict) == 2
+    assert "test1" in results_dict
+    assert "test2" in results_dict
+    assert results_dict["test1"]["overallAccuracy"] == 85
+    assert results_dict["test2"]["totalCost"] == 2.00
+
+
+@pytest.mark.unit
+def test_metrics_comparison_structure():
+    """Test metrics comparison data structure"""
+    results_dict = {
+        "test1": {"overallAccuracy": 85, "averageConfidence": 75, "totalCost": 1.50},
+        "test2": {"overallAccuracy": 90, "averageConfidence": 80, "totalCost": 2.00},
+    }
+
+    # Simulate _build_metrics_comparison logic
+    metrics = [
+        {
+            "metric": "Overall Accuracy",
+            "values": {
+                k: f"{v.get('overallAccuracy', 0)}%" for k, v in results_dict.items()
+            },
+        },
+        {
+            "metric": "Average Confidence",
+            "values": {
+                k: f"{v.get('averageConfidence', 0)}%" for k, v in results_dict.items()
+            },
+        },
+        {
+            "metric": "Total Cost",
+            "values": {k: f"${v.get('totalCost', 0)}" for k, v in results_dict.items()},
+        },
+    ]
+
+    assert len(metrics) == 3
+    assert metrics[0]["metric"] == "Overall Accuracy"
+    assert metrics[0]["values"]["test1"] == "85%"
+    assert metrics[2]["values"]["test2"] == "$2.0"
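The structure asserted in `test_metrics_comparison_structure` can be produced by one small table-driven helper. This is a hedged sketch, not the `_build_metrics_comparison` function the test comment refers to: the name `build_metrics_comparison` and the `specs` table are assumptions, and the formatting mirrors the f-strings in the test, so a cost of `2.00` renders as `$2.0`.

```python
from typing import Any, Callable, Dict, List, Tuple


def build_metrics_comparison(
    results: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Build one row per metric, with a formatted value per test run ID."""
    # (display label, source key, formatter)
    specs: List[Tuple[str, str, Callable[[Any], str]]] = [
        ("Overall Accuracy", "overallAccuracy", lambda v: f"{v}%"),
        ("Average Confidence", "averageConfidence", lambda v: f"{v}%"),
        ("Total Cost", "totalCost", lambda v: f"${v}"),
    ]
    return [
        {
            "metric": label,
            # Missing metrics fall back to 0, as in the test's .get(..., 0).
            "values": {
                run_id: fmt(run.get(key, 0)) for run_id, run in results.items()
            },
        }
        for label, key, fmt in specs
    ]
```

If two-decimal currency output is wanted, the cost formatter could use `f"${v:.2f}"` instead, at which point the expected string for `2.00` would become `$2.00`.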