Prompt Tester - Architecture Overview

Project Purpose

A bash-based framework for testing LLM prompts with support for dual-format test cases (YAML/JSON), automated assertions, metrics tracking, and multiple output reporters.

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         prompt-tester (bin/)                     │
│                    Main Entry Point & CLI                        │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                      lib/parser/                                │
│         ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│         │ YAML Parser  │  │ JSON Parser  │  │  Load Case   │    │
│         │   (yq)       │  │   (jq)       │  │ (Auto-Detect)│    │
│         └──────────────┘  └──────────────┘  └──────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                      lib/runner/                                │
│         ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│         │ Execute Test │  │   Aggregate  │  │   Parallel   │    │
│         │   (LLM Call) │  │   Results    │  │  Execution   │    │
│         └──────────────┘  └──────────────┘  └──────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                   lib/assertions/                               │
│         ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│         │   Regex      │  │   Equality   │  │  Substring   │    │
│         └──────────────┘  └──────────────┘  └──────────────┘    │
│         ┌──────────────┐  ┌──────────────┐                      │
│         │   Metric     │  │    JSON      │                      │
│         │  Validator   │  │  Structure   │                      │
│         └──────────────┘  └──────────────┘                      │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                   lib/metrics/                                  │
│         ┌──────────────┐  ┌──────────────┐                      │
│         │  Execution   │  │   Pass/Fail  │                      │
│         │    Time      │  │  Calculator  │                      │
│         └──────────────┘  └──────────────┘                      │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                   reporters/                                    │
│         ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│         │   Console    │  │    JSON      │  │    JUnit     │    │
│         │   Reporter   │  │  Reporter    │  │   Reporter   │    │
│         └──────────────┘  └──────────────┘  └──────────────┘    │
└─────────────────────────────────────────────────────────────────┘

Core Components

1. Parser Layer (lib/parser/)

Purpose: Load and parse test cases written in YAML or JSON

Files:

load_test_case.sh - Unified loader with format auto-detection (based on file extension)
parse_yaml.sh - YAML parser using yq with pure bash fallback
parse_json.sh - JSON parser using jq

Test Case Schema:

test_suite:
  name: "Suite Name"
  test_cases:
    - id: "unique_id"
      name: "Human-readable description"
      input:
        prompt: "The actual LLM prompt template"
        parameters:
          param_name: "param_value"
      assertions:
        - type: "substring"
          value: "expected text"
        - type: "regex"
          value: "pattern"
        - type: "field_exists"
          field: "path.to.field"
        - type: "array_length"
          min: 1
          max: 10
        - type: "unique_values"
        - type: "metric"
          metric_name: "execution_time"
          operator: "<"
          threshold: 5.0
      expected_metrics:
        execution_time:
          max: 5.0

Key Design Decisions:

Format detection based on file extension (.yaml, .yml, .json)
YAML parser uses yq command (if available) with pure bash fallback
JSON parser uses jq for reliable parsing
Unified interface through load_test_case.sh regardless of format
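
As an illustration of extension-based detection and the yq fallback, here is a minimal sketch rather than the project's actual loader. The function names detect_format, yaml_backend, and parser_for are hypothetical, and the mapping from format to parser script path is an assumption based on the file names listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a test case path to a format name by extension.
detect_format() {
  local file="$1"
  case "${file##*.}" in
    yaml|yml) echo "yaml" ;;
    json)     echo "json" ;;
    *)
      echo "unsupported test case format: $file" >&2
      return 1
      ;;
  esac
}

# Report which YAML backend would be used: yq when it is on PATH,
# otherwise the pure bash fallback.
yaml_backend() {
  if command -v yq >/dev/null 2>&1; then
    echo "yq"
  else
    echo "bash-fallback"
  fi
}

# Resolve the parser script for a file (path layout is an assumption).
parser_for() {
  local format
  format="$(detect_format "$1")" || return 1
  echo "lib/parser/parse_${format}.sh"
}
```

Under these assumptions, parser_for suite.yml resolves to lib/parser/parse_yaml.sh, and a file with any other extension is rejected with a non-zero status before a parser runs.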

2. Runner Layer (lib/runner/)

Purpose: Execute test cases, aggregate results, support parallel execution

Files:

execute_test_case.sh - Executes an individual test case (including the LLM API call)
test_aggregator.sh - Aggregates results from multiple test runs
run_parallel_tests.sh - Runs tests concurrently for performance

Flow:

1. Parse test case
2. Substitute parameters into prompt template
3. Call LLM API (external)
4. Run assertions against output
5. Calculate metrics
6. Aggregate results
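
The parameter substitution step can be sketched as follows. The source does not specify a placeholder syntax, so the {{name}} form used here is an assumption, as are the function name render_prompt and the name=value argument convention.

```shell
#!/usr/bin/env bash
# Sketch: replace {{name}} placeholders in a prompt template.
# The {{...}} syntax is an assumption; the schema only says that
# parameters are substituted into the prompt.
render_prompt() {
  local prompt="$1"; shift
  local pair name value placeholder
  for pair in "$@"; do
    name="${pair%%=*}"     # text before the first "="
    value="${pair#*=}"     # text after the first "="
    placeholder="{{${name}}}"
    # Quote both sides so the pattern and the replacement are literal.
    prompt=${prompt//"$placeholder"/"$value"}
  done
  printf '%s\n' "$prompt"
}

# prints: Summarize bash arrays in 3 sentences.
render_prompt "Summarize {{topic}} in {{count}} sentences." "topic=bash arrays" "count=3"
```

Placeholders with no matching parameter are left untouched in this sketch; a stricter implementation might treat them as an error.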

Key Design Decisions:

Parallel execution for faster test suites
Aggregation supports multiple test suite runs
Metrics tracking is integrated into execution flow
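
One way to run test cases concurrently in Bash is with background jobs and wait. This is a sketch of that general technique, not the contents of run_parallel_tests.sh; run_one is a hypothetical stand-in for execute_test_case.sh, and its pass/fail rule is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for execute_test_case.sh.
# Placeholder rule: a test passes unless its file name contains "fail".
run_one() {
  local test_file="$1"
  [[ "$test_file" != *fail* ]]
}

# Start one background job per test case, then collect exit statuses.
run_parallel() {
  local pids=() failed=0 test_file pid
  for test_file in "$@"; do
    run_one "$test_file" &
    pids+=("$!")
  done
  # wait <pid> returns that job's exit status, so failures are not lost.
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"   # number of failed test cases
}
```

This sketch starts every job at once. A real runner calling a rate-limited LLM API would likely need a cap on the number of concurrent jobs.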

3. Assertions Layer (lib/assertions/)

Purpose: Validate LLM output against expected criteria

Assertion Types:

assertion_substring.sh - Check if output contains expected substring
assertion_regex.sh - Validate output matches regex pattern
assertion_equality.sh - Exact match validation
validate_metric.sh - Validate calculated metrics (execution time, etc.)
validate_json_structure.sh - Validate JSON structure/schema
assertion_json.sh - General JSON assertions
assertion_unique_values.sh - Verify unique values in arrays

Key Design Decisions:

Each assertion type has a dedicated script
Return codes indicate pass (0) or fail (non-zero)
Assertion failures are logged but don't stop execution
Supports metric-based assertions (e.g., execution time < threshold)
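
The exit-code contract and the log-and-continue behaviour described above can be sketched as below. The project uses one script per assertion type; the shell functions here are illustrative only, and the type:value argument encoding is an assumption made to keep the example short.

```shell
#!/usr/bin/env bash
# Each assertion returns 0 on pass and non-zero on fail.
assert_substring() { [[ "$1" == *"$2"* ]]; }
assert_regex()     { [[ "$1" =~ $2 ]]; }
assert_equality()  { [[ "$1" == "$2" ]]; }

# Run every assertion against the output, log failures to stderr,
# and keep going. Returns 1 if any assertion failed, 0 otherwise.
run_assertions() {
  local output="$1"; shift
  local spec type value failures=0
  for spec in "$@"; do
    type="${spec%%:*}"     # assertion type before the first ":"
    value="${spec#*:}"     # expected value after the first ":"
    if ! "assert_${type}" "$output" "$value"; then
      echo "FAIL ${type}: ${value}" >&2
      failures=$((failures + 1))
    fi
  done
  return "$((failures > 0))"
}
```

For example, run_assertions "The answer is 42" "substring:answer" "regex:[0-9]+" returns 0, while adding "substring:missing" logs one failure and returns 1 after the remaining assertions have still been evaluated.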

4. Metrics Layer (lib/metrics/)

Purpose: Calculate and track performance metrics

Files:

calculate_execution_time.sh - Measure time from prompt to response
calculate_pass_fail.sh - Calculate pass/fail rates from test results
usage_example.sh - Example usage patterns

Metrics Tracked:

execution_time - Time to generate response
pass_rate - Percentage of assertions that passed
custom_metrics - Extensible metric framework

Key Design Decisions:

Metrics are calculated during test execution
Metrics can be used as assertions
Support for custom metric definitions
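
A minimal sketch of the metric calculations, under stated assumptions: Bash has no floating-point arithmetic, so awk does the numeric work here, and the function names pass_rate, check_metric, and timed_run are hypothetical. The operator and threshold arguments mirror the metric assertion fields in the schema.

```shell
#!/usr/bin/env bash
# Percentage of passed items, with one decimal place.
pass_rate() {
  local passed="$1" total="$2"
  awk -v p="$passed" -v t="$total" \
    'BEGIN { if (t == 0) print "0.0"; else printf "%.1f\n", 100 * p / t }'
}

# check_metric VALUE OPERATOR THRESHOLD, e.g. check_metric 3.2 "<" 5.0
# Returns 0 when the comparison holds, 1 otherwise.
check_metric() {
  local value="$1" op="$2" threshold="$3"
  awk -v v="$value" -v o="$op" -v t="$threshold" 'BEGIN {
    if      (o == "<")  ok = (v <  t)
    else if (o == "<=") ok = (v <= t)
    else if (o == ">")  ok = (v >  t)
    else if (o == ">=") ok = (v >= t)
    else if (o == "==") ok = (v == t)
    else                ok = 0
    exit !ok
  }'
}

# Run a command and report elapsed whole seconds on stderr.
# SECONDS has one-second resolution; finer timing needs another source.
timed_run() {
  local start=$SECONDS
  "$@"
  local status=$?
  echo "elapsed_seconds=$((SECONDS - start))" >&2
  return "$status"
}
```

With these helpers, pass_rate 7 10 prints 70.0, and check_metric 3.2 "<" 5.0 succeeds, which corresponds to the execution_time assertion in the schema example.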

5. Reporters (reporters/)

Purpose: Generate human-readable and machine-parseable reports

Reporters:

console.sh - Terminal output with colors, progress bars, summary
json.sh - JSON output for programmatic consumption
junit.sh - JUnit XML format for CI/CD integration

Key Design Decisions:

Multiple output formats for different use cases
Console reporter provides interactive feedback
JSON reporter enables custom tooling
JUnit reporter integrates with existing CI/CD pipelines
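
To make the JUnit output concrete, the sketch below renders a small subset of the commonly used JUnit XML attributes (tests, failures, name, time, and a failure message). It is not the project's junit.sh; the input format of one id|status|seconds|message line per test is an assumption, as are the function names.

```shell
#!/usr/bin/env bash
# Escape the characters that are unsafe inside XML attribute values.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
                         -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# junit_report SUITE_NAME RESULTS
# RESULTS holds one "id|status|seconds|message" line per test (assumed format).
junit_report() {
  local suite="$1" results="$2"
  local total=0 failures=0 body="" id status seconds message
  while IFS='|' read -r id status seconds message; do
    [ -n "$id" ] || continue
    total=$((total + 1))
    if [ "$status" = "pass" ]; then
      body+="  <testcase name=\"$(xml_escape "$id")\" time=\"$seconds\"/>"$'\n'
    else
      failures=$((failures + 1))
      body+="  <testcase name=\"$(xml_escape "$id")\" time=\"$seconds\">"$'\n'
      body+="    <failure message=\"$(xml_escape "$message")\"/>"$'\n'
      body+="  </testcase>"$'\n'
    fi
  done <<< "$results"
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<testsuite name="%s" tests="%d" failures="%d">\n' \
    "$(xml_escape "$suite")" "$total" "$failures"
  printf '%s' "$body"
  printf '</testsuite>\n'
}
```

Calling junit_report "demo" $'t1|pass|0.4|\nt2|fail|1.2|timeout' > results.xml would write a suite with two tests and one failure, which most CI systems that read JUnit XML can display.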

6. Main Entry Point (bin/prompt-tester)

Purpose: Command-line interface that orchestrates the entire test flow

Key Functions:

Argument parsing (--test-case, --output, --reporter, --parallel, etc.)
Test suite discovery and loading
Execution orchestration
Report generation and output

CLI Usage:

./bin/prompt-tester --test-case test.yaml --reporter console
./bin/prompt-tester --test-case tests/*.yaml --reporter junit --output results.xml
./bin/prompt-tester --test-case suite.yaml --parallel --reporter json
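
A sketch of how the flags above might be parsed. The defaults (console reporter, no output file), the error messages, and the exit status 2 for usage errors are assumptions, and the function name parse_args is hypothetical. Because the shell expands tests/*.yaml into several arguments before the script runs, this sketch lets --test-case take every path up to the next flag.

```shell
#!/usr/bin/env bash
# Parse the documented flags into global variables.
parse_args() {
  TEST_CASES=()
  REPORTER="console"   # assumed default
  OUTPUT=""
  PARALLEL=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --test-case)
        shift
        # Collect paths until the next flag, so expanded globs work.
        while [ "$#" -gt 0 ] && [[ "$1" != --* ]]; do
          TEST_CASES+=("$1"); shift
        done
        ;;
      --reporter|--output)
        [ "$#" -ge 2 ] || { echo "$1 needs a value" >&2; return 2; }
        if [ "$1" = "--reporter" ]; then REPORTER="$2"; else OUTPUT="$2"; fi
        shift 2
        ;;
      --parallel)
        PARALLEL=1; shift
        ;;
      *)
        echo "unknown argument: $1" >&2
        return 2
        ;;
    esac
  done
  [ "${#TEST_CASES[@]}" -gt 0 ] || { echo "--test-case is required" >&2; return 2; }
}
```

After parse_args "$@" succeeds, the orchestration code can loop over TEST_CASES and pick the reporter script from REPORTER.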

Data Flow

1. User invokes: prompt-tester --test-case <file>
   │
   ▼
2. bin/prompt-tester parses arguments
   │
   ▼
3. load_test_case.sh detects format and parses test case
   │
   ▼
4. execute_test_case.sh runs:
   ├── Substitute parameters into prompt
   ├── Call LLM API (external)
   ├── Run all assertions
   ├── Calculate metrics
   └── Store results
   │
   ▼
5. test_aggregator.sh aggregates results (if multiple runs)
   │
   ▼
6. Selected reporter generates output:
   ├── console.sh → terminal
   ├── json.sh → JSON file
   └── junit.sh → JUnit XML
   │
   ▼
7. Exit with appropriate status code

Test Case Format Evolution

Original Plan (PLAN.md)

YAML-based test cases with test_suite → test_cases structure
Support for input, assertions, expected_metrics
Multiple assertion types (substring, regex, field_exists, array_length)

Current Implementation

Dual-format support (YAML and JSON)
Enhanced schema with parameter substitution
Metric validation as assertions
JSON structure validation
Parallel execution support
Multiple reporter options

Migration Path

The current Bash implementation is designed as a stepping stone toward a Go-based system with:

Improved performance
Better type safety
Enhanced parallelism
More robust error handling

Implementation Status

Completed Components

✅ Main entry point (bin/prompt-tester)
✅ YAML parser with yq + bash fallback
✅ JSON parser (jq)
✅ Unified test case loader
✅ Console reporter
✅ JSON reporter
✅ JUnit reporter
✅ Substring assertions
✅ Regex assertions
✅ Metric validation

Pending Components (from TODO.md)

⏳ Execute test case runner
⏳ Test result aggregator
⏳ Parallel execution
⏳ Equality assertions
⏳ JSON structure validation
⏳ Metrics calculation
⏳ Full test suite integration

Key Design Principles

Modularity: Clear separation of concerns across layers
Extensibility: Easy to add new assertion types, reporters, or metrics
Format Flexibility: Support both YAML and JSON test case formats
CI/CD Friendly: JUnit output, exit codes, structured logging
Migration Path: The Bash implementation is designed to inform a later Go version

Technology Stack

Language: Bash 4.0+
Parsing: yq (YAML), jq (JSON)
Execution: External LLM API calls
Reporting: Console (colors), JSON, JUnit XML

Integration Points

LLM APIs: Called from execute_test_case.sh (external dependency)
CI/CD: JUnit reporter for integration with Jenkins, GitHub Actions, etc.
Custom Tooling: JSON reporter for custom dashboards/analysis
Test Data: testdata/ directory for sample test cases
