A bash-based framework for testing LLM prompts with support for dual-format test cases (YAML/JSON), automated assertions, metrics tracking, and multiple output reporters.
┌─────────────────────────────────────────────────────────────────┐
│ prompt-tester (bin/) │
│ Main Entry Point & CLI │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ lib/parser/ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ YAML Parser │ │ JSON Parser │ │ Load Case │ │
│ │ (yq) │ │ (jq) │ │ (Auto-Detect)│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ lib/runner/ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Execute Test │ │ Aggregate │ │ Parallel │ │
│ │ (LLM Call) │ │ Results │ │ Execution │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ lib/assertions/ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Regex │ │ Equality │ │ Substring │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Metric │ │ JSON │ │
│ │ Validator │ │ Structure │ │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ lib/metrics/ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Execution │ │ Pass/Fail │ │
│ │ Time │ │ Calculator │ │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ reporters/ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Console │ │ JSON │ │ JUnit │ │
│ │ Reporter │ │ Reporter │ │ Reporter │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Purpose (lib/parser/): Load and parse test cases from YAML or JSON format
Files:
- load_test_case.sh - Unified loader with format auto-detection (based on file extension)
- parse_yaml.sh - YAML parser using yq with pure bash fallback
- parse_json.sh - JSON parser using jq
Test Case Schema:
test_suite:
  name: "Suite Name"
  test_cases:
    - id: "unique_id"
      name: "Human-readable description"
      input:
        prompt: "The actual LLM prompt template"
        parameters:
          param_name: "param_value"
      assertions:
        - type: "substring"
          value: "expected text"
        - type: "regex"
          value: "pattern"
        - type: "field_exists"
          field: "path.to.field"
        - type: "array_length"
          min: 1
          max: 10
        - type: "unique_values"
        - type: "metric"
          metric_name: "execution_time"
          operator: "<"
          threshold: 5.0
      expected_metrics:
        execution_time:
          max: 5.0
Key Design Decisions:
- Format detection based on file extension (.yaml, .yml, .json)
- YAML parser uses yq command (if available) with pure bash fallback
- JSON parser uses jq for reliable parsing
- Unified interface through load_test_case.sh regardless of format
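As a rough illustration of extension-based format detection, the sketch below maps a file path to a parser name. The function name `detect_format` and its return values are hypothetical and are not taken from load_test_case.sh itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose a parser from the file extension alone.
# The real load_test_case.sh may be structured differently.
detect_format() {
  local file="$1"
  # ${file##*.} strips everything up to and including the last dot.
  case "${file##*.}" in
    yaml|yml) echo "yaml" ;;
    json)     echo "json" ;;
    *)        echo "unknown"; return 1 ;;
  esac
}
```

A caller could branch on the result, for example dispatching to parse_yaml.sh when the function prints `yaml` and to parse_json.sh when it prints `json`.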
Purpose (lib/runner/): Execute test cases, aggregate results, support parallel execution
Files:
- execute_test_case.sh - Executes individual test case (includes LLM API call)
- test_aggregator.sh - Aggregates results from multiple test runs
- run_parallel_tests.sh - Runs tests concurrently for performance
Flow:
- Parse test case
- Substitute parameters into prompt template
- Call LLM API (external)
- Run assertions against output
- Calculate metrics
- Aggregate results
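The parameter substitution step in the flow above might look like the following sketch. It assumes placeholders are written as `{{param_name}}`, which the schema does not actually specify; the function name `render_prompt` and the `key=value` argument form are also hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: replace {{key}} placeholders in a prompt template.
# The {{...}} syntax is an assumption; the project may use another convention.
render_prompt() {
  local prompt="$1"; shift
  local pair key value placeholder
  for pair in "$@"; do
    key="${pair%%=*}"      # text before the first '='
    value="${pair#*=}"     # text after the first '='
    placeholder="{{${key}}}"
    # Quoting both sides keeps the pattern and the replacement literal.
    prompt=${prompt//"$placeholder"/"$value"}
  done
  printf '%s\n' "$prompt"
}

render_prompt 'Summarize {{topic}} in {{n}} words' topic=bash n=10
```

Placeholders with no matching parameter are left untouched in this sketch, which makes a missing parameter visible in the rendered prompt.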
Key Design Decisions:
- Parallel execution for faster test suites
- Aggregation supports multiple test suite runs
- Metrics tracking is integrated into execution flow
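One possible way to run tests concurrently in plain bash is to start background jobs in batches and `wait` on each PID. The sketch below is not the contents of run_parallel_tests.sh; `run_parallel`, its arguments, and the `PARALLEL_FAILED` variable are hypothetical names used for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run a command over many items, max_jobs at a time.
# Sets PARALLEL_FAILED to the number of items whose command exited non-zero.
run_parallel() {
  local max_jobs="$1" cmd="$2"; shift 2
  local item pid running=0
  local pids=()
  PARALLEL_FAILED=0
  for item in "$@"; do
    "$cmd" "$item" &            # start one test in the background
    pids+=("$!")
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
      # Batch is full: wait for every job in it before starting more.
      for pid in "${pids[@]}"; do
        wait "$pid" || PARALLEL_FAILED=$((PARALLEL_FAILED + 1))
      done
      pids=(); running=0
    fi
  done
  if [ "$running" -gt 0 ]; then
    for pid in "${pids[@]}"; do
      wait "$pid" || PARALLEL_FAILED=$((PARALLEL_FAILED + 1))
    done
  fi
  [ "$PARALLEL_FAILED" -eq 0 ]
}
```

Batching is simpler than a rolling job pool but leaves slots idle while the slowest job in a batch finishes. That trade-off may be acceptable when the dominant cost is the external LLM call.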
Purpose (lib/assertions/): Validate LLM output against expected criteria
Assertion Types:
- assertion_substring.sh - Check if output contains expected substring
- assertion_regex.sh - Validate output matches regex pattern
- assertion_equality.sh - Exact match validation
- validate_metric.sh - Validate calculated metrics (execution time, etc.)
- validate_json_structure.sh - Validate JSON structure/schema
- assertion_json.sh - General JSON assertions
- assertion_unique_values.sh - Verify unique values in arrays
Key Design Decisions:
- Each assertion type has a dedicated script
- Return codes indicate pass (0) or fail (non-zero)
- Assertion failures are logged but don't stop execution
- Supports metric-based assertions (e.g., execution time < threshold)
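To make the return-code convention concrete, here is a minimal sketch of two assertions and a loop that logs failures without aborting. The real scripts are separate files; the function names and the `type:value` argument format here are hypothetical simplifications.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for assertion_substring.sh and assertion_regex.sh.
# Each returns 0 on pass and non-zero on fail.
assert_substring() { [[ "$1" == *"$2"* ]]; }
assert_regex()     { [[ "$1" =~ $2 ]]; }

# Run every assertion, log each outcome, and keep going after a failure.
# Each spec has the (assumed) form "type:value".
run_assertions() {
  local output="$1"; shift
  local failures=0 spec type value
  for spec in "$@"; do
    type="${spec%%:*}"
    value="${spec#*:}"
    if "assert_${type}" "$output" "$value"; then
      echo "PASS ${type}: ${value}"
    else
      echo "FAIL ${type}: ${value}" >&2
      failures=$((failures + 1))
    fi
  done
  # Exit status is 1 if anything failed, 0 otherwise.
  return "$(( failures > 0 ))"
}
```

Because the loop never exits early, a report can list every failed assertion for a test case rather than only the first one.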
Purpose (lib/metrics/): Calculate and track performance metrics
Files:
- calculate_execution_time.sh - Measure time from prompt to response
- calculate_pass_fail.sh - Calculate pass/fail rates from test results
- usage_example.sh - Example usage patterns
Metrics Tracked:
- execution_time - Time to generate response
- pass_rate - Percentage of assertions that passed
- custom_metrics - Extensible metric framework
Key Design Decisions:
- Metrics are calculated during test execution
- Metrics can be used as assertions
- Support for custom metric definitions
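The sketch below shows one way these metrics could be computed and then reused as assertions. Bash has no floating-point arithmetic, so the comparison is delegated to awk. The function names are hypothetical, and `measure_seconds` only has whole-second resolution because it relies on the shell's `SECONDS` counter.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of metric helpers; not the project's actual scripts.

# Elapsed wall-clock time of a command, in whole seconds.
measure_seconds() {
  local start=$SECONDS
  "$@" >/dev/null || true
  echo $(( SECONDS - start ))
}

# Pass rate as a percentage with one decimal place.
pass_rate() {
  local passed="$1" total="$2"
  if [ "$total" -eq 0 ]; then echo "0.0"; return 0; fi
  awk -v p="$passed" -v t="$total" 'BEGIN { printf "%.1f\n", (p / t) * 100 }'
}

# Compare a metric value against a threshold, e.g. check_metric 1.2 "<" 5.0.
# Returns 0 when the comparison holds, 1 when it does not, 2 on a bad operator.
check_metric() {
  local value="$1" op="$2" threshold="$3"
  case "$op" in
    "<"|"<="|">"|">="|"=="|"!=") ;;
    *) echo "unsupported operator: $op" >&2; return 2 ;;
  esac
  # The operator is whitelisted above, so interpolating it is safe here.
  awk -v v="$value" -v t="$threshold" "BEGIN { exit ((v $op t) ? 0 : 1) }"
}
```

With helpers like these, the `metric` assertion from the schema (`operator: "<"`, `threshold: 5.0`) reduces to a single `check_metric` call on the measured value.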
Purpose (reporters/): Generate human-readable and machine-parseable reports
Reporters:
- console.sh - Terminal output with colors, progress bars, summary
- json.sh - JSON output for programmatic consumption
- junit.sh - JUnit XML format for CI/CD integration
Key Design Decisions:
- Multiple output formats for different use cases
- Console reporter provides interactive feedback
- JSON reporter enables custom tooling
- JUnit reporter integrates with existing CI/CD pipelines
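For a sense of what a JUnit-style reporter involves, the sketch below turns simple result lines into a `<testsuite>` document. The `id|status|message` input format and the function names are assumptions made for this example; junit.sh may consume results in a different shape and emit more attributes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a JUnit XML reporter.

# Escape the characters that are unsafe inside XML attribute values.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
                         -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Read "id|status|message" lines from stdin and print a <testsuite>.
junit_report() {
  local suite="$1" id status message
  local total=0 failures=0 body=""
  while IFS='|' read -r id status message; do
    total=$((total + 1))
    if [ "$status" = "pass" ]; then
      body+="  <testcase name=\"$(xml_escape "$id")\"/>"$'\n'
    else
      failures=$((failures + 1))
      body+="  <testcase name=\"$(xml_escape "$id")\">"$'\n'
      body+="    <failure message=\"$(xml_escape "$message")\"/>"$'\n'
      body+="  </testcase>"$'\n'
    fi
  done
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<testsuite name="%s" tests="%d" failures="%d">\n' \
    "$(xml_escape "$suite")" "$total" "$failures"
  printf '%s' "$body"
  printf '</testsuite>\n'
}
```

Escaping matters here because LLM output and assertion messages can contain `<`, `>`, `&`, or quotes, any of which would otherwise produce XML that CI tools reject.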
Purpose (bin/prompt-tester): CLI that orchestrates the entire test flow
Key Functions:
- Argument parsing (--test-case, --output, --reporter, --parallel, etc.)
- Test suite discovery and loading
- Execution orchestration
- Report generation and output
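Argument parsing for the flags listed above could be sketched with a `while`/`case` loop. This is an illustrative sketch rather than the code in bin/prompt-tester: the variable names, the default reporter of `console`, and the choice to accept several paths after `--test-case` are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of CLI argument parsing for the documented flags.
parse_args() {
  TEST_CASES=(); REPORTER="console"; OUTPUT=""; PARALLEL=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --test-case)
        shift
        # Collect every path up to the next flag, so a shell glob that
        # expands to several files is accepted (an assumption).
        while [ "$#" -gt 0 ] && [[ "$1" != --* ]]; do
          TEST_CASES+=("$1"); shift
        done
        ;;
      --reporter|--output)
        if [ "$#" -lt 2 ]; then
          echo "$1 requires a value" >&2; return 2
        fi
        if [ "$1" = "--reporter" ]; then REPORTER="$2"; else OUTPUT="$2"; fi
        shift 2
        ;;
      --parallel) PARALLEL=1; shift ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
  done
  if [ "${#TEST_CASES[@]}" -eq 0 ]; then
    echo "at least one --test-case file is required" >&2; return 2
  fi
}
```

Returning a distinct non-zero status for usage errors lets a CI job tell a misconfigured invocation apart from failing tests.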
CLI Usage:
./bin/prompt-tester --test-case test.yaml --reporter console
./bin/prompt-tester --test-case tests/*.yaml --reporter junit --output results.xml
./bin/prompt-tester --test-case suite.yaml --parallel --reporter json
1. User invokes: prompt-tester --test-case <file>
│
▼
2. bin/prompt-tester parses arguments
│
▼
3. load_test_case.sh detects format and parses test case
│
▼
4. execute_test_case.sh runs:
├── Substitute parameters into prompt
├── Call LLM API (external)
├── Run all assertions
├── Calculate metrics
└── Store results
│
▼
5. test_aggregator.sh aggregates results (if multiple runs)
│
▼
6. Selected reporter generates output:
├── console.sh → terminal
├── json.sh → JSON file
└── junit.sh → JUnit XML
│
▼
7. Exit with appropriate status code
- YAML-based test cases with test_suite → test_cases structure
- Support for input, assertions, expected_metrics
- Multiple assertion types (substring, regex, field_exists, array_length)
- Dual-format support (YAML and JSON)
- Enhanced schema with parameter substitution
- Metric validation as assertions
- JSON structure validation
- Parallel execution support
- Multiple reporter options
The current Bash implementation is designed as a stepping stone toward a Go-based system with:
- Improved performance
- Better type safety
- Enhanced parallelism
- More robust error handling
- ✅ Main entry point (bin/prompt-tester)
- ✅ YAML parser with yq + bash fallback
- ✅ JSON parser (jq)
- ✅ Unified test case loader
- ✅ Console reporter
- ✅ JSON reporter
- ✅ JUnit reporter
- ✅ Substring assertions
- ✅ Regex assertions
- ✅ Metric validation
- ⏳ Execute test case runner
- ⏳ Test result aggregator
- ⏳ Parallel execution
- ⏳ Equality assertions
- ⏳ JSON structure validation
- ⏳ Metrics calculation
- ⏳ Full test suite integration
- Modularity: Clear separation of concerns across layers
- Extensibility: Easy to add new assertion types, reporters, or metrics
- Format Flexibility: Support both YAML and JSON test case formats
- CI/CD Friendly: JUnit output, exit codes, structured logging
- Migration Path: Bash implementation designed to inform Go version
- Language: Bash 4.0+
- Parsing: yq (YAML), jq (JSON)
- Execution: External LLM API calls
- Reporting: Console (colors), JSON, JUnit XML
- LLM APIs: Called from execute_test_case.sh (external dependency)
- CI/CD: JUnit reporter for integration with Jenkins, GitHub Actions, etc.
- Custom Tooling: JSON reporter for custom dashboards/analysis
- Test Data: testdata/ directory for sample test cases