A bash-based framework for testing LLM prompts with comprehensive assertions and reporting capabilities.
Prompt Tester provides a lightweight approach to validating LLM prompt outputs using traditional testing principles. It supports YAML and JSON test case formats, multiple assertion types, and various output reporters.
- Bash 4.0+
- jq - JSON processing tool
- yq - YAML processing tool
- opencode CLI (optional; required only for actual prompt execution)
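You can verify that the required tools are on your `PATH` before running anything (a quick sanity check, not part of the framework):

```shell
#!/usr/bin/env bash
# Report any required tool that is missing from PATH, then show the bash version.
for tool in jq yq; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
echo "bash version: ${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}"
```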
Install dependencies:
# Ubuntu/Debian
sudo apt-get install jq yq
# macOS (with Homebrew)
brew install jq yq
# RHEL/CentOS
sudo yum install jq yq

# Run a test suite with console output
./bin/prompt-tester -f testdata/test_prompt_sample.yaml
# Run with verbose output
./bin/prompt-tester -f testdata/test_prompt_sample.yaml -v
# Run and save results to a file
./bin/prompt-tester -f testdata/test_prompt_sample.yaml -o results.json

Usage: ./bin/prompt-tester [OPTIONS] --test-file <file>
Options:
-f, --test-file <file> Path to test case file (YAML or JSON)
-r, --reporter <type> Output reporter type: console, json, junit
(default: console)
-v, --verbose Enable verbose output for detailed logging
-o, --output <file> Output file path for results
-h, --help Display this help message
# Basic test run with console output
./bin/prompt-tester -f tests.yaml
# Run with JSON reporter and save to file
./bin/prompt-tester -f tests.json -r json -o results.json
# Run with JUnit reporter for CI/CD integration
./bin/prompt-tester -f tests.yaml -r junit -o test-results.xml
# Verbose mode with console output
./bin/prompt-tester -f tests.yaml -v -r console

The framework supports two test case formats. The new simplified format is recommended for most use cases, while the old format is still supported for backward compatibility.
The new format simplifies test case creation with a flat structure and per-file agent specification.
agent: <agent-name>
test_cases:
  - description: "Descriptive test name"
    prompt: "Your prompt text here"
    expected: "Expected response"

Key features:
- `agent` field at root level specifies which agent to use
- Flat test case structure with `description`, `prompt`, and `expected` fields
- Exact string matching by default
- Regex patterns enclosed in forward slashes: `/pattern/`
- Whitespace is preserved during comparison
{
  "agent": "<agent-name>",
  "test_cases": [
    {
      "description": "Descriptive test name",
      "prompt": "Your prompt text here",
      "expected": "Expected response"
    }
  ]
}

Exact String Match:
- description: "Greeting test"
  prompt: "hello"
  expected: "hello world!"

Regex Pattern Match:
- description: "Greeting variations"
  prompt: "hi"
  expected: "/^hello|hi|hey/"

agent: test
test_cases:
  - description: "Basic greeting test"
    prompt: "hello"
    expected: "hello world!"
  - description: "Regex test for numbers"
    prompt: "what number"
    expected: "/^[0-9]+$/"

See tests/README.md for more examples and documentation.
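The exact-vs-regex convention above can be sketched in plain bash. This is illustrative only, not the framework's actual code; `match_expected` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: match an actual response against an expected value, treating
# /slash-delimited/ values as extended regular expressions.
match_expected() {
  local actual="$1" expected="$2"
  if [[ "$expected" == /*/ ]]; then
    # Strip the surrounding slashes and match as an ERE.
    local pattern="${expected:1:${#expected}-2}"
    [[ "$actual" =~ $pattern ]]
  else
    # Exact comparison; whitespace is significant.
    [[ "$actual" == "$expected" ]]
  fi
}

match_expected "hello world!" "hello world!" && echo "exact: pass"
match_expected "hey there" "/^hello|hi|hey/" && echo "regex: pass"
```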
The original format with nested structure is still supported for backward compatibility.
test_suite:
  name: "My Test Suite"
  description: "Description of test suite"
  version: "1.0.0"

test_cases:
  - id: "TEST-001"
    name: "Descriptive test name"
    input:
      prompt: "Your prompt text here"
      parameters:
        temperature: 0.7
        max_tokens: 1000
    assertions:
      - type: substring
        value: "expected text"
        description: "Must contain this text"
      - type: regex
        value: "^[A-Z].*[.!?]"
        description: "Must match this pattern"
    expected_metrics:
      execution_time_ms: "< 1000"
      token_count: "< 500"

{
  "agent": "<agent-name>",
  "test_cases": [
    {
      "description": "Descriptive test name",
      "prompt": "Your prompt text here",
      "expected": "Expected response"
    }
  ]
}

Old Format (Deprecated but Supported):
{
  "test_suite": {
    "name": "My Test Suite",
    "description": "Description of test suite",
    "version": "1.0.0"
  },
  "test_cases": [
    {
      "id": "TEST-001",
      "name": "Descriptive test name",
      "input": {
        "prompt": "Your prompt text here",
        "parameters": {
          "temperature": 0.7,
          "max_tokens": 1000
        }
      },
      "assertions": [
        {
          "type": "substring",
          "value": "expected text",
          "description": "Must contain this text"
        },
        {
          "type": "regex",
          "value": "^[A-Z].*[.!?]",
          "description": "Must match this pattern"
        }
      ],
      "expected_metrics": {
        "execution_time_ms": "< 1000",
        "token_count": "< 500"
      }
    }
  ]
}

The new simplified format is recommended for new test files. To migrate from the old format:
- Move `test_suite.agent` to a root-level `agent` field
- Flatten `input.prompt` to a top-level `prompt` field
- Change `expected_outputs.response` to an `expected` field
- Remove `id`, `name`, and `assertions` fields (the simplified structure does not use them)
Example Migration:
Old Format:
test_suite:
  agent: test
  name: "My Tests"

test_cases:
  - id: "TEST-001"
    name: "Greeting test"
    input:
      prompt: "hello"
    expected_outputs:
      response: "hello world!"

New Format:
agent: test
test_cases:
  - description: "Greeting test"
    prompt: "hello"
    expected: "hello world!"

The old format continues to work but is deprecated.
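For JSON test files, the migration steps above can be performed mechanically with a one-off `jq` filter. This is a sketch assuming the exact old-format field names shown above; verify the output before committing it:

```shell
#!/usr/bin/env bash
# One-off migration of an old-format JSON test file to the new format.
# Field names follow the migration steps above; adjust if your files differ.
old='{"test_suite":{"agent":"test","name":"My Tests"},"test_cases":[{"id":"TEST-001","name":"Greeting test","input":{"prompt":"hello"},"expected_outputs":{"response":"hello world!"}}]}'

new=$(echo "$old" | jq '{
  agent: .test_suite.agent,
  test_cases: [.test_cases[] | {
    description: .name,
    prompt: .input.prompt,
    expected: .expected_outputs.response
  }]
}')
echo "$new"
```

In practice you would read from a file (`jq '...' old-tests.json > new-tests.json`) rather than an inline string.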
The framework supports multiple assertion types for validating LLM outputs:
Checks for exact string match between actual and expected output.
assertions:
  - type: equal
    value: "exact expected output"

Checks if the actual output contains the specified substring.
assertions:
  - type: substring
    value: "must contain this text"

Validates if output matches a regular expression pattern.
assertions:
  - type: regex
    value: "^[A-Z][a-z]+[.!?]"

Checks if a specified field exists in structured (JSON) output.
assertions:
  - type: field_exists
    field: response.data.id

Validates that an array has a length within specified bounds.
assertions:
  - type: array_length
    field: response.items
    min: 5
    max: 15

Validates that all values in an array are unique.
assertions:
  - type: unique_values
    field: response.tags

Validates computed metrics against threshold conditions.
expected_metrics:
  execution_time_ms: "< 1000"
  token_count: "< 500"

Supported operators: `<`, `<=`, `>`, `>=`, `==`
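A threshold string such as `< 1000` splits into an operator and a numeric limit, which bash can compare directly. The helper below is illustrative (`check_metric` is not a framework function):

```shell
#!/usr/bin/env bash
# Sketch: evaluate a metric value against a threshold string like "< 1000".
check_metric() {
  local value="$1" condition="$2"
  local op="${condition%% *}"     # operator, e.g. "<"
  local limit="${condition#* }"   # numeric limit, e.g. "1000"
  case "$op" in
    "<")  (( value <  limit )) ;;
    "<=") (( value <= limit )) ;;
    ">")  (( value >  limit )) ;;
    ">=") (( value >= limit )) ;;
    "==") (( value == limit )) ;;
    *)    return 2 ;;             # unknown operator
  esac
}

check_metric 742 "< 1000" && echo "execution_time_ms: pass"
```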
Color-coded console output with summary statistics.
./bin/prompt-tester -f tests.yaml -r console

Output:
=========================================
TEST SUITE RESULTS
=========================================
Total Tests: 5
Passed: 4
Failed: 1
=========================================
Status: SOME TESTS FAILED
Structured JSON output suitable for programmatic consumption.
./bin/prompt-tester -f tests.yaml -r json -o results.json

Output (results.json):
{
  "suite": {
    "name": "My Test Suite",
    "execution_time": "2026-03-13T10:30:00Z"
  },
  "summary": {
    "total_tests": 5,
    "passed": 4,
    "failed": 1,
    "pass_rate": 0.80
  },
  "test_results": [...]
}

JUnit-compatible XML output for CI/CD integration.
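The JSON output can be consumed programmatically, for example to fail a CI step when the pass rate drops below a threshold. The summary shape below matches the example output above; the 0.9 threshold is arbitrary:

```shell
#!/usr/bin/env bash
# Gate a CI step on the pass_rate reported in the JSON results.
results='{"summary":{"total_tests":5,"passed":4,"failed":1,"pass_rate":0.80}}'
rate=$(echo "$results" | jq -r '.summary.pass_rate')

# awk handles the floating-point comparison; the if-branch runs when rate < 0.9.
if awk -v r="$rate" 'BEGIN { exit !(r < 0.9) }'; then
  echo "pass rate too low: $rate"
fi
```

In a real pipeline you would read `results.json` produced by `-o results.json` instead of an inline string, and `exit 1` in the failure branch.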
./bin/prompt-tester -f tests.yaml -r junit -o test-results.xml

Output (test-results.xml):
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="My Test Suite" tests="5" failures="1">
    <testcase name="TEST-001" status="passed"/>
    <testcase name="TEST-002" status="failed">
      <failure message="Assertion failed"/>
    </testcase>
  </testsuite>
</testsuites>

- `PROMPT_TESTER_DEBUG` - Set to `1` to enable debug mode
- `PROMPT_TESTER_TIMEOUT` - Set test execution timeout in seconds (default: 60)
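The override-with-default pattern these variables follow looks like this in bash (a sketch of the convention; the tool's internal variable handling may differ):

```shell
#!/usr/bin/env bash
# Read environment overrides, falling back to the documented defaults.
timeout="${PROMPT_TESTER_TIMEOUT:-60}"   # default: 60 seconds
debug="${PROMPT_TESTER_DEBUG:-0}"        # debug off unless set to 1
echo "timeout=${timeout}s debug=${debug}"
```

To use them for a single run, prefix the command, e.g. `PROMPT_TESTER_DEBUG=1 PROMPT_TESTER_TIMEOUT=300 ./bin/prompt-tester -f tests.yaml -v`.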
.
├── bin/
│   └── prompt-tester    # Main entry point
├── lib/
│   ├── parser/          # Test case parsers (YAML/JSON)
│   ├── assertions/      # Assertion validation functions
│   ├── metrics/         # Metrics calculation functions
│   └── runner/          # Test execution and aggregation
├── reporters/
│   ├── console.sh       # Console reporter
│   ├── json.sh          # JSON reporter
│   └── junit.sh         # JUnit XML reporter
├── testdata/            # Sample test cases
└── README.md            # This file
Create a test case file (.yaml or .json) using the new simplified format:
agent: test
test_cases:
  - description: "Break down complex task"
    prompt: "Break down the following task into smaller, actionable subtasks..."
    expected: "/subtask.*actionable/"

For more examples, see tests/README.md, which includes:
- `tests/basic.yaml` - Simple exact match tests
- `tests/regex.yaml` - Regex pattern matching examples
- `tests/edge_cases.yaml` - Edge cases (empty strings, special characters, whitespace)
./bin/prompt-tester -f my-tests.yaml -v

Check the console output or saved results file for detailed assertion results and test status.
- Use Descriptive IDs: Give test cases unique, descriptive IDs for easy reference
- Add Descriptions: Include descriptions for assertions to understand failure reasons
- Set Reasonable Metrics: Define expected metrics to catch performance regressions
- Organize by Feature: Group related test cases in separate files
- Use Tags: Categorize tests with tags for selective running
- Version Your Suites: Include version numbers to track test suite changes
- Ensure test file path is correct
- Check file format (YAML/JSON) is valid
- Verify required dependencies (jq, yq) are installed
- Review assertion type and value
- Check if expected output matches actual LLM response
- Use verbose mode (`-v`) for detailed error messages

- Increase timeout with the `PROMPT_TESTER_TIMEOUT` environment variable
- Check network connectivity for LLM API calls
- Review assertion complexity for expensive operations
This project is provided as-is for internal testing and validation purposes.
Contributions are welcome! Please ensure new features include appropriate test cases.