nwneisen/agent-tests
Prompt Tester

A bash-based framework for testing LLM prompts with comprehensive assertions and reporting capabilities.

Overview

Prompt Tester provides a lightweight approach to validating LLM prompt outputs using traditional testing principles. It supports YAML and JSON test case formats, multiple assertion types, and various output reporters.

Installation

Prerequisites

  • Bash 4.0+
  • jq - JSON processing tool
  • yq - YAML processing tool
  • opencode CLI (optional; required for actual prompt execution)

Install dependencies:

# Ubuntu/Debian
sudo apt-get install jq yq

# macOS (with Homebrew)
brew install jq yq

# RHEL/CentOS
sudo yum install jq yq
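A quick preflight check can confirm the required tools are on PATH before running any suites. This is a convenience snippet, not part of the framework:

```shell
# Collect any missing required tools; report them before running suites.
missing=()
for tool in jq yq; do
  command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
done
if (( ${#missing[@]} > 0 )); then
  echo "Missing dependencies: ${missing[*]}" >&2
else
  echo "All dependencies found"
fi
```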

Quick Start

Running Your First Test

# Run a test suite with console output
./bin/prompt-tester -f testdata/test_prompt_sample.yaml

# Run with verbose output
./bin/prompt-tester -f testdata/test_prompt_sample.yaml -v

# Run and save results to a file
./bin/prompt-tester -f testdata/test_prompt_sample.yaml -o results.json

Usage

Command-Line Options

Usage: ./bin/prompt-tester [OPTIONS] --test-file <file>

Options:
  -f, --test-file <file>   Path to test case file (YAML or JSON)
  -r, --reporter <type>    Output reporter type: console, json, junit
                           (default: console)
  -v, --verbose            Enable verbose output for detailed logging
  -o, --output <file>      Output file path for results
  -h, --help               Display this help message

Examples

# Basic test run with console output
./bin/prompt-tester -f tests.yaml

# Run with JSON reporter and save to file
./bin/prompt-tester -f tests.json -r json -o results.json

# Run with JUnit reporter for CI/CD integration
./bin/prompt-tester -f tests.yaml -r junit -o test-results.xml

# Verbose mode with console output
./bin/prompt-tester -f tests.yaml -v -r console

Test Case Format

The framework supports two test case formats. The new simplified format is recommended for most use cases, while the old format is still supported for backward compatibility.

New Simplified Format (Recommended)

The new format simplifies test case creation with a flat structure and per-file agent specification.

YAML Format

agent: <agent-name>

test_cases:
  - description: "Descriptive test name"
    prompt: "Your prompt text here"
    expected: "Expected response"

Key features:

  • agent field at root level specifies which agent to use
  • Flat test case structure with description, prompt, and expected fields
  • Exact string matching by default
  • Regex patterns enclosed in forward slashes: /pattern/
  • Whitespace is preserved during comparison
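The matching rules above can be illustrated with a small bash sketch. `match_expected` is a hypothetical helper, not the framework's actual implementation: values wrapped in `/.../` are treated as extended regexes, everything else is compared byte-for-byte (so whitespace is preserved):

```shell
# Hypothetical matcher: /pattern/ values use bash ERE matching,
# all other values require an exact, whitespace-sensitive match.
match_expected() {
  local actual="$1" expected="$2"
  if [[ "$expected" == /*/ ]]; then
    # Strip the surrounding slashes and match as a regex.
    local pattern="${expected:1:${#expected}-2}"
    [[ "$actual" =~ $pattern ]]
  else
    [[ "$actual" == "$expected" ]]
  fi
}
```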

JSON Format

{
  "agent": "<agent-name>",
  "test_cases": [
    {
      "description": "Descriptive test name",
      "prompt": "Your prompt text here",
      "expected": "Expected response"
    }
  ]
}

Assertion Types

Exact String Match:

  - description: "Greeting test"
    prompt: "hello"
    expected: "hello world!"

Regex Pattern Match:

  - description: "Greeting variations"
    prompt: "hi"
    expected: "/^hello|hi|hey/"

Example Test File

agent: test

test_cases:
  - description: "Basic greeting test"
    prompt: "hello"
    expected: "hello world!"
  
  - description: "Regex test for numbers"
    prompt: "what number"
    expected: "/^[0-9]+$/"

See tests/README.md for more examples and documentation.

Old Format (Deprecated but Supported)

The original format with nested structure is still supported for backward compatibility.

YAML Format

test_suite:
  name: "My Test Suite"
  description: "Description of test suite"
  version: "1.0.0"

test_cases:
  - id: "TEST-001"
    name: "Descriptive test name"
    input:
      prompt: "Your prompt text here"
      parameters:
        temperature: 0.7
        max_tokens: 1000
    assertions:
      - type: substring
        value: "expected text"
        description: "Must contain this text"
      - type: regex
        value: "^[A-Z].*[.!?]"
        description: "Must match this pattern"
    expected_metrics:
      execution_time_ms: "< 1000"
      token_count: "< 500"

JSON Format

{
  "test_suite": {
    "name": "My Test Suite",
    "description": "Description of test suite",
    "version": "1.0.0"
  },
  "test_cases": [
    {
      "id": "TEST-001",
      "name": "Descriptive test name",
      "input": {
        "prompt": "Your prompt text here",
        "parameters": {
          "temperature": 0.7,
          "max_tokens": 1000
        }
      },
      "assertions": [
        {
          "type": "substring",
          "value": "expected text",
          "description": "Must contain this text"
        },
        {
          "type": "regex",
          "value": "^[A-Z].*[.!?]",
          "description": "Must match this pattern"
        }
      ],
      "expected_metrics": {
        "execution_time_ms": "< 1000",
        "token_count": "< 500"
      }
    }
  ]
}

Migration from Old Format

The new simplified format is recommended for new test files. To migrate from the old format:

  1. Move test_suite.agent to a root-level agent field
  2. Flatten input.prompt to a top-level prompt field
  3. Change expected_outputs.response to an expected field
  4. Rename name to description, and drop the id and assertions fields

Example Migration:

Old Format:

test_suite:
  agent: test
  name: "My Tests"

test_cases:
  - id: "TEST-001"
    name: "Greeting test"
    input:
      prompt: "hello"
    expected_outputs:
      response: "hello world!"

New Format:

agent: test

test_cases:
  - description: "Greeting test"
    prompt: "hello"
    expected: "hello world!"

The old format continues to work but is deprecated.
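The same mapping can be scripted for JSON test files with jq. A hypothetical one-liner (old.json and new.json are placeholder file names; here the old file is inlined for illustration):

```shell
# Write a sample old-format file, then map it onto the new flat schema.
cat > old.json <<'EOF'
{
  "test_suite": { "agent": "test", "name": "My Tests" },
  "test_cases": [
    {
      "id": "TEST-001",
      "name": "Greeting test",
      "input": { "prompt": "hello" },
      "expected_outputs": { "response": "hello world!" }
    }
  ]
}
EOF
jq '{agent: .test_suite.agent,
    test_cases: [.test_cases[] | {
      description: .name,
      prompt: .input.prompt,
      expected: .expected_outputs.response
    }]}' old.json > new.json
```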

Assertion Types

The framework supports multiple assertion types for validating LLM outputs:

String Assertions

equal

Checks for exact string match between actual and expected output.

assertions:
  - type: equal
    value: "exact expected output"

substring

Checks if the actual output contains the specified substring.

assertions:
  - type: substring
    value: "must contain this text"

Pattern Assertions

regex

Validates if output matches a regular expression pattern.

assertions:
  - type: regex
    value: "^[A-Z][a-z]+[.!?]"
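The string and pattern assertions above map naturally onto bash conditionals. A minimal sketch with hypothetical helper names (not the framework's internals):

```shell
assert_equal()     { [[ "$1" == "$2" ]]; }   # exact string match
assert_substring() { [[ "$1" == *"$2"* ]]; } # containment check
assert_regex()     { [[ "$1" =~ $2 ]]; }     # ERE pattern match
```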

Structural Assertions

field_exists

Checks if a specified field exists in structured (JSON) output.

assertions:
  - type: field_exists
    field: response.data.id

array_length

Validates that an array has a length within specified bounds.

assertions:
  - type: array_length
    field: response.items
    min: 5
    max: 15

unique_values

Validates that all values in an array are unique.

assertions:
  - type: unique_values
    field: response.tags

Metric Assertions

metric

Validates computed metrics against threshold conditions.

expected_metrics:
  execution_time_ms: "< 1000"
  token_count: "< 500"

Supported operators: <, <=, >, >=, ==
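Evaluating a threshold condition like "< 1000" amounts to splitting the string into operator and bound, then comparing. A hypothetical sketch (`check_metric` is an illustrative helper, not the framework's implementation):

```shell
# Compare an integer metric value against a condition string
# such as "< 1000" or ">= 5". Returns non-zero on failure.
check_metric() {
  local value="$1" condition="$2"
  local op="${condition%% *}" bound="${condition##* }"
  case "$op" in
    "<")  (( value <  bound )) ;;
    "<=") (( value <= bound )) ;;
    ">")  (( value >  bound )) ;;
    ">=") (( value >= bound )) ;;
    "==") (( value == bound )) ;;
    *)    return 2 ;;  # unknown operator
  esac
}
```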

Output Reporters

Console Reporter (Default)

Color-coded console output with summary statistics.

./bin/prompt-tester -f tests.yaml -r console

Output:

=========================================
TEST SUITE RESULTS
=========================================
Total Tests: 5
Passed: 4
Failed: 1
=========================================
Status: SOME TESTS FAILED

JSON Reporter

Structured JSON output suitable for programmatic consumption.

./bin/prompt-tester -f tests.yaml -r json -o results.json

Output (results.json):

{
  "suite": {
    "name": "My Test Suite",
    "execution_time": "2026-03-13T10:30:00Z"
  },
  "summary": {
    "total_tests": 5,
    "passed": 4,
    "failed": 1,
    "pass_rate": 0.80
  },
  "test_results": [...]
}
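The JSON report is convenient for gating a CI step. A sketch that assumes the summary shape shown above (the report is inlined here for illustration):

```shell
# Sample report matching the summary shape documented above.
cat > results.json <<'EOF'
{ "summary": { "total_tests": 5, "passed": 4, "failed": 1, "pass_rate": 0.80 } }
EOF
# Extract the failure count and flag a non-clean run.
failed=$(jq -r '.summary.failed' results.json)
if [ "$failed" -gt 0 ]; then
  echo "There were $failed failing tests" >&2
fi
```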

JUnit XML Reporter

JUnit-compatible XML output for CI/CD integration.

./bin/prompt-tester -f tests.yaml -r junit -o test-results.xml

Output (test-results.xml):

<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="My Test Suite" tests="5" failures="1">
    <testcase name="TEST-001" status="passed"/>
    <testcase name="TEST-002" status="failed">
      <failure message="Assertion failed"/>
    </testcase>
  </testsuite>
</testsuites>

Environment Variables

  • PROMPT_TESTER_DEBUG - Set to 1 to enable debug mode
  • PROMPT_TESTER_TIMEOUT - Set test execution timeout in seconds (default: 60)
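A default such as the 60-second timeout is typically implemented with parameter expansion fallbacks; a sketch of how these variables might be read (illustrative, not the framework's actual code):

```shell
# Fall back to the documented defaults when the variables are unset.
timeout="${PROMPT_TESTER_TIMEOUT:-60}"
debug="${PROMPT_TESTER_DEBUG:-0}"
echo "timeout=${timeout}s debug=$debug"
```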

Project Structure

.
├── bin/
│   └── prompt-tester          # Main entry point
├── lib/
│   ├── parser/                # Test case parsers (YAML/JSON)
│   ├── assertions/            # Assertion validation functions
│   ├── metrics/               # Metrics calculation functions
│   └── runner/                # Test execution and aggregation
├── reporters/
│   ├── console.sh             # Console reporter
│   ├── json.sh                # JSON reporter
│   └── junit.sh               # JUnit XML reporter
├── testdata/                  # Sample test cases
└── README.md                  # This file

Creating Test Cases

Step 1: Write Your Test

Create a test case file (.yaml or .json) using the new simplified format:

agent: test

test_cases:
  - description: "Break down complex task"
    prompt: "Break down the following task into smaller, actionable subtasks..."
    expected: "/subtask.*actionable/"

For more examples, see tests/README.md, which includes:

  • tests/basic.yaml - Simple exact match tests
  • tests/regex.yaml - Regex pattern matching examples
  • tests/edge_cases.yaml - Edge cases (empty strings, special characters, whitespace)

Step 2: Run the Test

./bin/prompt-tester -f my-tests.yaml -v

Step 3: Review Results

Check the console output or saved results file for detailed assertion results and test status.

Best Practices

  1. Use Descriptive IDs: Give test cases unique, descriptive IDs for easy reference
  2. Add Descriptions: Include descriptions for assertions to understand failure reasons
  3. Set Reasonable Metrics: Define expected metrics to catch performance regressions
  4. Organize by Feature: Group related test cases in separate files
  5. Use Tags: Categorize tests with tags for selective running
  6. Version Your Suites: Include version numbers to track test suite changes

Troubleshooting

Test Not Running

  • Ensure test file path is correct
  • Check file format (YAML/JSON) is valid
  • Verify required dependencies (jq, yq) are installed

Assertion Failures

  • Review assertion type and value
  • Check if expected output matches actual LLM response
  • Use verbose mode (-v) for detailed error messages

Performance Issues

  • Increase timeout with PROMPT_TESTER_TIMEOUT environment variable
  • Check network connectivity for LLM API calls
  • Review assertion complexity for expensive operations

License

This project is provided as-is for internal testing and validation purposes.

Contributing

Contributions are welcome! Please ensure new features include appropriate test cases.
