nwneisen/agent-tests
Prompt Tester

A bash-based framework for testing LLM prompts with comprehensive assertions and reporting capabilities.

Overview

Prompt Tester provides a lightweight approach to validating LLM prompt outputs using traditional testing principles. It supports YAML and JSON test case formats, multiple assertion types, and various output reporters.

Installation

Prerequisites

  • Bash 4.0+
  • jq - JSON processing tool
  • yq - YAML processing tool
  • opencode CLI (optional; required for actual prompt execution)

Install dependencies:

# Ubuntu/Debian
sudo apt-get install jq yq

# macOS (with Homebrew)
brew install jq yq

# RHEL/CentOS
sudo yum install jq yq
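A quick preflight check can confirm the required tools are on PATH before running any suites. This is a convenience snippet, not part of the framework:

```shell
# Collect any missing required tools; report them before running suites.
missing=()
for tool in jq yq; do
  command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
done
if (( ${#missing[@]} > 0 )); then
  echo "Missing dependencies: ${missing[*]}" >&2
else
  echo "All dependencies found"
fi
```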

Quick Start

Running Your First Test

# Run a test suite with console output
./bin/prompt-tester -f testdata/test_prompt_sample.yaml

# Run with verbose output
./bin/prompt-tester -f testdata/test_prompt_sample.yaml -v

# Run and save results to a file
./bin/prompt-tester -f testdata/test_prompt_sample.yaml -o results.json

Usage

Command-Line Options

Usage: ./bin/prompt-tester [OPTIONS] --test-file <file>

Options:
  -f, --test-file <file>   Path to test case file (YAML or JSON)
  -r, --reporter <type>    Output reporter type: console, json, junit
                           (default: console)
  -v, --verbose            Enable verbose output for detailed logging
  -o, --output <file>      Output file path for results
  -h, --help               Display this help message

Examples

# Basic test run with console output
./bin/prompt-tester -f tests.yaml

# Run with JSON reporter and save to file
./bin/prompt-tester -f tests.json -r json -o results.json

# Run with JUnit reporter for CI/CD integration
./bin/prompt-tester -f tests.yaml -r junit -o test-results.xml

# Verbose mode with console output
./bin/prompt-tester -f tests.yaml -v -r console

Test Case Format

The framework supports two test case formats. The new simplified format is recommended for most use cases, while the old format is still supported for backward compatibility.

New Simplified Format (Recommended)

The new format simplifies test case creation with a flat structure and per-file agent specification.

YAML Format

agent: <agent-name>

test_cases:
  - description: "Descriptive test name"
    prompt: "Your prompt text here"
    expected: "Expected response"

Key features:

  • agent field at root level specifies which agent to use
  • Flat test case structure with description, prompt, and expected fields
  • Exact string matching by default
  • Regex patterns enclosed in forward slashes: /pattern/
  • Whitespace is preserved during comparison
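The matching rules above can be illustrated with a small bash sketch. `match_expected` is a hypothetical helper, not the framework's actual implementation: values wrapped in `/.../` are treated as extended regexes, everything else is compared byte-for-byte (so whitespace is preserved):

```shell
# Hypothetical matcher: /pattern/ values use bash ERE matching,
# all other values require an exact, whitespace-sensitive match.
match_expected() {
  local actual="$1" expected="$2"
  if [[ "$expected" == /*/ ]]; then
    # Strip the surrounding slashes and match as a regex.
    local pattern="${expected:1:${#expected}-2}"
    [[ "$actual" =~ $pattern ]]
  else
    [[ "$actual" == "$expected" ]]
  fi
}
```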

JSON Format

{
  "agent": "<agent-name>",
  "test_cases": [
    {
      "description": "Descriptive test name",
      "prompt": "Your prompt text here",
      "expected": "Expected response"
    }
  ]
}

Assertion Types

Exact String Match:

  - description: "Greeting test"
    prompt: "hello"
    expected: "hello world!"

Regex Pattern Match:

  - description: "Greeting variations"
    prompt: "hi"
    expected: "/^hello|hi|hey/"

Example Test File

agent: test

test_cases:
  - description: "Basic greeting test"
    prompt: "hello"
    expected: "hello world!"
  
  - description: "Regex test for numbers"
    prompt: "what number"
    expected: "/^[0-9]+$/"

See tests/README.md for more examples and documentation.

Old Format (Deprecated but Supported)

The original format with nested structure is still supported for backward compatibility.

YAML Format

test_suite:
  name: "My Test Suite"
  description: "Description of test suite"
  version: "1.0.0"

test_cases:
  - id: "TEST-001"
    name: "Descriptive test name"
    input:
      prompt: "Your prompt text here"
      parameters:
        temperature: 0.7
        max_tokens: 1000
    assertions:
      - type: substring
        value: "expected text"
        description: "Must contain this text"
      - type: regex
        value: "^[A-Z].*[.!?]"
        description: "Must match this pattern"
    expected_metrics:
      execution_time_ms: "< 1000"
      token_count: "< 500"

JSON Format

{
  "test_suite": {
    "name": "My Test Suite",
    "description": "Description of test suite",
    "version": "1.0.0"
  },
  "test_cases": [
    {
      "id": "TEST-001",
      "name": "Descriptive test name",
      "input": {
        "prompt": "Your prompt text here",
        "parameters": {
          "temperature": 0.7,
          "max_tokens": 1000
        }
      },
      "assertions": [
        {
          "type": "substring",
          "value": "expected text",
          "description": "Must contain this text"
        },
        {
          "type": "regex",
          "value": "^[A-Z].*[.!?]",
          "description": "Must match this pattern"
        }
      ],
      "expected_metrics": {
        "execution_time_ms": "< 1000",
        "token_count": "< 500"
      }
    }
  ]
}

Migration from Old Format

The new simplified format is recommended for new test files. To migrate from the old format:

  1. Move test_suite.agent to a root-level agent field
  2. Flatten input.prompt to a top-level prompt field
  3. Change expected_outputs.response to an expected field
  4. Rename name to description, and drop the id and assertions fields

Example Migration:

Old Format:

test_suite:
  agent: test
  name: "My Tests"

test_cases:
  - id: "TEST-001"
    name: "Greeting test"
    input:
      prompt: "hello"
    expected_outputs:
      response: "hello world!"

New Format:

agent: test

test_cases:
  - description: "Greeting test"
    prompt: "hello"
    expected: "hello world!"

The old format continues to work but is deprecated.
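The same mapping can be scripted for JSON test files with jq. A hypothetical one-liner (old.json and new.json are placeholder file names; here the old file is inlined for illustration):

```shell
# Write a sample old-format file, then map it onto the new flat schema.
cat > old.json <<'EOF'
{
  "test_suite": { "agent": "test", "name": "My Tests" },
  "test_cases": [
    {
      "id": "TEST-001",
      "name": "Greeting test",
      "input": { "prompt": "hello" },
      "expected_outputs": { "response": "hello world!" }
    }
  ]
}
EOF
jq '{agent: .test_suite.agent,
    test_cases: [.test_cases[] | {
      description: .name,
      prompt: .input.prompt,
      expected: .expected_outputs.response
    }]}' old.json > new.json
```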

Assertion Types

The framework supports multiple assertion types for validating LLM outputs:

String Assertions

equal

Checks for exact string match between actual and expected output.

assertions:
  - type: equal
    value: "exact expected output"

substring

Checks if the actual output contains the specified substring.

assertions:
  - type: substring
    value: "must contain this text"

Pattern Assertions

regex

Validates if output matches a regular expression pattern.

assertions:
  - type: regex
    value: "^[A-Z][a-z]+[.!?]"
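The string and pattern assertions above map naturally onto bash conditionals. A minimal sketch with hypothetical helper names (not the framework's internals):

```shell
assert_equal()     { [[ "$1" == "$2" ]]; }   # exact string match
assert_substring() { [[ "$1" == *"$2"* ]]; } # containment check
assert_regex()     { [[ "$1" =~ $2 ]]; }     # ERE pattern match
```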

Structural Assertions

field_exists

Checks if a specified field exists in structured (JSON) output.

assertions:
  - type: field_exists
    field: response.data.id

array_length

Validates that an array has a length within specified bounds.

assertions:
  - type: array_length
    field: response.items
    min: 5
    max: 15

unique_values

Validates that all values in an array are unique.

assertions:
  - type: unique_values
    field: response.tags

Metric Assertions

metric

Validates computed metrics against threshold conditions.

expected_metrics:
  execution_time_ms: "< 1000"
  token_count: "< 500"

Supported operators: <, <=, >, >=, ==
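Evaluating a threshold condition like "< 1000" amounts to splitting the string into operator and bound, then comparing. A hypothetical sketch (`check_metric` is an illustrative helper, not the framework's implementation):

```shell
# Compare an integer metric value against a condition string
# such as "< 1000" or ">= 5". Returns non-zero on failure.
check_metric() {
  local value="$1" condition="$2"
  local op="${condition%% *}" bound="${condition##* }"
  case "$op" in
    "<")  (( value <  bound )) ;;
    "<=") (( value <= bound )) ;;
    ">")  (( value >  bound )) ;;
    ">=") (( value >= bound )) ;;
    "==") (( value == bound )) ;;
    *)    return 2 ;;  # unknown operator
  esac
}
```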

Output Reporters

Console Reporter (Default)

Color-coded console output with summary statistics.

./bin/prompt-tester -f tests.yaml -r console

Output:

=========================================
TEST SUITE RESULTS
=========================================
Total Tests: 5
Passed: 4
Failed: 1
=========================================
Status: SOME TESTS FAILED

JSON Reporter

Structured JSON output suitable for programmatic consumption.

./bin/prompt-tester -f tests.yaml -r json -o results.json

Output (results.json):

{
  "suite": {
    "name": "My Test Suite",
    "execution_time": "2026-03-13T10:30:00Z"
  },
  "summary": {
    "total_tests": 5,
    "passed": 4,
    "failed": 1,
    "pass_rate": 0.80
  },
  "test_results": [...]
}
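The JSON report is convenient for gating a CI step. A sketch that assumes the summary shape shown above (the report is inlined here for illustration):

```shell
# Sample report matching the summary shape documented above.
cat > results.json <<'EOF'
{ "summary": { "total_tests": 5, "passed": 4, "failed": 1, "pass_rate": 0.80 } }
EOF
# Extract the failure count and flag a non-clean run.
failed=$(jq -r '.summary.failed' results.json)
if [ "$failed" -gt 0 ]; then
  echo "There were $failed failing tests" >&2
fi
```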

JUnit XML Reporter

JUnit-compatible XML output for CI/CD integration.

./bin/prompt-tester -f tests.yaml -r junit -o test-results.xml

Output (test-results.xml):

<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="My Test Suite" tests="5" failures="1">
    <testcase name="TEST-001" status="passed"/>
    <testcase name="TEST-002" status="failed">
      <failure message="Assertion failed"/>
    </testcase>
  </testsuite>
</testsuites>

Environment Variables

  • PROMPT_TESTER_DEBUG - Set to 1 to enable debug mode
  • PROMPT_TESTER_TIMEOUT - Set test execution timeout in seconds (default: 60)
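A default such as the 60-second timeout is typically implemented with parameter expansion fallbacks; a sketch of how these variables might be read (illustrative, not the framework's actual code):

```shell
# Fall back to the documented defaults when the variables are unset.
timeout="${PROMPT_TESTER_TIMEOUT:-60}"
debug="${PROMPT_TESTER_DEBUG:-0}"
echo "timeout=${timeout}s debug=$debug"
```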

Project Structure

.
├── bin/
│   └── prompt-tester          # Main entry point
├── lib/
│   ├── parser/                # Test case parsers (YAML/JSON)
│   ├── assertions/            # Assertion validation functions
│   ├── metrics/               # Metrics calculation functions
│   └── runner/                # Test execution and aggregation
├── reporters/
│   ├── console.sh             # Console reporter
│   ├── json.sh                # JSON reporter
│   └── junit.sh               # JUnit XML reporter
├── testdata/                  # Sample test cases
└── README.md                  # This file

Creating Test Cases

Step 1: Write Your Test

Create a test case file (.yaml or .json) using the new simplified format:

agent: test

test_cases:
  - description: "Break down complex task"
    prompt: "Break down the following task into smaller, actionable subtasks..."
    expected: "/subtask.*actionable/"

For more examples, see tests/README.md, which includes:

  • tests/basic.yaml - Simple exact match tests
  • tests/regex.yaml - Regex pattern matching examples
  • tests/edge_cases.yaml - Edge cases (empty strings, special characters, whitespace)

Step 2: Run the Test

./bin/prompt-tester -f my-tests.yaml -v

Step 3: Review Results

Check the console output or saved results file for detailed assertion results and test status.

Best Practices

  1. Use Descriptive IDs: Give test cases unique, descriptive IDs for easy reference
  2. Add Descriptions: Include descriptions for assertions to understand failure reasons
  3. Set Reasonable Metrics: Define expected metrics to catch performance regressions
  4. Organize by Feature: Group related test cases in separate files
  5. Use Tags: Categorize tests with tags for selective running
  6. Version Your Suites: Include version numbers to track test suite changes

Troubleshooting

Test Not Running

  • Ensure test file path is correct
  • Check file format (YAML/JSON) is valid
  • Verify required dependencies (jq, yq) are installed

Assertion Failures

  • Review assertion type and value
  • Check if expected output matches actual LLM response
  • Use verbose mode (-v) for detailed error messages

Performance Issues

  • Increase timeout with PROMPT_TESTER_TIMEOUT environment variable
  • Check network connectivity for LLM API calls
  • Review assertion complexity for expensive operations

License

This project is provided as-is for internal testing and validation purposes.

Contributing

Contributions are welcome! Please ensure new features include appropriate test cases.
