
Test Case Format Improvement Plan

Overview

This plan defines an improved test case format for the existing prompt-tester framework. This is not a new project, but an evolution that simplifies the test case schema while leveraging the existing infrastructure (parsers, runners, reporters).

Key Goal: Simplify test case creation for opencode agent testing while maintaining compatibility with the existing framework architecture.

Current Format vs. Improved Format

Current Format (Complex)

test_suite:
  name: "Suite Name"
  test_cases:
    - id: "unique_id"
      name: "Human-readable description"
      input:
        prompt: "The actual LLM prompt template"
        parameters:
          param_name: "param_value"
      assertions:
        - type: "substring"
          value: "expected text"
        - type: "regex"
          value: "pattern"
        - type: "metric"
          metric_name: "execution_time"
          operator: "<"
          threshold: 5.0
      expected_metrics:
        execution_time:
          max: 5.0

Improved Format (Simplified)

agent: test  # Per-file agent specification

test_cases:
  - description: "Basic greeting test"
    prompt: "hello"
    expected: "hello world!"
    
  - description: "Regex test"
    prompt: "tell me about cats"
    expected: "/cat|feline/"

Why This is an Improvement

  1. Simpler to write: No nested structures, just description, prompt, expected
  2. Per-file agent: Single agent declaration at file level (not repeated per test case)
  3. Straightforward assertions: Only exact match or regex (no complex assertion types needed for opencode)
  4. Whitespace normalization: Built-in handling of whitespace differences
  5. Faster iteration: Less boilerplate, quicker test case creation
  6. Backward compatible: Uses existing parser, runner, and reporter infrastructure

Test File Format

Structure

Each test file contains:

  • agent: Specifies which opencode agent to use (required, per-file)
  • test_cases: Array of individual test cases

YAML Syntax

agent: test

test_cases:
  - description: "Basic greeting test"
    prompt: "hello"
    expected: "hello world!"
    
  - description: "Regex test"
    prompt: "tell me about cats"
    expected: "/cat|feline/"
    
  - description: "Complex regex"
    prompt: "generate JSON"
    expected: "/\\{.*\\}/"

Assertion Logic

String Comparison

  • Exact match between actual response and expected value (default)
  • Whitespace comparison: runs of whitespace (spaces, tabs, newlines) are normalized before comparison
  • Future enhancement: integrate an agent to evaluate semantic equivalence of responses
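The normalization step could live in the planned compare.sh. Below is a minimal sketch; the function names (`normalize_ws`, `compare_normalized`) are illustrative, not part of the plan:

```shell
#!/usr/bin/env bash
# Sketch of whitespace normalization for compare.sh (names illustrative).
# Collapses runs of spaces/tabs/newlines to a single space and trims the
# ends, so "hello   world\n" and "hello world" compare equal.
normalize_ws() {
  local s="$1"
  # Squeeze every whitespace run down to one space...
  s="$(printf '%s' "$s" | tr -s '[:space:]' ' ')"
  # ...then strip the single leading/trailing space left behind.
  s="${s# }"
  s="${s% }"
  printf '%s' "$s"
}

compare_normalized() {
  [ "$(normalize_ws "$1")" = "$(normalize_ws "$2")" ]
}
```

With this, `compare_normalized '  hello   world ' 'hello world'` succeeds while a byte-for-byte comparison would not.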

Regex Support

  • The expected field can be a plain string (exact match) or a regex pattern
  • Regex patterns are enclosed in forward slashes: /pattern/
  • If the pattern doesn't match the response, the test fails with a regex mismatch message
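One way the assertion layer could dispatch between the two modes, assuming a value wrapped in forward slashes is an extended regex (the `match_expected` name is illustrative):

```shell
#!/usr/bin/env bash
# Sketch of the expected-value dispatch: /pattern/ is treated as an
# extended regex, anything else as an exact string match.
match_expected() {
  local actual="$1" expected="$2"
  case "$expected" in
    /*/)
      # Strip the surrounding slashes and test as an extended regex.
      local pattern="${expected#/}"
      pattern="${pattern%/}"
      printf '%s' "$actual" | grep -Eq -- "$pattern"
      ;;
    *)
      [ "$actual" = "$expected" ]
      ;;
  esac
}
```

For the examples above, `match_expected "I like cats" "/cat|feline/"` succeeds, while the same pattern against a response about dogs fails.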

Error Output

  • If the opencode CLI returns a non-zero exit code → fail (include the exit code)
  • If the response doesn't match the expected value → fail, showing actual vs. expected
  • Include the test description in all error messages

Execution Flow

1. Load test file: Parse YAML to extract agent and test cases
2. For each test case:
   - Execute: opencode run --agent <agent> <prompt>
   - Capture stdout response
   - Compare response to expected (with whitespace normalization)
   - Mark test as pass/fail
3. Report results: Summary of passed/failed tests
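The per-test step could be sketched as below. Function and variable names are illustrative, and the opencode invocation is injected as `$runner` so the logic can be exercised without the CLI; in the real execute_test_case.sh it would be `actual="$(opencode run --agent "$agent" "$prompt")"`:

```shell
#!/usr/bin/env bash
# Sketch of one iteration of the execution flow (names illustrative).
run_one_test() {
  local description="$1" prompt="$2" expected="$3" runner="$4"
  local actual rc
  actual="$($runner "$prompt")"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    # Non-zero exit from the CLI: fail and include the exit code.
    printf '✗ %s\n  opencode exited with code %d\n' "$description" "$rc"
    return 1
  fi
  if [ "$actual" = "$expected" ]; then
    printf '✓ %s\n' "$description"
  else
    # Mismatch: fail with actual vs. expected.
    printf '✗ %s\n  Expected: "%s"\n  Got:      "%s"\n' \
      "$description" "$expected" "$actual"
    return 1
  fi
}
```

The exact-string comparison here would be replaced by the whitespace-normalized/regex comparison described under Assertion Logic.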

File Organization

project/
├── tests/
│   ├── basic.yaml
│   ├── advanced.yaml
│   └── edge_cases.yaml
├── bin/
│   └── prompt-tester (existing, enhanced)
├── lib/
│   ├── parser/
│   │   ├── load_test_case.sh (existing, enhanced)
│   │   ├── parse_yaml.sh (existing, enhanced)
│   │   └── parse_json.sh (existing, enhanced)
│   ├── runner/
│   │   ├── execute_test_case.sh (new)
│   │   ├── test_aggregator.sh (existing, enhanced)
│   │   └── run_parallel_tests.sh (existing)
│   └── assertions/
│       ├── assertion_substring.sh (existing)
│       ├── assertion_regex.sh (existing)
│       ├── assertion_equality.sh (new/simplified)
│       └── compare.sh (new - whitespace normalization)
└── reporters/
    ├── console.sh (existing)
    ├── json.sh (existing)
    └── junit.sh (existing)

CLI Tool Integration

opencode Command

opencode run --agent <agent_name> <prompt>

Example

opencode run --agent test "hello"
# Output: "hello world!"

prompt-tester Usage (Enhanced)

# Run with new simplified format
./bin/prompt-tester --test-file tests/basic.yaml --reporter console

# Run with verbose output
./bin/prompt-tester -f tests/advanced.yaml -r console -v

# Run and save results
./bin/prompt-tester --test-file tests/all.yaml --reporter junit --output results.xml

Output Format

Console Reporter

Running tests...

✓ Basic greeting test
✗ Another test
  Expected: "expected response"
  Got:      "actual response"

Results: 1 passed, 1 failed (50.0%)

Note: Test description is used for display in reports.

Migration Notes

What Changes

  • Test case format (simplified schema)
  • Agent specification (per-file instead of per-test-case)
  • Assertion logic (simplified to exact/regex with whitespace normalization)

What Stays the Same

  • CLI interface and arguments
  • Reporter infrastructure (console, JSON, JUnit)
  • Aggregation and parallel execution support
  • Overall architecture and folder structure

Backward Compatibility

  • The existing complex format will continue to work (deprecated but supported)
  • New format is the recommended approach
  • Clear documentation distinguishes between the two formats
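The parser could tell the two schemas apart by their top-level keys (legacy files start with `test_suite:`, new files with `agent:`). A grep-based sketch follows; the real load_test_case.sh would query the YAML with yq instead, and `detect_format` is an illustrative name:

```shell
#!/usr/bin/env bash
# Sketch of schema detection for backward compatibility (illustrative).
detect_format() {
  local file="$1"
  if grep -q '^test_suite:' "$file"; then
    echo "legacy"
  elif grep -q '^agent:' "$file"; then
    echo "simplified"
  else
    echo "unknown"
    return 1
  fi
}
```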

Assumptions

  1. YAML parsing: Will use yq tool (YAML parser)
  2. opencode CLI: Available in PATH, supports run --agent <name> <prompt>
  3. Agent configuration: Already set up in opencode (e.g., "test" agent)
  4. Response format: stdout contains the LLM response
  5. Timeout: 10 minutes default for opencode responses
  6. Whitespace: Normalized during comparison (runs of spaces, tabs, and newlines collapse to a single space)
  7. File extension: .yaml extension enforced for test files
  8. Future evaluation: An LLM agent may be incorporated later for semantic response comparison
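The timeout in assumption 5 could be enforced with coreutils `timeout`, which exits 124 when the wrapped command exceeds the limit. A sketch, with `OPENCODE_TIMEOUT` and `run_with_timeout` as illustrative names:

```shell
#!/usr/bin/env bash
# Sketch of enforcing the response timeout (assumption 5) via coreutils
# `timeout`. Names are illustrative.
OPENCODE_TIMEOUT="${OPENCODE_TIMEOUT:-600}"  # seconds; 10 minutes by default

run_with_timeout() {
  # Usage: run_with_timeout <command> [args...]
  # Exits 124 if the command is killed for exceeding the limit.
  timeout "$OPENCODE_TIMEOUT" "$@"
}
```

The runner would then invoke `run_with_timeout opencode run --agent "$agent" "$prompt"` instead of calling opencode directly.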

Next Steps

  1. Review and refine this plan
  2. Create TODO.md with implementation tasks
  3. Update parser to support new simplified format
  4. Implement whitespace-normalized comparison logic
  5. Create sample test files to validate the new format
  6. Add migration documentation for existing test cases