This plan defines an improved test case format for the existing prompt-tester framework. This is not a new project, but an evolution that simplifies the test case schema while leveraging the existing infrastructure (parsers, runners, reporters).
Key Goal: Simplify test case creation for opencode agent testing while maintaining compatibility with the existing framework architecture.
Current (complex) format:

```yaml
test_suite:
  name: "Suite Name"
  test_cases:
    - id: "unique_id"
      name: "Human-readable description"
      input:
        prompt: "The actual LLM prompt template"
        parameters:
          param_name: "param_value"
      assertions:
        - type: "substring"
          value: "expected text"
        - type: "regex"
          value: "pattern"
        - type: "metric"
          metric_name: "execution_time"
          operator: "<"
          threshold: 5.0
      expected_metrics:
        execution_time:
          max: 5.0
```

New simplified format:

```yaml
agent: test  # Per-file agent specification
test_cases:
  - description: "Basic greeting test"
    prompt: "hello"
    expected: "hello world!"
  - description: "Regex test"
    prompt: "tell me about cats"
    expected: "/cat|feline/"
```

- Simpler to write: No nested structures, just `description`, `prompt`, and `expected`
- Per-file agent: A single `agent` declaration at file level (not repeated per test case)
- Straightforward assertions: Only exact match or regex (no complex assertion types needed for opencode)
- Whitespace normalization: Built-in handling of whitespace differences
- Faster iteration: Less boilerplate, quicker test case creation
- Backward compatible: Uses existing parser, runner, and reporter infrastructure
Each test file contains:
- `agent`: Specifies which opencode agent to use (required, per-file)
- `test_cases`: Array of individual test cases
```yaml
agent: test
test_cases:
  - description: "Basic greeting test"
    prompt: "hello"
    expected: "hello world!"
  - description: "Regex test"
    prompt: "tell me about cats"
    expected: "/cat|feline/"
  - description: "Complex regex"
    prompt: "generate JSON"
    expected: "/\\{.*\\}/"
```

- Exact match between actual response and expected value (default)
- Whitespace comparison: differences in whitespace (spaces, tabs, newlines) are normalized before comparison
- Future enhancement: integrate an agent to evaluate semantic equivalence of responses
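The normalization step could be shaped along these lines (a sketch only; `normalize_ws` and `compare_exact` are illustrative names, not the framework's actual functions):

```shell
#!/usr/bin/env bash
# Sketch: collapse every run of whitespace (spaces, tabs, newlines) into a
# single space and trim both ends, then compare the normalized strings.
normalize_ws() {
  printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ *//; s/ *$//'
}

compare_exact() {
  [ "$(normalize_ws "$1")" = "$(normalize_ws "$2")" ]
}
```

With this, `compare_exact "hello world!" "hello   world! "` passes, while a genuine content difference still fails.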
- The `expected` field can be a plain string (exact match) or a regex pattern
- Regex patterns are enclosed in forward slashes: `/pattern/`
- If the pattern doesn't match, the test fails with a regex error message
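Detecting the slash convention could look like this sketch (the helper name is illustrative; `grep -E` is used here for extended-regex matching):

```shell
#!/usr/bin/env bash
# Sketch: an expected value wrapped in forward slashes is treated as an
# extended regex; anything else is compared as an exact string.
check_expected() {
  local actual=$1 expected=$2
  case $expected in
    /*/)
      local pattern=${expected#/}   # strip the leading slash
      pattern=${pattern%/}          # strip the trailing slash
      printf '%s' "$actual" | grep -Eq "$pattern"
      ;;
    *)
      [ "$actual" = "$expected" ]
      ;;
  esac
}
```

For example, `check_expected "felines purr" "/cat|feline/"` succeeds, while `check_expected "dogs bark" "/cat|feline/"` fails.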
- If the opencode CLI returns a non-zero exit code → fail (include the exit code)
- If the response doesn't match the expected value → fail, showing actual vs. expected
- Include the test description in all error messages
1. Load test file: Parse the YAML to extract the agent and the test cases
2. For each test case:
   - Execute: `opencode run --agent <agent> <prompt>`
   - Capture the stdout response
   - Compare the response to the expected value (with whitespace normalization)
   - Mark the test as pass/fail
3. Report results: Summary of passed/failed tests
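One iteration of the flow above might be sketched as follows (the `OPENCODE_BIN` override is illustrative, added so the step can be exercised with a stub; the real runner would also apply whitespace normalization before comparing):

```shell
#!/usr/bin/env bash
# Sketch of one per-test step: run the agent, capture stdout, check the exit
# code, compare against the expected value, and print a pass/fail line.
OPENCODE_BIN=${OPENCODE_BIN:-opencode}   # illustrative override for testing

run_test_case() {
  local agent=$1 description=$2 prompt=$3 expected=$4
  local response status
  response=$("$OPENCODE_BIN" run --agent "$agent" "$prompt")
  status=$?
  if [ "$status" -ne 0 ]; then
    printf '✗ %s (opencode exited with code %d)\n' "$description" "$status"
    return 1
  fi
  if [ "$response" = "$expected" ]; then  # real comparison would normalize whitespace
    printf '✓ %s\n' "$description"
  else
    printf '✗ %s\n  Expected: "%s"\n  Got: "%s"\n' "$description" "$expected" "$response"
    return 1
  fi
}
```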
```
project/
├── tests/
│   ├── basic.yaml
│   ├── advanced.yaml
│   └── edge_cases.yaml
├── bin/
│   └── prompt-tester             (existing, enhanced)
├── lib/
│   ├── parser/
│   │   ├── load_test_case.sh     (existing, enhanced)
│   │   ├── parse_yaml.sh         (existing, enhanced)
│   │   └── parse_json.sh         (existing, enhanced)
│   ├── runner/
│   │   ├── execute_test_case.sh  (new)
│   │   ├── test_aggregator.sh    (existing, enhanced)
│   │   └── run_parallel_tests.sh (existing)
│   └── assertions/
│       ├── assertion_substring.sh (existing)
│       ├── assertion_regex.sh     (existing)
│       ├── assertion_equality.sh  (new/simplified)
│       └── compare.sh             (new - whitespace normalization)
└── reporters/
    ├── console.sh (existing)
    ├── json.sh    (existing)
    └── junit.sh   (existing)
```
The framework invokes the opencode CLI as:

```shell
opencode run --agent <agent_name> <prompt>
```

For example:

```shell
opencode run --agent test "hello"
# Output: "hello world!"
```

Running the tester:

```shell
# Run with new simplified format
./bin/prompt-tester --test-file tests/basic.yaml --reporter console

# Run with verbose output
./bin/prompt-tester -f tests/advanced.yaml -r console -v

# Run and save results
./bin/prompt-tester --test-file tests/all.yaml --reporter junit --output results.xml
```

Sample console output:

```
Running tests...
✓ Basic greeting test
✗ Another test
  Expected: "expected response"
  Got: "actual response"

Results: 2 passed, 1 failed (66.7%)
```
Note: Test description is used for display in reports.
What changes:
- Test case format (simplified schema)
- Agent specification (per-file instead of per-test-case)
- Assertion logic (simplified to exact/regex with whitespace normalization)

What stays the same:
- CLI interface and arguments
- Reporter infrastructure (console, JSON, JUnit)
- Aggregation and parallel execution support
- Overall architecture and folder structure
- The existing complex format will continue to work (deprecated but supported)
- New format is the recommended approach
- Clear documentation distinguishes between the two formats
- YAML parsing: Will use the `yq` tool (YAML parser)
- opencode CLI: Available in PATH; supports `run --agent <name> <prompt>`
- Agent configuration: Already set up in opencode (e.g., a "test" agent)
- Response format: stdout contains the LLM response
- Timeout: 10 minutes default for opencode responses
- Whitespace: Normalized during comparison, so formatting differences alone do not fail a test
- File extension: .yaml extension enforced for test files
- Future evaluation: An LLM agent may be incorporated later for semantic response comparison
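Under the `yq` assumption above, loading a simplified test file might look like this sketch (mikefarah yq v4 syntax assumed; the jq-wrapper variant of yq takes jq-style flags instead, and the function name is illustrative):

```shell
#!/usr/bin/env bash
# Sketch: extract the per-file agent and iterate over the test cases of a
# file in the simplified format. Assumes mikefarah yq v4.
list_test_cases() {
  local test_file=$1
  local agent count i
  agent=$(yq '.agent' "$test_file")
  count=$(yq '.test_cases | length' "$test_file")
  for ((i = 0; i < count; i++)); do
    printf 'agent=%s prompt=%s expected=%s\n' \
      "$agent" \
      "$(yq ".test_cases[$i].prompt" "$test_file")" \
      "$(yq ".test_cases[$i].expected" "$test_file")"
  done
}
```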
- Review and refine this plan
- Create TODO.md with implementation tasks
- Update parser to support new simplified format
- Implement whitespace-normalized comparison logic
- Create sample test files to validate the new format
- Add migration documentation for existing test cases