Commit b3fe4c5

fix(tests): claude plugin tests updated
Parent: 3992958

9 files changed: 861 additions & 471 deletions

CLAUDE.md

Lines changed: 31 additions & 1 deletion
@@ -305,6 +305,34 @@ jq empty .claude-plugin/marketplace.json
 # Both should exit silently (no output = valid)
 ```
 
+### Behavioral Evaluation
+
+Beyond structural validation, test plugin behavior with CC as judge:
+
+```bash
+./tests/eval-plugin.sh plugins/my-plugin # Run behavioral tests
+./tests/eval-plugin.sh --verbose plugins/my-plugin # Show detailed output
+./tests/eval-plugin.sh --dry-run plugins/my-plugin # Preview tests without running
+```
+
+**Test File Location:** `tests/plugin-name.txt` (NOT inside plugin directory)
+
+**Isolation:** Tests are stored outside the plugin directory so they're not auto-loaded with plugin content. The test agent runs in a separate session from the judge agent. Note: Full filesystem isolation isn't possible with CC—this is behavioral testing, not adversarial testing.
+
+**Format:**
+```
+# Comment lines start with #
+prompt text here|expected behavior description
+another prompt|what the response should demonstrate
+```
+
+Each test runs the prompt against the plugin, then uses a separate CC instance as judge to evaluate if the response matches expected behavior.
+
+**Model Strategy:**
+- Test agent: Haiku first (fast/cheap) → Sonnet fallback on failure
+- Judge agent: Always Sonnet (reliable judgment)
+- Bug reports generated at `tests/reports/` for failed tests
+
 ### Pre-Publish Checklist
 
 Before committing a new plugin:
@@ -317,6 +345,7 @@ Before committing a new plugin:
 - [ ] Prerequisites/dependencies documented
 - [ ] Author attribution included
 - [ ] Git commit message is clear and specific
+- [ ] Behavioral tests pass (`./tests/eval-plugin.sh`)
 
 ## Common Workflows
 
@@ -532,7 +561,7 @@ This enables pre-commit checks: plugin.json validation, frontmatter linting, git
 ### Pre-Release Validation
 
 ```bash
-./scripts/validate-plugin.sh plugins/my-plugin/ # Full plugin validation
+./tests/validate-plugin.sh plugins/my-plugin/ # Full plugin validation
 jq empty plugins/my-plugin/.claude-plugin/plugin.json # Validate JSON
 ```
 
@@ -561,6 +590,7 @@ All plugins in this marketplace should have:
 | **Add external plugin** | `git submodule add URL external_plugins/name` |
 | **Update externals** | `git submodule update --remote` |
 | **Remove submodule** | `git submodule deinit -f external_plugins/name` |
+| **Run behavioral tests** | `./tests/eval-plugin.sh plugins/name` |
 
 ## Resources
 
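The `prompt|expected behavior` test format added to `CLAUDE.md` in this commit is simple enough to read in plain shell. What follows is a minimal sketch and is not code from `tests/eval-plugin.sh`. `parse_tests` is a hypothetical helper that makes three assumptions: the first `|` on a line separates the prompt from the expected behavior (so prompts cannot contain `|`), lines starting with `#` are comments, and blank lines are ignored. The sketch avoids bash-only syntax so it can be sourced from either bash or sh.

```bash
# Hypothetical helper (not part of tests/eval-plugin.sh): print each test case
# in a "prompt|expected behavior" file as "PROMPT<TAB>EXPECTED".
# Assumption: the FIRST "|" is the separator, so the prompt must not contain "|"
# while the expected-behavior text may.
parse_tests() {
  local line
  # "|| [ -n "$line" ]" also handles a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      '#'*) continue ;;                # comment line
      *'|'*) ;;                        # has a separator: parse it below
      *[![:space:]]*)                  # non-blank but no "|": malformed
        printf 'skipping malformed line: %s\n' "$line" >&2
        continue ;;
      *) continue ;;                   # blank line
    esac
    # ${line%%|*} keeps text before the first "|";
    # ${line#*|} keeps text after the first "|".
    printf '%s\t%s\n' "${line%%|*}" "${line#*|}"
  done < "$1"
}
```

For example, `parse_tests tests/my-plugin.txt` prints one tab-separated prompt/expectation pair per test case and reports malformed lines on stderr. Splitting on the first `|` rather than the last is a guess, because the format description does not say which field may contain a literal `|`.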
PLUGIN_DEVELOPMENT_GUIDE.md

Lines changed: 0 additions & 299 deletions
This file was deleted.
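The **Model Strategy** bullets in the `CLAUDE.md` diff describe a small control flow: Haiku runs first, Sonnet is used as a fallback, Sonnet always judges, and failures produce a report. The sketch below shows that flow in shell under stated assumptions and is not how `eval-plugin.sh` is actually written. It assumes a case "fails" when the judge rejects the response. `eval_case`, `run_case`, `judge_case` and `REPORT_DIR` are hypothetical names. The two stubs would need to be replaced with real Claude Code invocations, and the report layout is invented for illustration.

```bash
# Hypothetical sketch of the "Model Strategy" bullets; eval_case, run_case,
# judge_case and REPORT_DIR are illustrative names, not eval-plugin.sh internals.
REPORT_DIR=${REPORT_DIR:-tests/reports}

# Placeholder stub: should run prompt "$2" with test model "$1" and print
# the response. Replace with a real Claude Code invocation.
run_case() { printf 'stub response from %s\n' "$1"; }

# Placeholder stub: should ask judge model "$1" whether response "$3"
# matches expected behavior "$2"; exit status 0 means the judge says "pass".
judge_case() { return 1; }

eval_case() {
  local prompt="$1" expected="$2" model response="" report
  # Test agent: cheap model first, stronger model only if that attempt fails.
  for model in haiku sonnet; do
    response=$(run_case "$model" "$prompt") || continue
    # Judge agent: always the same (Sonnet) model.
    if judge_case sonnet "$expected" "$response"; then
      printf 'PASS (%s): %s\n' "$model" "$prompt"
      return 0
    fi
  done
  # Both attempts failed: write a bug report for this case.
  mkdir -p "$REPORT_DIR"
  report=$(mktemp "$REPORT_DIR/failure-XXXXXX")
  {
    printf 'prompt: %s\n' "$prompt"
    printf 'expected: %s\n' "$expected"
    printf 'last response: %s\n' "$response"
  } > "$report"
  printf 'FAIL: %s (report: %s)\n' "$prompt" "$report"
  return 1
}
```

Retrying with Sonnet only after the judge rejects Haiku's answer keeps the common case cheap, which matches the "fast/cheap" rationale in the diff. The diff does not say whether CLI errors also trigger the fallback, so the sketch simply moves on to the next model whenever `run_case` exits non-zero.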
