Add agent testing harness for skill verification by Myestery · Pull Request #2921 · golemcloud/golem

Myestery · 2026-03-04T03:57:46Z

Summary

Implements the full agent testing harness (tests/harness/) for validating AI coding agents can discover and follow Golem skill files
Adds golem-new-project skill and bootstrap scenario (golem-new-project-ts)
Adds CI workflow (skills-test.yaml) with unit tests + integration tests on Ubuntu

Issues

Closes #2872
Closes #2873
Closes #2874
Closes #2875
Closes #2876
Closes #2877
Closes #2878
Closes #2879
Closes #2880
Closes #2881
Closes #2882
Closes #2898
Closes #2906
Closes #2907

What's included

CLI entrypoint (run.ts): arg parsing, scenario filtering, config summary
Scenario loader: YAML parsing with Zod schema validation
AgentDriver interface + Claude Code driver (with skill symlinking, session resumption) + Gemini stub
SkillWatcher: inotifywait (Linux), fswatch (macOS), atime comparison, presence-based fallback
Assertion engine: default, strict, and allowedExtraSkills modes
Build verification: golem build with automatic golem.yaml directory discovery
JSON report generation: per-scenario results with timing, skills, and error details
21 unit tests: watcher path extraction, executor assertions, loader validation
CI workflow: unit tests job + integration test matrix (claude-code, gemini) x (ts)
CI: Install and start Golem server (v1.4.2 binary, /healthcheck readiness loop)
CI: Install coding agents (Claude Code via npm, Gemini stub)

Key fixes applied during development

stdio: ['pipe', ...] → ['ignore', ...] to prevent stdin pipe hang with Claude Code
Removed invalid fswatch --event Access flag (not a supported event type on macOS)
Added workspace cleanup before each run to prevent stale directory conflicts
Added findGolemProjectDir() to handle golem new creating a subdirectory
Fixed test glob pattern (**/*.test.js → *.test.js) for POSIX shell compatibility
Fixed golem binary download (plain binary from golemcloud/golem, not tarball from golemcloud/golem-cli)
Fixed health check endpoint (/healthcheck instead of /version)
Fixed harness exit code to return 1 on scenario failures

Test plan

cd tests/harness && npm run build && npm test — 21/21 unit tests pass
CI workflow unit-tests job passes
CI workflow integration-tests jobs run (fail correctly when API keys not configured)

Implements the skill testing harness (#2872-#2882, #2898) that validates AI coding agents can discover, load, and follow Golem skill files to produce correct build artifacts. - CLI entrypoint with arg parsing and scenario filtering - YAML scenario loader with Zod schema validation - AgentDriver interface with Claude Code and Gemini (stub) drivers - SkillWatcher with inotifywait (Linux), fswatch (macOS), atime and presence-based fallback detection - Skill activation assertion engine (default, strict, allowedExtras) - Build verification with golem.yaml directory discovery - JSON report generation per scenario run - Bootstrap scenario: golem-new-project-ts - golem-new-project skill (SKILL.md) - CI workflow for unit + integration tests on Ubuntu - 21 unit tests (watcher, executor assertions, loader validation)

github-actions · 2026-03-04T03:57:54Z

Thank you for your contribution! Before we can merge this PR, we need you to sign our Contributor License Agreement. Please read the CLA and post the comment below to sign.

I have read the CLA Document and I hereby sign the CLA

_{You can retrigger this bot by commenting recheck in this Pull Request.}_{Posted by the CLA Assistant Lite bot.}

Change **/*.test.js to *.test.js since /bin/sh (dash) on Linux does not support globstar. All test files are in a flat directory so ** is unnecessary.

- Download golem v1.4.2 binary from golemcloud/golem releases - Use golem server run with /healthcheck readiness loop - Fix executor health check to use /healthcheck endpoint - Exit with code 1 when any scenario fails

Fix CI test glob pattern for Linux compatibility

30ab551

Change **/*.test.js to *.test.js since /bin/sh (dash) on Linux does not support globstar. All test files are in a flat directory so ** is unnecessary.

Myestery force-pushed the agent-testing-harness branch 4 times, most recently from 28dda12 to fc5f6f4 Compare March 5, 2026 02:15

Fix CI: golem binary, health check, and exit code on failure

0ccf003

- Download golem v1.4.2 binary from golemcloud/golem releases - Use golem server run with /healthcheck readiness loop - Fix executor health check to use /healthcheck endpoint - Exit with code 1 when any scenario fails

Myestery force-pushed the agent-testing-harness branch from fc5f6f4 to 0ccf003 Compare March 5, 2026 02:23

Myestery marked this pull request as ready for review March 5, 2026 02:49

Myestery merged commit 3290b88 into skills Mar 5, 2026
26 of 32 checks passed

Myestery deleted the agent-testing-harness branch March 5, 2026 02:49

github-actions bot locked and limited conversation to collaborators Mar 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add agent testing harness for skill verification#2921

Add agent testing harness for skill verification#2921
Myestery merged 3 commits intoskillsfrom
agent-testing-harness

Myestery commented Mar 4, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Myestery commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issues

What's included

Key fixes applied during development

Test plan

Uh oh!

github-actions bot commented Mar 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Myestery commented Mar 4, 2026 •

edited

Loading