feat(e2e): tier-1 cross-agent matrix harness #122
Open
kaghni wants to merge 11 commits into main from feat/e2e-agent-matrix
Commits (all by kaghni):
9d0e181 feat(e2e): tier-1 cross-agent matrix harness
12b5525 feat(e2e): auto-resolve test workspace from operator's logged-in creds
a9b3533 fix(e2e): spawn built bundle/cli.js for installers, drop tsx runtime dep
69ccab3 docs(e2e): single-command UX + growth/CI-promotion targets
baec844 feat(e2e): all six agents + checklist-aligned case coverage
970b3c3 Merge remote-tracking branch 'origin/main' into feat/e2e-agent-matrix
0fcff3a feat(e2e): case 09 - install side effects must not write broken paths
5b0a071 feat(e2e): cases 10-12 - close remaining RELEASE_CHECKLIST gaps
295421f feat(e2e): auto-discover cases - drop file in cases/, runner picks it up
b69464e fix(e2e): skipFor points report as skip, not pass
1e6840d feat(e2e): cases 13-18 - full-lifecycle coverage
    @@ -0,0 +1,87 @@
    name: E2E (cross-agent matrix)

    # Manual trigger only. This workflow spawns real agent CLIs against real
    # provider APIs and a dedicated Deeplake test workspace; every run costs
    # real money and takes ~10 minutes. We deliberately do NOT run it on
    # every PR; the source + bundle byte-checks in `npm test` keep gating
    # merges. Use this workflow as a release-readiness gate by triggering it
    # manually from the Actions tab against your feature branch.

    on:
      workflow_dispatch:
        inputs:
          case_filter:
            description: "Only run this case id (e.g. 01-capture-smoke). Leave blank for all."
            required: false
            type: string
          agent_filter:
            description: "Only run this agent id (e.g. claude-code). Leave blank for all."
            required: false
            type: string

    permissions:
      contents: read

    jobs:
      e2e:
        name: Tier-1 cross-agent matrix
        runs-on: ubuntu-latest
        timeout-minutes: 30
        # Gate the job on creds being present. Forks without the e2e secret
        # see a clean skip in the Actions UI rather than a misleading red.
        if: ${{ github.event.repository.full_name == 'activeloopai/hivemind' }}
        steps:
          - uses: actions/checkout@v4

          - name: Setup Node.js
            uses: actions/setup-node@v4
            with:
              node-version: 22

          - name: Install dependencies
            run: npm install

          - name: Build bundles
            # The harness drives the actual bundles for codex/cursor/hermes/pi
            # (claude-code uses --plugin-dir against the source tree). Without
            # build, `hivemind <agent> install` would copy stale or missing
            # bundle files into the tmp HOME.
            run: npm run build

          - name: Install agent CLIs
            # Each tier-1 agent CLI must be on PATH for its driver to spawn.
            # We install the npm-distributed CLIs here; cursor-agent and
            # hermes are typically installed via the agent vendor's own
            # installer outside the npm ecosystem. If those binaries are
            # not on a CI runner, their driver will fail with a clear
            # "spawn error" and the matrix continues.
            run: |
              npm install -g @anthropic-ai/claude-code @openai/codex
              # Pi ships via npm too.
              npm install -g @piapp/cli || true
              # cursor-agent and hermes: install via curl when available;
              # if not, their points fail loudly rather than silently skip.
              curl -fsSL https://cursor.com/install-cli.sh | bash -s -- --print 2>/dev/null || echo "cursor-agent install skipped"
              # Hermes install would go here; install method varies by vendor.
              which claude codex pi cursor-agent hermes 2>&1 || true

          - name: Run e2e matrix
            env:
              HIVEMIND_E2E_CREDS_JSON: ${{ secrets.HIVEMIND_E2E_CREDS_JSON }}
              ANTHROPIC_API_KEY: ${{ secrets.HIVEMIND_E2E_ANTHROPIC_API_KEY }}
              OPENAI_API_KEY: ${{ secrets.HIVEMIND_E2E_OPENAI_API_KEY }}
              GOOGLE_API_KEY: ${{ secrets.HIVEMIND_E2E_GOOGLE_API_KEY }}
            run: |
              args=()
              if [ -n "${{ inputs.case_filter }}" ]; then args+=(--case "${{ inputs.case_filter }}"); fi
              if [ -n "${{ inputs.agent_filter }}" ]; then args+=(--agent "${{ inputs.agent_filter }}"); fi
              npm run e2e -- "${args[@]}"

          - name: Upload summary artifact
            if: always()
            uses: actions/upload-artifact@v4
            with:
              name: e2e-summary
              path: tests/e2e/results/
              if-no-files-found: warn
              retention-days: 30
Pin and verify the agent installers.
This step installs unpinned CLI versions, so runs are not reproducible across days or re-runs. More significantly, the curl-piped installer at line 64 executes a mutable remote script from cursor.com without checksum verification, which is a supply-chain risk. Pin the CLI versions and replace the curl installer with a verified binary or a checksum-validated script.
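A minimal sketch of what pinning and checksum validation could look like inside the install step's `run:` script. The helper names (`verify_sha256`, `run_verified_installer`, `npm_install_pinned`) are hypothetical and not part of this PR. The versions and the expected digest are left as arguments for the maintainers to supply, since no vetted values appear in this PR. It assumes `sha256sum` and `curl` are available on the runner.

```shell
# Hypothetical helpers; names and call sites are illustrative only.

# verify_sha256 FILE EXPECTED_HEX
# Succeeds only when FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  [ "$actual" = "$expected" ]
}

# run_verified_installer URL EXPECTED_HEX
# Downloads the installer to a temp file, checks its digest, and executes it
# only if the digest matches. Nothing is piped straight from the network
# into a shell.
run_verified_installer() {
  local url="$1" expected="$2" tmp status
  tmp="$(mktemp)"
  if ! curl -fsSL "$url" -o "$tmp"; then
    echo "download failed for $url" >&2
    rm -f "$tmp"
    return 1
  fi
  if verify_sha256 "$tmp" "$expected"; then
    bash "$tmp"
    status=$?
  else
    echo "checksum mismatch for $url; refusing to run" >&2
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# npm_install_pinned PACKAGE VERSION
# Installs one exact version globally instead of whatever is latest today.
npm_install_pinned() {
  npm install -g "$1@$2"
}
```

With helpers like these, the step would call `npm_install_pinned` once per npm-distributed CLI with an exact version, and `run_verified_installer` with the installer URL plus a digest recorded when the script was last reviewed. A changed upstream script then fails the step instead of running unreviewed code.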