OpenCode Auto Experiment is an experiment-optimization loop for OpenCode + oh-my-opencode.
It is designed for research and engineering workflows where you want an outer orchestrator to:
- run a baseline experiment,
- consult three read-only specialists,
- choose one primary hypothesis,
- let a single executor change code,
- rerun experiments,
- keep or discard the result,
- and stop automatically when the target metric is reached.
The current implementation uses:
Sisyphus (Ultraworker)as the only outer orchestrator,sisyphus-junioras the only code executor,Prometheus (Plan Builder)only for bootstrap orreview_blockedreplanning,Apollo,Athena, andHermesas the three read-only specialists,- a Python controller for baseline, DVC, DVCLive, resume, stop, and status.
For the current release, all three specialists are configured to use kimi-for-coding/kimi-k2.5. The role names are now decoupled from the underlying model, so you can later remap them to GPT / Claude / Gemini without rewriting the orchestration layer.
This repository provides a governed experiment loop with these core pieces:
- OpenCode plugin registration for commands, tools, and agents
- role-based multi-agent orchestration
- a Python experiment controller with structured state files
- DVC + DVCLive integration for experiment bookkeeping
- Docker-based smoke tests for scientific workflows and OpenCode/Kimi wiring
Authority-path note:
scripts/innovation_loop.pyis the only outer-loop authority path.- TypeScript legacy orchestration remains only as internal/test-only compatibility code.
experiment_run_governed_workflownow bridges into the Python controller instead of running a competing TS outer loop.
The control flow is:
Sisyphusreads the goal and current state.Sisyphusinvokes the three read-only specialists:Apollo: exploit-oriented proposal generationAthena: validity and attribution guardHermes: orthogonal divergence proposal
Sisyphusselects one primary hypothesis.sisyphus-junioris the only agent allowed to execute code changes.- The Python controller runs baseline / smoke / proxy / full stages and records results.
- DVC / DVCLive / truth-source files are updated.
- The loop continues until:
- target threshold is reached,
- budget is exhausted,
- or the system becomes
review_blocked.
Important files:
AGENTS.mdopencode.jsonsrc/commands/index.tssrc/agents/src/tools/index.tsscripts/innovation_loop.pyconfigs/goal.yamlexperiments/session.jsonexperiments/best.jsonexperiments/attempts.jsonlexperiments/proposals/
opencode-auto-experiment/
├── .opencode/ # OpenCode command and agent docs
├── configs/ # Goal and experiment config
├── dist/ # Built TypeScript output
├── experiments/ # Truth-source artifacts and skeleton files
├── fixtures/ # Test fixtures
├── scripts/ # Python controller and Docker smoke scripts
├── src/ # TypeScript plugin, agents, tools, orchestration
├── tests/ # Unit, integration, and E2E tests
├── AGENTS.md # Routing rules and project-level guidance
├── docker-compose.yml
├── Dockerfile
├── Dockerfile.opencode
├── dvc.yaml
├── evaluate.py # Local toy evaluation task
├── opencode.json # Agent model mapping
├── package.json
├── params.yaml
└── README.md
- Node.js 22+
- npm 10+
- Python 3.10+
- Docker + Docker Compose
- DVC 3.x
- DVCLive 3.x
- OpenCode CLI
- oh-my-opencode plugin
- a remote training server for real long-running experiments
- GPU / ROCm / CUDA depending on your research environment
git clone <your-repo-url>
cd opencode-auto-experimentnpm ciIf you want local controller execution with real DVC support:
python3 -m pip install "dvc>=3,<4" "dvclive>=3,<4"npm install -g opencode-ai oh-my-opencodeOr use the local bootstrap script:
bash scripts/install-local.shTo make the experiment system available from any plain opencode session, also install the global command set:
bash scripts/install-opencode-global.shCopy the example file:
cp .env.example .envThen set the required values for your model provider. For the current default Kimi setup, you need:
KIMI_CODING_API_KEYKIMI_CODING_BASE_URL
For a fuller setup and upgrade workflow, including global slash-command installation, see INSTALL.md.
Agent model mapping lives in opencode.json.
Current defaults:
Sisyphus (Ultraworker)->kimi-for-coding/kimi-k2.5Prometheus (Plan Builder)->kimi-for-coding/kimi-k2.5sisyphus-junior->kimi-for-coding/kimi-k2.5Apollo->kimi-for-coding/kimi-k2.5Athena->kimi-for-coding/kimi-k2.5Hermes->kimi-for-coding/kimi-k2.5
You can later remap the three specialists independently without changing the architecture.
Experiment goals are defined in configs/goal.yaml.
Important fields include:
- target metric
- metric direction
- target threshold
- round budget
- full-run budget
- innovation constraints
- command definitions for baseline / smoke / proxy / full
The controller persists state under experiments/:
session.jsonbest.jsonattempts.jsonlproposals/round-xxxx.json
npm run buildnpm testnpm run docker:test:scientificnpm run docker:test:opencodepython3 scripts/innovation_loop.py bootstrap --config configs/goal.yaml --workspace . --mode mock
python3 scripts/innovation_loop.py tick --config configs/goal.yaml --workspace . --mode mock
python3 scripts/innovation_loop.py status --config configs/goal.yaml --workspace . --mode mockThe main OpenCode-facing commands are:
/innovate-loop/experiment-init/experiment-run/experiment-status/experiment-bootstrap
The intended automatic path is /innovate-loop driven by Sisyphus.
Two main services exist in docker-compose.yml:
app: general test environmentopencode-kimi: OpenCode + oh-my-opencode + Kimi smoke environment
Useful commands:
docker compose run --rm app npm test
docker compose run --rm opencode-kimi bash /workspace/scripts/docker/run-opencode-kimi-smoke.shThe current implementation has already passed:
- local
npm test - local
npm run build - Docker
appfull tests - Docker scientific smoke
- Docker OpenCode + Kimi smoke
The current toy automatic experiment loop has also been verified to:
- run a baseline
- invoke
Apollo,Athena, andHermes - route code execution through
sisyphus-junior - keep successful changes
- stop automatically on
goal_reached
This repository is already usable as a governed experiment-loop prototype, but a few limitations should be documented clearly:
- The current public default still uses one underlying model (
kimi-k2.5) for all specialists. - Real multi-hour training on a remote server is not the main validated path yet.
- Docker scientific smoke uses toy research tasks for fast validation.
- The full production path for "local OpenCode -> remote server training" should be validated separately in your final environment.
- You must never commit real secrets or provider keys into the repository.
- Do not commit
.envor any real API keys. - Review
opencode.jsonand environment variables before publishing. - If you publish this repository publicly, add a license before release.
- Document whether your real training environment depends on CUDA, ROCm, or remote SSH access.
Remote execution is supported as a contract pattern rather than a separate controller.
- The outer loop still runs locally through
scripts/innovation_loop.py goal.yamlstage commands may wrap remote execution- Metrics, checkpoints, and resume semantics must remain stable
See REMOTE_EXECUTION.md for the exact contract.
Validation paths:
npm test -- tests/e2e/python-controller-real-dvc.test.tsnpm test -- tests/e2e/remote-contract.test.ts
The previous monolithic daily-research-brain job has been split into two clearer chains:
daily-research-maintenance- vault organization
- file standardization
- figure-note filling
- index refresh
daily-research-context- retrieval generation
- evidence pack generation
- controller-state-aware inference only
This split makes maintenance failures easier to diagnose and keeps inference generation separate from vault cleanup.
- Add a
LICENSEfile before public release. - Decide whether to keep Kimi as the default public model or provide a provider-agnostic example config.
- Add a
CONTRIBUTING.mdif you want outside contributions. - Add a dedicated remote-server guide once your final local-to-remote path is stable.