A sequential multi-agent research loop exposed as a Model Context Protocol (MCP) server over stdio. Each research run creates a versioned, inspectable filesystem tree under runs/<uuid>/ following the Interpretable Context Methodology (ICM, Van Clief & McDermott 2026, arXiv:2603.16021).
What it does: takes a question, runs researcher_a (first-pass) → researcher_b (second-pass, with first-pass findings as input) → consolidator (synthesis), with automatic drift detection via sem_debug at every stage transition and a user-driven resume protocol when drift is found.
API-agnostic: works with any OpenAI-compatible endpoint. The default provider is ollama-cloud; override the endpoint with the OLLAMA_BASE_URL environment variable.
The filesystem is the pipeline. Every run lives in runs/<uuid>/ with the following 5-layer ICM structure:
runs/<uuid>/
├── CONTEXT.md ← run-level question and routing decisions
├── state.json ← current run state
├── decisions.md ← append-only audit trail with YAML frontmatter
├── 01_research_a/
│   ├── CONTEXT.md ← stage contract (from prompts/researcher_a.md)
│   ├── findings.md ← model output
│   └── decisions.md ← stage-level decision trail
├── 02_research_b/
│   ├── CONTEXT.md
│   ├── findings.md
│   └── decisions.md
└── 03_consolidate/
    ├── CONTEXT.md
    ├── findings.md ← final synthesized report
    └── decisions.md
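
As an illustration of the layout above, here is a minimal sketch that scaffolds an empty run directory. The helper name `create_run_tree` and the shape of `state.json` are assumptions made for this example; the project's own run manager may build the tree differently.

```python
import json
import uuid
from pathlib import Path

# Stage directory names are taken from the tree above.
STAGES = ("01_research_a", "02_research_b", "03_consolidate")


def create_run_tree(root: Path, question: str) -> Path:
    """Create runs/<uuid>/ with run-level and stage-level files (hypothetical helper)."""
    run_dir = root / "runs" / str(uuid.uuid4())
    run_dir.mkdir(parents=True)
    (run_dir / "CONTEXT.md").write_text(f"# Question\n\n{question}\n", encoding="utf-8")
    # Assumed shape: the README only says state.json holds the current run state.
    (run_dir / "state.json").write_text(json.dumps({"state": "INITIALIZED"}), encoding="utf-8")
    (run_dir / "decisions.md").touch()
    for stage in STAGES:
        stage_dir = run_dir / stage
        stage_dir.mkdir()
        (stage_dir / "CONTEXT.md").touch()
        (stage_dir / "decisions.md").touch()
        # findings.md is left absent until the stage has produced model output.
    return run_dir
```

Leaving `findings.md` absent until a stage has run means a directory listing alone shows how far a run has progressed.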
Agents are roles, not separate processes. The MCP server selects the stage contract (prompts/<role>.md) for each invocation.
| Role | Purpose | Stage |
|---|---|---|
| orchestrator | Routes commands, manages run lifecycle, makes resume decisions | meta |
| researcher_a | First-pass research; produces initial findings | 01_research_a |
| researcher_b | Second-pass research; reads researcher_a's findings as input | 02_research_b |
| consolidator | Synthesizes both passes into a final report | 03_consolidate |
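
Because roles are plain data rather than processes, selecting a contract can be as simple as a lookup. The sketch below mirrors the table; `ROLE_STAGE` and `contract_path` are hypothetical names, and whether the orchestrator has its own prompt file is an assumption.

```python
from pathlib import Path

# Role -> stage directory, copied from the table above.
# The orchestrator is "meta" and has no stage directory.
ROLE_STAGE = {
    "orchestrator": None,
    "researcher_a": "01_research_a",
    "researcher_b": "02_research_b",
    "consolidator": "03_consolidate",
}


def contract_path(role: str, prompts_dir: Path = Path("prompts")) -> Path:
    """Return prompts/<role>.md for a known role; reject unknown roles."""
    if role not in ROLE_STAGE:
        raise ValueError(f"unknown role: {role!r}")
    return prompts_dir / f"{role}.md"
```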
State machine:
INITIALIZED → RESEARCHING_A → RESEARCHING_B → CONSOLIDATING → DONE
If sem_debug detects drift at any transition, the run pauses at AWAIT_USER. The user resumes via resume_run with one of four actions: accept, rerun, rerun_strict, or abort.
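
The transitions can be sketched as a small pure function. This is an illustration under stated assumptions, not the project's state manager: the `ABORTED` state and the exact effect of each resume action (accept moves on, rerun and rerun_strict repeat the stage) are guesses that the text above does not confirm.

```python
from enum import Enum


class RunState(str, Enum):
    INITIALIZED = "INITIALIZED"
    RESEARCHING_A = "RESEARCHING_A"
    RESEARCHING_B = "RESEARCHING_B"
    CONSOLIDATING = "CONSOLIDATING"
    DONE = "DONE"
    AWAIT_USER = "AWAIT_USER"
    ABORTED = "ABORTED"  # assumption: the README does not name the post-abort state


# Happy-path order from the diagram above.
ORDER = [
    RunState.INITIALIZED,
    RunState.RESEARCHING_A,
    RunState.RESEARCHING_B,
    RunState.CONSOLIDATING,
    RunState.DONE,
]


def next_state(current: RunState, drift_detected: bool) -> RunState:
    """Advance one step, or pause at AWAIT_USER when drift is reported."""
    if current not in ORDER[:-1]:
        raise ValueError(f"cannot advance from {current.value}")
    if drift_detected:
        return RunState.AWAIT_USER
    return ORDER[ORDER.index(current) + 1]


def resume(paused_at: RunState, action: str) -> RunState:
    """Map a resume action to the state the run continues in (assumed semantics)."""
    if paused_at not in ORDER[:-1]:
        raise ValueError(f"nothing to resume at {paused_at.value}")
    if action == "accept":
        return ORDER[ORDER.index(paused_at) + 1]  # keep the output and move on
    if action in ("rerun", "rerun_strict"):
        return paused_at  # repeat the stage that drifted
    if action == "abort":
        return RunState.ABORTED
    raise ValueError(f"unknown action: {action!r}")
```

Here `paused_at` is the state the run was in when drift was found, so a caller has to remember it alongside `AWAIT_USER`.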
Stage contracts are defined in prompts/<role>.md using YAML frontmatter:
---
stage: 01_research_a
stage_role: first-pass research
model: kimi-k2.6:cloud
agent_id: researcher_a
inputs:
  - field: question
    source: ../CONTEXT.md
outputs:
  - field: findings
    path: findings.md
    contract: "Markdown analysis. Last line must be exactly: PASS_1_COMPLETE"
sem_debug_check:
  trigger: after_outputs
  check: "Does findings.md address the question directly without unsupported claims?"
  on_drift: AWAIT_USER
---

Requirements:
- Python 3.11+
- An account with ollama-cloud (or any OpenAI-compatible API endpoint)
- sem_debug (optional, for drift detection; install separately from WBChain3/sem_debug)
git clone https://github.com/WBChain3/research_loop.git
cd research_loop
pip install -e .

Dependencies are managed in pyproject.toml:
pyyaml, httpx, python-dotenv, pydantic, mcp>=1.0
cp .env.example .env
# Edit .env and set OLLAMA_API_KEY=your_key_here

| Variable | Required | Description |
|---|---|---|
| OLLAMA_API_KEY | Yes | API key for ollama-cloud |
| OLLAMA_BASE_URL | No | Defaults to https://ollama.com/v1; override for local endpoints |
| OPENAI_API_KEY | No | Alternative provider |
| ANTHROPIC_API_KEY | No | Alternative provider |
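
For reference, the two Ollama variables could be read as follows. This is a sketch only: `ProviderConfig` and `load_provider_config` are hypothetical names, and the project's model client may resolve its settings differently (for example via python-dotenv).

```python
import os
from dataclasses import dataclass

DEFAULT_BASE_URL = "https://ollama.com/v1"  # default stated in the table above


@dataclass(frozen=True)
class ProviderConfig:
    api_key: str
    base_url: str


def load_provider_config(env=None) -> ProviderConfig:
    """Read the OLLAMA_* settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OLLAMA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OLLAMA_API_KEY is required")
    # Strip a trailing slash so request paths can be appended uniformly.
    base_url = env.get("OLLAMA_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return ProviderConfig(api_key=api_key, base_url=base_url)
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test without touching the process environment.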
Start the MCP server:
python -m agent_loop.cli

The server runs over stdio, listening for JSON-RPC messages as specified by MCP. Connect with any MCP client (Claude Desktop, mcp-client, etc.).
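
To make the wire format concrete, the sketch below builds a single MCP `tools/call` request as one line of JSON, which is how the stdio transport frames messages. The `question` argument name is an assumption (check the tool's input schema), and the `initialize` handshake that MCP requires before any tool call is omitted here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tool_call_message(name: str, arguments: dict) -> str:
    """Serialize one JSON-RPC 2.0 `tools/call` request as a single line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so the message stays on one line.
    return json.dumps(request) + "\n"


# "question" as the argument name is an assumption for illustration.
line = tool_call_message("start_research", {"question": "What limits battery cycle life?"})
```

In practice an MCP client library handles this framing; the sketch is only meant to show what travels over stdin.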
Full lifecycle example:
- Create a run via start_research with a question.
- Advance via advance_run (call 3× for the full pipeline).
- Check status via get_run_status at any time.
- If drift is detected, the run enters AWAIT_USER. Resume via resume_run with accept, rerun, rerun_strict, or abort.
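
The lifecycle above can be sketched as a small driver loop. Everything here is illustrative: `call_tool` stands for whatever function your MCP client exposes for invoking a tool, and the argument names (`question`, `run_id`, `action`) and the `state` response key are assumptions about the tool schemas.

```python
from typing import Callable

# (tool name, arguments) -> dict-shaped response; supplied by your MCP client.
ToolCaller = Callable[[str, dict], dict]


def run_to_completion(call_tool: ToolCaller, question: str,
                      on_drift: str = "accept", max_steps: int = 10) -> dict:
    """Drive one run until it reaches DONE, an error, or the step budget."""
    status = call_tool("start_research", {"question": question})
    if "error" in status:
        return status
    run_id = status["run_id"]
    for _ in range(max_steps):
        if "error" in status or status.get("state") == "DONE":
            return status
        if status.get("state") == "AWAIT_USER":
            # A real client would ask the user here; this sketch applies a fixed action.
            status = call_tool("resume_run", {"run_id": run_id, "action": on_drift})
        else:
            status = call_tool("advance_run", {"run_id": run_id})
    return status
```

The `max_steps` budget guards against looping forever if a stage keeps drifting and is rerun each time.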
| Tool | Purpose |
|---|---|
| start_research | Create a new research run. Returns run_id, state, context_path. |
| advance_run | Execute the next stage. Call once per stage. Returns the new state, completed stage, and findings path. |
| get_run_status | Query the current state of a run. |
| list_runs | List all runs, newest first. |
| resume_run | Resume a paused run (AWAIT_USER). Actions: accept, rerun, rerun_strict, abort. |
| abort_run | Terminate a run idempotently. |
Each tool returns a dict-shaped response. On error, the response contains {"error": "...", "state": "..."} — no raw exceptions leak to the client.
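
One common way to get this behaviour is a decorator around each tool handler. The sketch below is an assumption about how such a guard could look, not the server's actual code; `dict_errors`, `get_state`, and `demo_handler` are hypothetical names.

```python
import functools


def dict_errors(get_state=lambda: "UNKNOWN"):
    """Decorator factory: turn any exception from a handler into an error dict."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                return handler(*args, **kwargs)
            except Exception as exc:  # deliberately broad: nothing may reach the client raw
                return {"error": f"{type(exc).__name__}: {exc}", "state": get_state()}
        return wrapper
    return decorate


@dict_errors()
def demo_handler(run_id: str) -> dict:
    # Stand-in body for illustration; a real handler would look up the run.
    raise KeyError(run_id)


response = demo_handler("missing-run")
```

On the client side, checking for the `error` key before reading any other field is then enough to tell the two response shapes apart.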
pytest tests/

116 tests, no live model calls, ~26s run time. The suite covers the model client, workspace parser, state manager, run manager, trace, prompts, and MCP server.
Every decisions.md has YAML frontmatter (timestamp, actor, stage, event, related_artifacts, summary) and a free-form body. The trail is the audit log; the body is for humans. Every state transition and every resume action is recorded.
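
A minimal sketch of writing one entry with the six frontmatter fields listed above, using only the standard library. It assumes each appended entry carries its own frontmatter block, which the description does not spell out; `format_decision` and `append_decision` are hypothetical names.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def format_decision(actor, stage, event, summary, related_artifacts=(), body=""):
    """Render one decisions.md entry: YAML frontmatter, then a free-form body."""
    fields = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "stage": stage,
        "event": event,
        "related_artifacts": list(related_artifacts),
        "summary": summary,
    }
    # JSON strings and arrays are valid YAML scalars and flow sequences,
    # so json.dumps provides safe quoting without a YAML dependency.
    front = "\n".join(f"{key}: {json.dumps(value)}" for key, value in fields.items())
    return f"---\n{front}\n---\n{body.rstrip()}\n\n"


def append_decision(path: Path, entry: str) -> None:
    """Open in append mode so existing entries are never rewritten."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Opening the file in append mode is what keeps the trail append-only at the code level; nothing in this sketch prevents another process from editing the file.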
License: MIT