agentloop — ICM-aligned MCP server for multi-agent research orchestration

A sequential multi-agent research loop exposed as a Model Context Protocol (MCP) server over stdio. Each research run creates a versioned, inspectable filesystem tree under runs/<uuid>/ following the Interpretable Context Methodology (ICM, Van Clief & McDermott 2026, arXiv:2603.16021).

What it does: takes a question and runs three stages in sequence: researcher_a (first-pass) → researcher_b (second-pass, with first-pass findings as input) → consolidator (synthesis). sem_debug checks for drift at every stage transition; when it finds drift, the run pauses until the user resumes it through the resume protocol.

API-agnostic: works with any OpenAI-compatible endpoint. The default provider is ollama-cloud; set OLLAMA_BASE_URL to point at a different endpoint.

Architecture

The filesystem is the pipeline. Every run lives in runs/<uuid>/ with the following 5-layer ICM structure:

runs/<uuid>/
├── CONTEXT.md            ← run-level question and routing decisions
├── state.json            ← current run state
├── decisions.md          ← append-only audit trail with YAML frontmatter
├── 01_research_a/
│   ├── CONTEXT.md        ← stage contract (from prompts/researcher_a.md)
│   ├── findings.md       ← model output
│   └── decisions.md      ← stage-level decision trail
├── 02_research_b/
│   ├── CONTEXT.md
│   ├── findings.md
│   └── decisions.md
└── 03_consolidate/
    ├── CONTEXT.md
    ├── findings.md       ← final synthesized report
    └── decisions.md
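
As a rough illustration of the layout above, the following sketch creates an empty run tree. The function name scaffold_run, the placeholder file contents, and the {"state": ...} shape of state.json are assumptions made for this example, not the project's actual code.

```python
import tempfile
import uuid
from pathlib import Path

# Stage directory names taken from the tree above.
STAGES = ("01_research_a", "02_research_b", "03_consolidate")


def scaffold_run(root: Path, question: str) -> Path:
    """Create an empty runs/<uuid>/ tree; file contents are placeholders."""
    run_dir = root / "runs" / str(uuid.uuid4())
    run_dir.mkdir(parents=True)
    (run_dir / "CONTEXT.md").write_text(f"# Question\n\n{question}\n", encoding="utf-8")
    # Assumed shape: the real state.json schema is not documented here.
    (run_dir / "state.json").write_text('{"state": "INITIALIZED"}\n', encoding="utf-8")
    (run_dir / "decisions.md").touch()
    for stage in STAGES:
        stage_dir = run_dir / stage
        stage_dir.mkdir()
        # findings.md is deliberately absent until the stage has run.
        (stage_dir / "CONTEXT.md").touch()
        (stage_dir / "decisions.md").touch()
    return run_dir


demo_run = scaffold_run(Path(tempfile.mkdtemp()), "What limits battery cycle life?")
created = sorted(p.relative_to(demo_run).as_posix() for p in demo_run.rglob("*"))
```

Because every artifact is a plain file, a run can be inspected with ordinary tools (ls, cat, git diff) without going through the server.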

Agents are roles, not separate processes. The MCP server selects the stage contract (prompts/<role>.md) for each invocation.

Role	Purpose	Stage
`orchestrator`	Routes commands, manages run lifecycle, makes resume decisions	meta
`researcher_a`	First-pass research; produces initial findings	`01_research_a`
`researcher_b`	Second-pass research; reads researcher_a's findings as input	`02_research_b`
`consolidator`	Synthesizes both passes into a final report	`03_consolidate`

State machine:

INITIALIZED → RESEARCHING_A → RESEARCHING_B → CONSOLIDATING → DONE

If sem_debug detects drift at any transition, the run pauses at AWAIT_USER. The user resumes via resume_run with one of four actions: accept, rerun, rerun_strict, or abort.
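
The transitions described above can be sketched as a small pure function. This is a minimal sketch, not the project's state manager: the names RunState, HAPPY_PATH, and next_state are hypothetical, and what each resume action does after AWAIT_USER is left out because it is not specified here.

```python
from enum import Enum


class RunState(str, Enum):
    INITIALIZED = "INITIALIZED"
    RESEARCHING_A = "RESEARCHING_A"
    RESEARCHING_B = "RESEARCHING_B"
    CONSOLIDATING = "CONSOLIDATING"
    DONE = "DONE"
    AWAIT_USER = "AWAIT_USER"


# The drift-free path, in order.
HAPPY_PATH = [
    RunState.INITIALIZED,
    RunState.RESEARCHING_A,
    RunState.RESEARCHING_B,
    RunState.CONSOLIDATING,
    RunState.DONE,
]

# Action names accepted by resume_run, per the text above.
RESUME_ACTIONS = frozenset({"accept", "rerun", "rerun_strict", "abort"})


def next_state(current: RunState, drift_detected: bool = False) -> RunState:
    """Return the state after one transition, pausing if drift was detected."""
    if current in (RunState.DONE, RunState.AWAIT_USER):
        raise ValueError(f"cannot advance from {current.value}")
    if drift_detected:
        return RunState.AWAIT_USER
    return HAPPY_PATH[HAPPY_PATH.index(current) + 1]
```

For example, next_state(RunState.RESEARCHING_A) yields RESEARCHING_B, while next_state(RunState.RESEARCHING_A, drift_detected=True) yields AWAIT_USER.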

Stage contracts are defined in prompts/<role>.md using YAML frontmatter:

---
stage: 01_research_a
stage_role: first-pass research
model: kimi-k2.6:cloud
agent_id: researcher_a
inputs:
  - field: question
    source: ../CONTEXT.md
outputs:
  - field: findings
    path: findings.md
    contract: "Markdown analysis. Last line must be exactly: PASS_1_COMPLETE"
sem_debug_check:
  trigger: after_outputs
  check: "Does findings.md address the question directly without unsupported claims?"
  on_drift: AWAIT_USER
---
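
The output contract in the example above ("Last line must be exactly: PASS_1_COMPLETE") lends itself to a mechanical check. The sketch below shows one way such a check might look; the function name is hypothetical, and tolerating trailing newlines after the sentinel is an assumption rather than documented behavior.

```python
def meets_sentinel_contract(findings: str, sentinel: str = "PASS_1_COMPLETE") -> bool:
    """Return True if the last line of findings is exactly the sentinel."""
    # Assumption: trailing newlines after the sentinel are tolerated,
    # but any other trailing character (including a space) is not.
    lines = findings.rstrip("\n").split("\n")
    return lines[-1] == sentinel


good = "# Findings\n\nCycle life is limited by SEI growth.\nPASS_1_COMPLETE\n"
bad = "# Findings\n\nCycle life is limited by SEI growth.\nPASS_1_COMPLETE (draft)\n"
```

A check like this only covers the structural half of the contract; whether the findings address the question is the kind of judgment the sem_debug_check entry delegates to drift detection.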

Prerequisites

Python 3.11+
An account with ollama-cloud (or any OpenAI-compatible API endpoint)
sem_debug (optional, for drift detection; install separately from WBChain3/sem_debug)

Install

git clone https://github.com/WBChain3/research_loop.git
cd research_loop
pip install -e .

Dependencies are managed in pyproject.toml:

pyyaml, httpx, python-dotenv, pydantic, mcp>=1.0

Config

cp .env.example .env
# Edit .env and set OLLAMA_API_KEY=your_key_here

Variable	Required	Description
`OLLAMA_API_KEY`	Yes	API key for ollama-cloud
`OLLAMA_BASE_URL`	No	Defaults to `https://ollama.com/v1`; override for local endpoints
`OPENAI_API_KEY`	No	Alternative provider
`ANTHROPIC_API_KEY`	No	Alternative provider
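
To make the table concrete, here is a minimal sketch of how the two OLLAMA_* variables could be resolved. ProviderConfig and load_config are hypothetical names used for illustration; how the project actually loads its settings (for example through python-dotenv) is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Default taken from the table above.
DEFAULT_BASE_URL = "https://ollama.com/v1"


@dataclass(frozen=True)
class ProviderConfig:
    api_key: str
    base_url: str


def load_config(env: Mapping[str, str] = os.environ) -> ProviderConfig:
    """Resolve provider settings, failing early if the required key is missing."""
    api_key = env.get("OLLAMA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OLLAMA_API_KEY is required")
    # An unset or empty OLLAMA_BASE_URL falls back to the default endpoint.
    base_url = env.get("OLLAMA_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return ProviderConfig(api_key=api_key, base_url=base_url.rstrip("/"))
```

Passing the environment as a mapping keeps the function easy to test: load_config({"OLLAMA_API_KEY": "k", "OLLAMA_BASE_URL": "http://localhost:11434/v1"}) resolves a local endpoint without touching the real environment.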

Usage

Start the MCP server:

python -m agent_loop.cli

The server runs over stdio and listens for JSON-RPC messages as defined by MCP. Connect with any MCP client (Claude Desktop, mcp-client, etc.).

Full lifecycle example:

1. Create a run via start_research with a question.
2. Advance via advance_run (call 3× for the full pipeline).
3. Check status via get_run_status at any time.
4. If drift is detected, the run enters AWAIT_USER. Resume via resume_run with accept, rerun, rerun_strict, or abort.
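
On the wire, each step of the lifecycle above is an MCP tools/call request carried as a JSON-RPC 2.0 message. The sketch below builds those messages for a drift-free run. The argument names question and run_id are assumptions (only the tool names come from this README), and a real client would first complete the MCP initialize handshake, which is omitted here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, increasing per request


def tool_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Placeholder: in practice this value comes from the start_research response.
run_id = "<run_id returned by start_research>"

lifecycle = [
    tool_call("start_research", {"question": "What limits battery cycle life?"}),
    # One advance_run call per stage: research A, research B, consolidate.
    *(tool_call("advance_run", {"run_id": run_id}) for _ in range(3)),
    tool_call("get_run_status", {"run_id": run_id}),
]
```

In practice an MCP client library handles the framing and the handshake; the point of the sketch is only that a full run is five tool calls when no drift is detected.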

MCP Tools (6)

Tool	Purpose
`start_research`	Create a new research run. Returns `run_id`, `state`, `context_path`.
`advance_run`	Execute the next stage. Call once per stage. Returns the new state, completed stage, and findings path.
`get_run_status`	Query the current state of a run.
`list_runs`	List all runs, newest first.
`resume_run`	Resume a paused run (`AWAIT_USER`). Actions: `accept`, `rerun`, `rerun_strict`, `abort`.
`abort_run`	Terminate a run idempotently.

Each tool returns a dict-shaped response. On error, the response contains {"error": "...", "state": "..."}; raw exceptions never leak to the client.
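
One way to get this behavior is to wrap every tool handler in a decorator that converts exceptions into the error shape. The following is a sketch under assumptions: safe_tool and its get_state callback are hypothetical names, and the handlers are shown as synchronous functions even though the real server's handlers may be async.

```python
import functools
from typing import Any, Callable


def safe_tool(get_state: Callable[[], str]) -> Callable:
    """Wrap a tool handler so failures become {"error": ..., "state": ...}."""

    def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> dict:
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                # Report the failure as data instead of letting it propagate.
                return {"error": f"{type(exc).__name__}: {exc}", "state": get_state()}

        return wrapper

    return decorator


@safe_tool(get_state=lambda: "RESEARCHING_A")
def failing_tool() -> dict:
    raise TimeoutError("model endpoint did not respond")


result = failing_tool()
```

Here result is a plain dict with error and state keys, so a client can branch on the presence of "error" rather than parsing a traceback.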

Test

pytest tests/

116 tests, no live model calls, ~26s run time. The suite covers the model client, workspace parser, state manager, run manager, trace, prompts, and MCP server.

Decision trail

Every decisions.md has YAML frontmatter (timestamp, actor, stage, event, related_artifacts, summary) and a free-form body. The frontmatter is the structured audit log; the body is for humans. Every state transition and every resume action is recorded.
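
A sketch of how one entry with those six fields might be rendered and appended. The function names, the event values, and the choice to write one frontmatter block per entry are assumptions for illustration; the project's actual on-disk layout may differ.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def render_entry(actor: str, stage: str, event: str,
                 related_artifacts: list, summary: str, body: str) -> str:
    """Render one decision entry: frontmatter fields, then a free-form body."""
    meta = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "actor": actor,
        "stage": stage,
        "event": event,
        "related_artifacts": related_artifacts,
        "summary": summary,
    }
    # JSON strings and lists are valid YAML flow syntax, so json.dumps
    # produces parseable values without needing a YAML library here.
    front = "\n".join(f"{key}: {json.dumps(value)}" for key, value in meta.items())
    return f"---\n{front}\n---\n\n{body.strip()}\n\n"


def append_entry(path: Path, entry: str) -> None:
    """Open in append mode so earlier entries are never rewritten."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)


trail = Path(tempfile.mkdtemp()) / "decisions.md"
for event in ("state_transition", "resume_accept"):  # hypothetical event names
    append_entry(trail, render_entry(
        "orchestrator", "01_research_a", event,
        ["01_research_a/findings.md"], f"Recorded {event}.",
        "Free-form notes for human readers.",
    ))
text = trail.read_text(encoding="utf-8")
```

Opening the file in append mode is what keeps the trail append-only in this sketch: a second write adds a new entry and leaves the first one untouched.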

License

MIT
