This guide explains how to use AgentForge in other repositories and with different LLM providers.
- Add the AgentForge dependency:
  `pnpm add @elata-biosciences/agentforge`
- Create a simulation directory:
  `npx forge-sim init sim`
- Implement a project-specific `Pack` that:
  - initializes contracts/state,
  - exposes world state via `getWorldState()`,
  - executes actions via `executeAction()`,
  - returns protocol metrics via `getMetrics()`.
- Author scenarios and run them:

  ```bash
  forge-sim run sim/scenarios/stress.ts --mode deterministic
  forge-sim run sim/scenarios/stress.ts --mode exploration
  forge-sim run sim/scenarios/stress.ts --mode replay --replay-bundle sim/results/<run>/replay_bundle.json
  ```

Guidelines for the `Pack`:

- Keep world state deterministic for non-exploration runs.
- Ensure `executeAction()` returns stable success/error semantics.
- Expose metrics that matter for exploit and mechanism evaluation.
- Optionally implement `callRpc()` and `getDeployedContracts()` for exploration allowlists.
- Keep action names stable across versions to preserve replay compatibility.
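The guidelines above can be illustrated with a toy, in-memory `Pack`. This is a minimal sketch, not AgentForge's API: the `SketchPack` interface, the `Action`/`ActionResult` shapes, the `initialize()` method, and the `VaultPack` class are all hypothetical stand-ins, and the real `Pack` types exported by `@elata-biosciences/agentforge` may use different signatures. It shows fixed seed state, sorted world-state output, stable error codes instead of thrown exceptions, and metrics suited to exploit evaluation.

```typescript
// Hypothetical shapes for illustration; AgentForge's real Pack types may differ.
type ActionResult = { ok: true } | { ok: false; error: string };

interface Action {
  name: string; // keep action names stable across versions for replay
  actor: string;
  params: Record<string, number>;
}

interface SketchPack {
  initialize(): Promise<void>;
  getWorldState(): Promise<Record<string, unknown>>;
  executeAction(action: Action): Promise<ActionResult>;
  getMetrics(): Promise<Record<string, number>>;
}

// Toy in-memory vault standing in for deployed contracts.
class VaultPack implements SketchPack {
  private balances = new Map<string, number>();
  private totalDeposits = 0;
  private failedActions = 0;

  async initialize(): Promise<void> {
    // Fixed seed state, so non-exploration runs start identically.
    this.balances = new Map([["alice", 100], ["bob", 50]]);
    this.totalDeposits = 0;
    this.failedActions = 0;
  }

  async getWorldState(): Promise<Record<string, unknown>> {
    // Sort by actor (plain code-unit order) so the serialized state is identical across runs.
    const entries = Array.from(this.balances.entries()).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    const balances: Record<string, number> = {};
    for (const [actor, amount] of entries) balances[actor] = amount;
    return { balances, totalDeposits: this.totalDeposits };
  }

  async executeAction(action: Action): Promise<ActionResult> {
    const balance = this.balances.get(action.actor) ?? 0;
    const amount = action.params["amount"] ?? 0;
    switch (action.name) {
      case "deposit":
        if (amount <= 0 || amount > balance) return this.fail("INVALID_AMOUNT");
        this.balances.set(action.actor, balance - amount);
        this.totalDeposits += amount;
        return { ok: true };
      default:
        // Unknown actions get a stable error code instead of an exception.
        return this.fail(`UNKNOWN_ACTION:${action.name}`);
    }
  }

  async getMetrics(): Promise<Record<string, number>> {
    return { totalDeposits: this.totalDeposits, failedActions: this.failedActions };
  }

  private fail(error: string): ActionResult {
    this.failedActions += 1;
    return { ok: false, error };
  }
}
```

Returning `{ ok: false, error }` with a fixed code, rather than throwing, keeps a replayed action's outcome comparable across runs and contract versions.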
AgentForge uses a provider-neutral LLM client interface. You can run:
- OpenAI models,
- Anthropic models,
- local and hosted OpenAI-compatible endpoints (for example Ollama/vLLM gateways, DeepSeek, Kimi),
- OpenRouter routing (many models behind one API key),
- Gemini (Google).
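One way to picture the provider-neutral design: agent logic depends on a small client interface, and each provider (or a scripted stand-in for tests) implements it. The `LlmClient`, `ChatMessage`, `ScriptedClient`, and `chooseAction` names below are hypothetical and only illustrate the idea; they are not AgentForge's actual interface.

```typescript
// Hypothetical shapes; AgentForge's actual client interface may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface LlmClient {
  readonly provider: string;
  readonly model: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Deterministic stand-in that satisfies the interface without any network calls.
class ScriptedClient implements LlmClient {
  readonly provider = "scripted";
  private turn = 0;

  constructor(readonly model: string, private readonly replies: string[]) {
    if (replies.length === 0) throw new Error("ScriptedClient needs at least one reply");
  }

  async complete(_messages: ChatMessage[]): Promise<string> {
    // Cycle through canned replies so repeated runs see the same sequence.
    const reply = this.replies[this.turn % this.replies.length];
    this.turn += 1;
    return reply;
  }
}

// Agent logic sees only LlmClient, so swapping providers needs no changes here.
async function chooseAction(client: LlmClient, worldState: string): Promise<string> {
  const reply = await client.complete([
    { role: "system", content: "Reply with exactly one action name." },
    { role: "user", content: worldState },
  ]);
  return reply.trim();
}
```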
For OpenAI-compatible endpoints, use provider `openai-compatible` with a custom `baseUrl` and `apiKey`.

Environment example:

```bash
export OPENAI_COMPAT_BASE_URL="https://api.deepseek.com/v1"
export OPENAI_COMPAT_API_KEY="..."
export OPENAI_COMPAT_MODEL="deepseek-chat"
```

Or for Kimi (Moonshot), if using an OpenAI-style gateway:

```bash
export OPENAI_COMPAT_BASE_URL="https://api.moonshot.cn/v1"
export OPENAI_COMPAT_API_KEY="..."
export OPENAI_COMPAT_MODEL="moonshot-v1-32k"
```

Notes:

- Vendors differ in model naming; keep `OPENAI_COMPAT_MODEL` explicit.
- Response shapes can vary; if a vendor returns non-standard JSON, you may need a thin adapter.
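A minimal sketch, assuming a Node-style environment object, of reading the `OPENAI_COMPAT_*` variables and of the kind of thin adapter the notes mention. `readCompatConfig` and `extractText` are hypothetical helpers, not part of AgentForge. The standard branch follows the OpenAI chat-completions response shape (`choices[0].message.content`); the `output_text` fallback is an invented example of a non-standard vendor shape.

```typescript
type Env = Record<string, string | undefined>;

interface CompatConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
}

// Hypothetical helper: read OPENAI_COMPAT_* and fail fast if any is missing,
// since model names are vendor-specific and there is no safe default.
function readCompatConfig(env: Env): CompatConfig {
  const baseUrl = env["OPENAI_COMPAT_BASE_URL"];
  const apiKey = env["OPENAI_COMPAT_API_KEY"];
  const model = env["OPENAI_COMPAT_MODEL"];
  if (!baseUrl || !apiKey || !model) {
    throw new Error(
      "OPENAI_COMPAT_BASE_URL, OPENAI_COMPAT_API_KEY and OPENAI_COMPAT_MODEL must all be set",
    );
  }
  // Drop trailing slashes so `${baseUrl}/chat/completions` is well-formed.
  return { baseUrl: baseUrl.replace(/\/+$/, ""), apiKey, model };
}

// Hypothetical thin adapter: accept the standard chat-completions shape,
// plus an invented top-level `output_text` field as a non-standard example.
function extractText(body: unknown): string {
  const b = (body ?? {}) as {
    choices?: { message?: { content?: unknown } }[];
    output_text?: unknown;
  };
  const content = b.choices?.[0]?.message?.content;
  if (typeof content === "string") return content;
  if (typeof b.output_text === "string") return b.output_text;
  throw new Error("Unrecognized chat response shape");
}
```

In Node, pass `process.env` to `readCompatConfig`; failing at startup is cheaper than discovering a missing model name mid-campaign.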
Provider `openrouter` calls OpenRouter's chat completions API with OpenRouter-specific headers.

```bash
export OPENROUTER_API_KEY="..."
export OPENROUTER_MODEL="openai/gpt-4o-mini"
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
# Optional (recommended) metadata:
export OPENROUTER_APP_NAME="agentforge-validation"
export OPENROUTER_APP_URL="https://github.com/Elata-Biosciences/agentforge"
```

Provider `gemini` uses the Google Generative Language API.

```bash
export GEMINI_API_KEY="..."
export GEMINI_MODEL="gemini-1.5-flash"
```

Recommended pattern:
- baseline with one provider/model,
- repeat exploration with alternative providers,
- replay all resulting bundles on contract updates.
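To repeat exploration across providers, it can help to detect which providers are configured before launching a campaign. This sketch assumes the environment variables shown above; `openRouterHeaders` and `configuredProviders` are hypothetical helpers. `HTTP-Referer` and `X-Title` are the optional app-attribution headers OpenRouter documents, but that AgentForge maps `OPENROUTER_APP_URL` and `OPENROUTER_APP_NAME` onto them is an assumption.

```typescript
type Env = Record<string, string | undefined>;

type ProviderName = "openai-compatible" | "openrouter" | "gemini";

interface ProviderChoice {
  provider: ProviderName;
  model: string;
}

// Hypothetical helper. HTTP-Referer and X-Title are OpenRouter's optional
// app-attribution headers; mapping APP_URL/APP_NAME onto them is assumed.
function openRouterHeaders(env: Env): Record<string, string> {
  const key = env["OPENROUTER_API_KEY"];
  if (!key) throw new Error("OPENROUTER_API_KEY is not set");
  const headers: Record<string, string> = {
    Authorization: `Bearer ${key}`,
    "Content-Type": "application/json",
  };
  const appUrl = env["OPENROUTER_APP_URL"];
  const appName = env["OPENROUTER_APP_NAME"];
  if (appUrl) headers["HTTP-Referer"] = appUrl;
  if (appName) headers["X-Title"] = appName;
  return headers;
}

// Lists every provider whose key and model variables are both set, so one
// exploration campaign can be repeated across all of them.
function configuredProviders(env: Env): ProviderChoice[] {
  const candidates: Array<[ProviderName, string, string]> = [
    ["openai-compatible", "OPENAI_COMPAT_API_KEY", "OPENAI_COMPAT_MODEL"],
    ["openrouter", "OPENROUTER_API_KEY", "OPENROUTER_MODEL"],
    ["gemini", "GEMINI_API_KEY", "GEMINI_MODEL"],
  ];
  const found: ProviderChoice[] = [];
  for (const [provider, keyVar, modelVar] of candidates) {
    const model = env[modelVar];
    if (env[keyVar] && model) found.push({ provider, model });
  }
  return found;
}
```

The first configured provider can serve as the baseline, with the rest used for the repeated exploration runs.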
A typical campaign across contract versions:

- Run a deterministic baseline to establish KPIs.
- Run exploration campaigns with multiple providers/models.
- Store replay bundles as long-lived campaign artifacts.
- Modify the contracts (new version or fix).
- Re-run the replay bundles to check whether exploits persist or are mitigated.
- Compare metrics and exploit outcomes between versions.
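For the final comparison step, a small diff over two metric maps is often enough to show whether an exploit persists or was mitigated. This sketch assumes metrics are flat `Record<string, number>` maps, as a hypothetical `getMetrics()` might return; `diffMetrics` and the `protocolLoss` metric used below are illustrative names, not AgentForge output.

```typescript
type Metrics = Record<string, number>;

interface MetricDelta {
  metric: string;
  before: number | undefined;
  after: number | undefined;
  delta: number | undefined; // undefined if the metric exists in only one run
}

const has = (m: Metrics, key: string): boolean => Object.prototype.hasOwnProperty.call(m, key);

// Compares metrics from replaying the same bundle against two contract versions.
function diffMetrics(before: Metrics, after: Metrics): MetricDelta[] {
  // Union of metric names, sorted so the report order is stable.
  const names = Array.from(new Set(Object.keys(before).concat(Object.keys(after)))).sort();
  return names.map((metric) => {
    const b = has(before, metric) ? before[metric] : undefined;
    const a = has(after, metric) ? after[metric] : undefined;
    const delta = a !== undefined && b !== undefined ? a - b : undefined;
    return { metric, before: b, after: a, delta };
  });
}
```

For example, `diffMetrics({ protocolLoss: 120 }, { protocolLoss: 0 })` yields one entry with a `delta` of `-120`, suggesting the fix removed the loss in that replay.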
Suggested CI split:

- Job A: deterministic regression suite.
- Job B: replay campaigns from prior exploration bundles.
- Job C (optional/manual): live exploration runs with LLM providers.
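Jobs A and B can be scripted as plain `forge-sim` invocations. This sketch builds argument lists using only the `run`, `--mode`, and `--replay-bundle` forms shown earlier in this guide; `forgeSimArgs`, `regressionJob`, and `replayJob` are hypothetical helpers, and pairing each bundle with the scenario that produced it is an assumption. Job C is left out because it calls live LLM providers.

```typescript
type Mode = "deterministic" | "exploration" | "replay";

interface ReplayTarget {
  scenario: string; // scenario that produced the bundle (assumed pairing)
  bundle: string; // path to a stored replay_bundle.json
}

// Builds argv for one `forge-sim run`, using only the flags shown in this guide.
function forgeSimArgs(scenario: string, mode: Mode, replayBundle?: string): string[] {
  const args = ["run", scenario, "--mode", mode];
  if (mode === "replay") {
    if (!replayBundle) throw new Error("replay mode requires a replay bundle path");
    args.push("--replay-bundle", replayBundle);
  }
  return args;
}

// Job A: deterministic regression suite over every scenario.
function regressionJob(scenarios: string[]): string[][] {
  return scenarios.map((scenario) => forgeSimArgs(scenario, "deterministic"));
}

// Job B: replay each stored exploration bundle against the current contracts.
function replayJob(targets: ReplayTarget[]): string[][] {
  return targets.map((t) => forgeSimArgs(t.scenario, "replay", t.bundle));
}
```

Each argument list can then be executed, for example with Node's `child_process.spawnSync("npx", ["forge-sim", ...args], { stdio: "inherit" })`, failing the CI job on a non-zero exit status.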