This document explains the fundamental concepts in AgentForge.
A Scenario is the complete specification of a simulation run. It defines:
- name: Identifier for the scenario
- seed: Random seed for deterministic replay
- ticks: Number of discrete time steps to simulate
- tickSeconds: Simulated seconds per tick
- pack: Protocol adapter providing world state and action execution
- agents: List of agent types and counts
- metrics: Configuration for metric sampling
- assertions: Validations to run at end of simulation
- probes: Custom metric probes (optional)
- checkpoints: Checkpoint configuration (optional)
- gossip: Communication channels and attention budgets (optional)
- query: Budgeted world-query endpoints (optional)
- schedule: Info and assumption events injected at configured ticks (optional)
- smoke: Assumption perturbation checkpoints and divergence outputs (optional)

import { defineScenario } from '@elata-biosciences/agentforge';

export default defineScenario({
  name: 'market-stress',
  seed: 42,
  ticks: 100,
  tickSeconds: 3600,
  pack: new MyPack(),
  agents: [
    { type: TraderAgent, count: 10 },
    { type: ArbitrageAgent, count: 2 },
  ],
  assertions: [
    { type: 'gt', metric: 'totalVolume', value: 0 },
  ],
});

A Tick is one discrete time step in the simulation. During each tick:
- The pack is notified via onTick(tick, timestamp)
- Scheduled events are applied (gossip_inject, world overlays, etc.)
- Gossip deliveries for the tick are processed
- Agents are scheduled in a deterministic order
- Each agent observes state and decides an action
- Actions are validated/executed (QueryWorld, PostMessage, RpcCall, pack actions)
- Metrics are sampled (at configured intervals)
- Optional artifacts are captured (memory snapshots, smoke divergence, checkpoints)
Tick timing is simulated, not real-time. The tickSeconds parameter controls how much simulated time passes between ticks.
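
The core of the tick sequence can be sketched as a plain loop. This is a minimal illustration, not the AgentForge engine: the `Sketch*` types, `runSketch`, `makeCounterPack`, and the `sampleEveryTicks` parameter are hypothetical names, tick numbering from 0 is an assumption, and scheduled events, gossip, and artifact capture are left out. A fixed agent order stands in for the deterministic ordering policy.

```typescript
// Minimal stand-in types for this sketch; the real AgentForge types are richer.
interface SketchAction { id: string; name: string; params: Record<string, unknown>; }
interface SketchAgent {
  id: string;
  step(tick: number, world: Record<string, number>): SketchAction | null;
}
interface SketchPack {
  onTick(tick: number, timestamp: number): void;
  getWorldState(): Record<string, number>;
  executeAction(action: SketchAction, agentId: string): boolean;
  getMetrics(): Record<string, number>;
}
interface SketchSample { tick: number; timestamp: number; metrics: Record<string, number>; }

// Hypothetical loop: runs `ticks` steps and samples metrics every `sampleEveryTicks` ticks.
function runSketch(
  pack: SketchPack,
  agents: SketchAgent[],
  ticks: number,
  tickSeconds: number,
  sampleEveryTicks: number,
): SketchSample[] {
  const samples: SketchSample[] = [];
  for (let tick = 0; tick < ticks; tick++) {
    // Simulated seconds since the start of the run, not wall-clock time.
    const timestamp = tick * tickSeconds;
    pack.onTick(tick, timestamp);
    // Each agent observes the current state, decides, and its action is executed.
    for (const agent of agents) {
      const action = agent.step(tick, pack.getWorldState());
      if (action !== null) pack.executeAction(action, agent.id);
    }
    if (tick % sampleEveryTicks === 0) {
      samples.push({ tick, timestamp, metrics: pack.getMetrics() });
    }
  }
  return samples;
}

// Toy pack for the sketch: counts successful 'buy' actions as volume.
function makeCounterPack(): SketchPack {
  let volume = 0;
  let lastTimestamp = 0;
  return {
    onTick(_tick: number, timestamp: number): void { lastTimestamp = timestamp; },
    getWorldState(): Record<string, number> { return { volume, lastTimestamp }; },
    executeAction(action: SketchAction): boolean {
      if (action.name !== 'buy') return false;
      volume += 1;
      return true;
    },
    getMetrics(): Record<string, number> { return { volume }; },
  };
}
```

Under these assumptions, a scenario with ticks: 100 and tickSeconds: 3600 covers 360000 simulated seconds (100 simulated hours), however long the run takes in real time.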
An Agent is an autonomous actor that participates in the simulation. Agents:
- Observe the world state each tick
- Make decisions based on their strategy
- Emit actions to be executed
- Maintain persistent memory across ticks
- Can use cooldowns to rate-limit actions

import { BaseAgent, type Action, type TickContext } from '@elata-biosciences/agentforge';

export class MyAgent extends BaseAgent {
  async step(ctx: TickContext): Promise<Action | Action[] | null> {
    const price = ctx.world.price as number;
    if (ctx.rng.chance(0.3)) {
      return {
        id: this.generateActionId('buy', ctx.tick),
        name: 'buy',
        params: { amount: ctx.rng.nextInt(1, 100) },
      };
    }
    return null; // Skip this tick
  }
}

Agents can return a single action, an array of actions, or null:

async step(ctx: TickContext): Promise<Action | Action[] | null> {
  const trade = { id: this.generateActionId('swap', ctx.tick), name: 'u4_swap', params: { ... } };
  const message = { id: this.generateActionId('post', ctx.tick), name: 'PostMessage', params: { channelId: 'strategy', text: '...' } };
  return [trade, message]; // Both executed sequentially within this tick
}

The engine executes each action in array order. Each is validated and recorded independently.

BaseAgent also provides helper methods:
- Memory: this.remember(key, value) / this.recall(key) - persist state across ticks
- Cooldowns: this.setCooldown(action, ticks, currentTick) - rate-limit actions
- Parameters: this.getParam(key, default) - access scenario-defined params

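
How these helpers combine can be sketched with a stand-in base class. The method names `remember`, `recall`, `setCooldown`, and `getParam` come from the list above, but their bodies here are assumptions made for illustration, and `isOnCooldown`, `HelperSketch`, and `CautiousBuyer` are hypothetical names that may not match the library.

```typescript
// Stand-in for BaseAgent: same helper names as the list above, but the
// bodies are assumptions for this sketch, not the library's code.
class HelperSketch {
  private memory = new Map<string, unknown>();
  private cooldownUntil = new Map<string, number>();
  private params: Record<string, unknown>;

  constructor(params: Record<string, unknown> = {}) {
    this.params = params;
  }

  // Memory: values survive from one tick to the next.
  remember(key: string, value: unknown): void {
    this.memory.set(key, value);
  }
  recall(key: string): unknown {
    return this.memory.get(key);
  }

  // Cooldowns: block `action` until `currentTick + ticks`.
  setCooldown(action: string, ticks: number, currentTick: number): void {
    this.cooldownUntil.set(action, currentTick + ticks);
  }
  // Hypothetical companion check; the list above does not name one.
  isOnCooldown(action: string, currentTick: number): boolean {
    const until = this.cooldownUntil.get(action);
    return until !== undefined && currentTick < until;
  }

  // Parameters: scenario-defined value, or the fallback when absent.
  getParam<T>(key: string, fallback: T): T {
    return key in this.params ? (this.params[key] as T) : fallback;
  }
}

// Buys at most once every `cooldownTicks` ticks and counts its buys in memory.
class CautiousBuyer extends HelperSketch {
  step(tick: number): string | null {
    if (this.isOnCooldown('buy', tick)) return null;
    const buys = (this.recall('buys') as number | undefined) ?? 0;
    this.remember('buys', buys + 1);
    this.setCooldown('buy', this.getParam('cooldownTicks', 3), tick);
    return 'buy';
  }
}
```

With the fallback of 3 cooldown ticks, this agent acts on ticks 0, 3, and 6 and returns null in between; passing `{ cooldownTicks: 1 }` to the constructor lets it act every tick.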
An Action is a command emitted by an agent to be executed by the pack:

interface Action {
  id: string;                         // Unique identifier (use generateActionId)
  name: string;                       // Action type (e.g., 'buy', 'sell', 'stake')
  params: Record<string, unknown>;    // Action-specific parameters
  metadata?: Record<string, unknown>;
}

Actions are executed through pack.executeAction() and return an ActionResult:

interface ActionResult {
  ok: boolean;                        // Success or failure
  error?: string;                     // Error message if failed
  events?: ActionEvent[];             // Emitted events
  balanceDeltas?: Record<string, bigint>;
  gasUsed?: bigint;
  txHash?: string;
}

A Pack is a protocol adapter that bridges AgentForge to your specific smart contracts or simulation environment. It provides:
- World State: Current protocol state observable by agents
- Action Execution: Execute agent actions against the protocol
- Metrics: Protocol-specific metrics for analysis
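
A toy in-memory adapter covering these three responsibilities might look like the sketch below. The type shapes are copied from the interfaces in this document and trimmed to what the sketch uses; `WorldState` is assumed to be a plain record, and `ToyMarketPack`, its fixed price, and its volume formula are hypothetical.

```typescript
// Shapes copied from this document's interfaces, trimmed for the sketch.
// Assumption: WorldState is a plain record.
type WorldState = Record<string, unknown>;
interface Action { id: string; name: string; params: Record<string, unknown>; }
interface ActionResult { ok: boolean; error?: string; }
interface Pack {
  name: string;
  initialize(): Promise<void>;
  onTick?(tick: number, timestamp: number): void;
  getWorldState(): WorldState;
  executeAction(action: Action, agentId: string): Promise<ActionResult>;
  getMetrics(): Record<string, number | bigint | string>;
  cleanup(): Promise<void>;
}

// Hypothetical in-memory pack: a single-asset market with a fixed price.
class ToyMarketPack implements Pack {
  name = 'toy-market-sketch';
  private price = 100;
  private totalVolume = 0;
  private lastTimestamp = 0;

  async initialize(): Promise<void> {
    this.totalVolume = 0;
  }

  onTick(_tick: number, timestamp: number): void {
    this.lastTimestamp = timestamp;
  }

  // World State: what agents observe each tick.
  getWorldState(): WorldState {
    return { price: this.price, timestamp: this.lastTimestamp };
  }

  // Action Execution: validate, apply, and report the outcome.
  async executeAction(action: Action, _agentId: string): Promise<ActionResult> {
    if (action.name !== 'buy') {
      return { ok: false, error: `unknown action: ${action.name}` };
    }
    const amount = action.params['amount'];
    if (typeof amount !== 'number' || amount <= 0) {
      return { ok: false, error: 'amount must be a positive number' };
    }
    this.totalVolume += amount * this.price;
    return { ok: true };
  }

  // Metrics: numbers the engine can sample and assert on.
  getMetrics(): Record<string, number | bigint | string> {
    return { totalVolume: this.totalVolume, price: this.price };
  }

  async cleanup(): Promise<void> {}
}
```

Returning `{ ok: false, error }` for unknown or invalid actions, rather than throwing, keeps a single bad action from aborting the run and leaves a record of why it failed.
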

interface Pack {
  name: string;
  initialize(): Promise<void>;
  onTick?(tick: number, timestamp: number): void;
  getWorldState(): WorldState;
  executeAction(action: Action, agentId: string): Promise<ActionResult>;
  getMetrics(): Record<string, number | bigint | string>;
  cleanup(): Promise<void>;
}

The Ordering Policy determines the sequence in which agents act each tick:
- random (default): Fisher-Yates shuffle using seeded RNG
- round-robin: Rotate starting position each tick
- priority: Order by priority function (higher first)
Ordering is deterministic given the same seed, ensuring reproducible results.
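
The three policies can be sketched as pure functions. This is an illustration under assumptions, not AgentForge's implementation: mulberry32 is a well-known small PRNG swapped in here as the seeded generator, the rotation direction for round-robin is a guess, and `shuffleOrder`, `roundRobinOrder`, and `priorityOrder` are hypothetical names.

```typescript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
// Assumption: stands in for whatever seeded RNG the engine actually uses.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// random: Fisher-Yates shuffle driven by the seeded generator.
function shuffleOrder<T>(items: readonly T[], rand: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// round-robin: the starting position advances by one each tick.
function roundRobinOrder<T>(items: readonly T[], tick: number): T[] {
  if (items.length === 0) return [];
  const start = tick % items.length;
  return [...items.slice(start), ...items.slice(0, start)];
}

// priority: higher priority first; Array.prototype.sort is stable,
// so ties keep their input order.
function priorityOrder<T>(items: readonly T[], priority: (item: T) => number): T[] {
  return items.slice().sort((a, b) => priority(b) - priority(a));
}
```

Because each function depends only on its inputs (the agent list plus a seed, tick, or priority function), calling it twice with the same inputs yields the same order, which is the property reproducible runs rely on.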
AgentForge guarantees deterministic replay: same seed + same scenario + same code = identical results.
This is achieved through:
- Seeded RNG: All randomness derives from the scenario seed
- Tick-derived RNG: Each tick gets a derived RNG
- Agent-derived RNG: Each agent gets a per-tick derived RNG
- Deterministic Action IDs: IDs use counters, not timestamps
- Deterministic Ordering: Agent order is reproducible
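
One way to derive per-tick and per-agent generators from a single scenario seed, and to build counter-based action IDs, is sketched below. The derivation scheme (hashing a label string with FNV-1a and seeding mulberry32) and the ID format are assumptions chosen for illustration; the document does not specify how AgentForge derives its streams, and all function names here are hypothetical.

```typescript
// FNV-1a 32-bit hash over the string's UTF-16 code units.
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Tick-derived seed: depends only on the scenario seed and the tick number.
function tickSeed(scenarioSeed: number, tick: number): number {
  return fnv1a(`${scenarioSeed}:tick:${tick}`);
}

// Agent-derived seed: depends on the tick seed and the agent's id, so an
// agent's draws do not shift when other agents draw more or fewer numbers.
function agentSeed(scenarioSeed: number, tick: number, agentId: string): number {
  return fnv1a(`${tickSeed(scenarioSeed, tick)}:agent:${agentId}`);
}

// Counter-based action IDs: no timestamps, so replays produce the same IDs.
function makeActionIdGenerator(agentId: string): (actionName: string, tick: number) => string {
  let counter = 0;
  return (actionName, tick) => `${agentId}-${actionName}-${tick}-${counter++}`;
}
```

For example, `mulberry32(agentSeed(42, 7, 'trader-0'))` yields the same sequence on every run, regardless of what other agents did earlier in tick 7.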
Replay mode builds on these guarantees with a two-step workflow:
- Exploration run records actions, message flows, and query traces into replay_bundle.json.
- Replay run reuses that bundle and executes without live LLM calls.
This lets you discover exploit paths with exploratory policies and then regression-test those paths after contract changes.
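
The record-then-replay idea can be sketched independently of the real bundle format, which this document does not describe. Everything below is hypothetical: `SketchBundle`, the `tick:agentId` key scheme, and the `recording` and `replaying` wrappers only illustrate how a recorded trace can stand in for a live (for example, LLM-backed) policy.

```typescript
// Hypothetical bundle shape; the real replay_bundle.json layout is not shown here.
interface SketchBundle {
  decisions: Record<string, string | null>;
}

// A policy maps (tick, agentId) to an action name, or null to skip the tick.
type Policy = (tick: number, agentId: string) => string | null;

// Exploration: wrap a live policy so every decision is recorded into the bundle.
function recording(policy: Policy, bundle: SketchBundle): Policy {
  return (tick, agentId) => {
    const decision = policy(tick, agentId);
    bundle.decisions[`${tick}:${agentId}`] = decision;
    return decision;
  };
}

// Replay: read decisions back from the bundle; the live policy is never called.
function replaying(bundle: SketchBundle): Policy {
  return (tick, agentId) => {
    const key = `${tick}:${agentId}`;
    if (!(key in bundle.decisions)) {
      throw new Error(`no recorded decision for ${key}`);
    }
    return bundle.decisions[key] ?? null;
  };
}
```

Throwing on a missing key, instead of silently skipping, makes it visible when a replay run asks for a decision the exploration run never made, which usually means the scenario or code has diverged from the recorded run.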
You can verify determinism by comparing artifact hashes:
# Run twice with same seed
forge-sim run --toy --seed 123 --out run1 --ci
forge-sim run --toy --seed 123 --out run2 --ci
# Compare runs
forge-sim compare run1/toy-market-ci run2/toy-market-ci
# Should report "Artifact hashes are identical"

Each simulation run produces durable artifacts in the output directory:

results/<scenario>-<timestamp>/
├── summary.json          # Run metadata, final metrics, assertion results
├── metrics.csv           # Time-series metrics data
├── actions.ndjson        # All agent actions (newline-delimited JSON)
├── gossip.ndjson         # Gossip posts/deliveries (if gossip enabled)
├── agent_memory.ndjson   # Agent memory snapshots (if capture enabled)
├── replay_bundle.json    # Exploration traces for replay mode
├── smoke_results.json    # Assumption-perturbation divergence results (if configured)
├── config_resolved.json  # Resolved scenario configuration
├── report.md             # Generated report (if requested)
└── checkpoints/          # Checkpoint snapshots (if configured)
    ├── tick_00050.json
    └── tick_00100.json

- summary.json: Contains run metadata, final KPIs, agent statistics, and assertion results.
- metrics.csv: Time-series data with one row per sampled tick. Columns include tick number, timestamp, and all pack metrics.
- actions.ndjson: Every action taken by every agent, with results. One JSON object per line.
- config_resolved.json: The fully resolved scenario configuration, useful for reproducing runs.

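
Because actions.ndjson holds one JSON object per line, it can be processed line by line. The sketch below parses NDJSON text and counts records by action name. It assumes each record exposes a top-level `name` field, which this document does not confirm; `parseNdjson` and `countByActionName` are hypothetical helpers, and reading the file from disk is left to the caller.

```typescript
// Parses newline-delimited JSON: one JSON value per non-empty line.
function parseNdjson(text: string): unknown[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}

// Counts records by their `name` field.
// Assumption: the action name is a top-level string field called `name`.
function countByActionName(records: unknown[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    if (typeof record !== 'object' || record === null) continue;
    const actionName = (record as Record<string, unknown>)['name'];
    if (typeof actionName !== 'string') continue;
    counts.set(actionName, (counts.get(actionName) ?? 0) + 1);
  }
  return counts;
}
```

Records without a string `name` are skipped, so the helper tolerates layouts that differ from the assumed one instead of failing on the first unexpected line.
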
Probes are custom metric samplers that extend beyond pack-provided metrics:

defineScenario({
  // ...
  probes: [
    {
      name: 'totalSupply',
      type: 'call',
      config: { target: 'token', method: 'totalSupply' },
    },
    {
      name: 'tvlRatio',
      type: 'computed',
      config: {
        compute: (pack, probes) => {
          const tvl = pack.getMetrics().tvl as number;
          const supply = probes.totalSupply as number;
          return tvl / supply;
        },
      },
    },
  ],
  probeEveryTicks: 5,
});

Checkpoints capture simulation state at intervals for debugging:

defineScenario({
  // ...
  checkpoints: {
    everyTicks: 50,
    includeAgentMemory: true,
    includeProbes: true,
  },
});

Checkpoints help identify when and where behavior changes during a simulation.
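
To see which snapshots a given configuration would produce, the interval arithmetic can be sketched as follows. The file-name pattern (tick_ plus a five-digit zero-padded tick) is inferred from the tick_00050.json and tick_00100.json entries in the artifact listing, and the rule that checkpoints fall on every multiple of everyTicks up to the final tick is an assumption; `checkpointTicks` and `checkpointFileName` are hypothetical helpers.

```typescript
// Ticks at which a checkpoint would be written, assuming one at every
// multiple of `everyTicks` up to and including `totalTicks`.
function checkpointTicks(totalTicks: number, everyTicks: number): number[] {
  if (everyTicks <= 0) return [];
  const out: number[] = [];
  for (let tick = everyTicks; tick <= totalTicks; tick += everyTicks) {
    out.push(tick);
  }
  return out;
}

// File name inferred from the artifact listing: tick_ + 5-digit zero-padded tick.
function checkpointFileName(tick: number): string {
  return `tick_${String(tick).padStart(5, '0')}.json`;
}
```

Under these assumptions, a 100-tick run with everyTicks: 50 yields tick_00050.json and tick_00100.json. To narrow down where behavior changes, compare adjacent snapshots and rerun with a smaller everyTicks over the interval where they first differ.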