Description
5000-Turn Long Horizon Eval — Distributed Hive Mind
Architecture
All agents share a single DistributedHiveGraph backed by a DHT (consistent hash ring). Facts are sharded across agents with replication factor R=3. Queries route to shard owners, not all agents.
graph TB
subgraph "Single DHT Ring — All N Agents"
direction LR
R["Consistent Hash Ring<br/>64 virtual nodes per agent<br/>Facts hashed to ring positions"]
end
subgraph "Agent Shards (each holds ~F/N facts)"
A0["Agent 0<br/>Shard: ~50 facts"]
A1["Agent 1<br/>Shard: ~48 facts"]
A2["Agent 2<br/>Shard: ~52 facts"]
AN["Agent N<br/>Shard: ~50 facts"]
end
R --> A0
R --> A1
R --> A2
R --> AN
A0 <-.->|"bloom filter gossip"| A1
A1 <-.->|gossip| A2
A2 <-.->|gossip| AN
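The ring described above can be illustrated with a minimal sketch. It assumes SHA-256 for ring positions and a clockwise walk to pick replicas; the `HashRing` class, `_ring_position` helper, and `agent-N` names are hypothetical and are not taken from the DistributedHiveGraph code. Only the 64 virtual nodes and R=3 come from the description.

```python
import bisect
import hashlib


def _ring_position(key: str) -> int:
    """Map a string to a ring position using a stable hash (assumed: SHA-256)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent hash ring; hypothetical, not the project's implementation."""

    def __init__(self, agents, vnodes=64, replication=3):
        self.replication = min(replication, len(agents))
        # Each agent contributes `vnodes` points so shards stay roughly even.
        points = sorted(
            (_ring_position(f"{agent}#{i}"), agent)
            for agent in agents
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in points]
        self._owners = [agent for _, agent in points]

    def owners(self, fact_key: str):
        """Walk clockwise from the key's position, collecting distinct agents."""
        start = bisect.bisect_right(self._positions, _ring_position(fact_key))
        found = []
        for step in range(len(self._owners)):
            agent = self._owners[(start + step) % len(self._owners)]
            if agent not in found:
                found.append(agent)
                if len(found) == self.replication:
                    break
        return found


ring = HashRing([f"agent-{n}" for n in range(10)])
replicas = ring.owners("sarah chen birthday")
print(replicas)  # three distinct agent names
```

Because the same lookup serves both writes and reads, a query for a key only needs to contact the agents returned by `owners`, which is the property the routing relies on.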
Query Flow
sequenceDiagram
participant Q as Question
participant DHT as DHT Router
participant S1 as Shard Owner 1
participant S2 as Shard Owner 2
participant C as Consensus
Q->>DHT: "What is Sarah Chen's birthday?"
DHT->>DHT: hash key terms → find shard owners
DHT->>S1: search shard (facts found)
DHT->>S2: search shard (facts found)
S1-->>C: "March 15"
S2-->>C: "March 15"
C->>Q: Consensus answer
Learning Flow
sequenceDiagram
participant T as Turn Pool (5000)
participant W as ThreadPool (10 workers)
participant A as Agent (learns 50 turns)
participant LLM as LLM
participant DHT as DHT Ring
T->>W: Round-robin distribute turns
W->>A: learn batch (parallel)
A->>LLM: extract facts
LLM-->>A: structured facts
A->>A: Store in local Kuzu DB (256MB)
A->>DHT: promote_fact → replicate to R=3 agents
Note over W: 10 agents learn simultaneously (9x speedup)
Eval Results
Single Agent Baseline: 94.1%
| Level | Score | Questions |
|---|---|---|
| L1 direct recall | 97.9% | 31 |
| L2 multi-source synthesis | 100% | 5 |
| L3 temporal reasoning | 83.3% | 3 |
| L4 procedural | 100% | 2 |
| L5 contradiction | 87.5% | 2 |
| L6 incremental update | 100% | 2 |
| L7 teaching | 83.3% | 2 |
| L8 confidence | 89.2% | 2 |
| L9 causal | 100% | 2 |
| L10 counterfactual | 70.8% | 2 |
| L11 novel skill | 100% | 1 |
| L12 far transfer | 62.5% | 2 |
Runtime: 21.7h (21.6h learning, 84s Q&A+grading). Model: claude-sonnet-4-5-20250929.
Federated Hive Progression
| Version | Median | Stddev | Agents | Notes |
|---|---|---|---|---|
| v1 (naive, longest-wins) | 40.0% | — | 100 | No routing, query all agents |
| v3 (consensus+routing, broken) | 34.9% | 31.2% | 100 | Empty root hive, random fallback |
| v3 Opus | 3.6% | 15.5% | 100 | + rate limit errors swallowed |
| Single DHT smoke test | 58.8% | 4.3% | 10 | Correct routing, stable |
| Single DHT full (pending) | TBD | TBD | 100 | Running now (~3h remaining) |
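For readers who want to reproduce the Median and Stddev columns from their own runs, here is a small sketch using Python's `statistics` module. The scores are made up for illustration and are not results from this eval; whether the table uses sample or population standard deviation is not stated, so the sample version is an assumption.

```python
import statistics

# Made-up scores for three runs; NOT results from this eval.
run_scores = [0.55, 0.60, 0.62]

median = statistics.median(run_scores)
spread = statistics.stdev(run_scores)  # sample stddev (assumed)

# The success criterion in this issue is a stddev below 10%.
passes = spread < 0.10
print(f"median={median:.1%} stddev={spread:.1%} passes={passes}")
```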
Key Bugs Found and Fixed
P0: Empty Root Hive (fixed in PR #18)
During learning, facts were stored in per-group hives, but queries were routed through the root hive, which was empty. Every query therefore fell back to a random agent, which produced the 31.2% stddev. Fix: a single DHT ring containing all agents.
P0: Kuzu mmap OOM (fixed in PR #11, #2876)
kuzu.Database() defaults to a buffer pool of 80% of system RAM plus an 8TB mmap reservation per database, so opening 100 agent databases crashed with OOM. Fix: the buffer pool is bounded to 256MB per agent.
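A minimal sketch of the bounded configuration, assuming "256MB" means 256 MiB and that the cap is passed through Kuzu's `buffer_pool_size` constructor argument. The helper names `open_agent_db` and `worst_case_pool_bytes` are hypothetical; the actual change lives in the `buffer_pool_size` parameter on CognitiveMemory, which is not reproduced here.

```python
MIB = 1024 * 1024
PER_AGENT_POOL_BYTES = 256 * MIB  # per-agent bound (assumed to mean MiB)


def worst_case_pool_bytes(num_agents: int, per_agent: int = PER_AGENT_POOL_BYTES) -> int:
    """Upper bound on total buffer pool memory if every agent fills its pool."""
    return num_agents * per_agent


def open_agent_db(path: str, pool_bytes: int = PER_AGENT_POOL_BYTES):
    """Open a Kuzu database with an explicit buffer pool cap (hypothetical helper)."""
    import kuzu  # imported lazily so the arithmetic runs without Kuzu installed

    return kuzu.Database(path, buffer_pool_size=pool_bytes)


total = worst_case_pool_bytes(100)
print(total // MIB, "MiB")  # 25600 MiB
```

The worst case is an upper bound only: a pool grows on demand, so actual usage can sit well below 100 × 256 MiB.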
P1: Sequential Learning (fixed in PR #17)
The 5000 turns were learned one at a time even though 100 agents were available. Fix: ThreadPoolExecutor with parallel batches, giving a 9x speedup (21.6h → 2.4h).
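The parallel batching can be sketched as follows, assuming round-robin assignment of turns and a pool of 10 workers as in the learning flow diagram. `round_robin` and `learn_batch` are hypothetical stand-ins; the real learning step calls the LLM and writes to the agent's database.

```python
from concurrent.futures import ThreadPoolExecutor


def round_robin(turns, num_agents):
    """Deal turns to agents like cards: turn i goes to agent i % num_agents."""
    batches = [[] for _ in range(num_agents)]
    for i, turn in enumerate(turns):
        batches[i % num_agents].append(turn)
    return batches


def learn_batch(agent_id, batch):
    """Stand-in for the real learning step (LLM fact extraction + storage)."""
    return agent_id, len(batch)


turns = list(range(5000))
batches = round_robin(turns, num_agents=100)

# At most 10 batches run concurrently; the rest queue inside the executor.
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(learn_batch, i, b) for i, b in enumerate(batches)]
    learned = dict(f.result() for f in futures)

print(learned[0], sum(learned.values()))  # 50 5000
```

Threads suit this workload because the learning step is dominated by waiting on LLM calls rather than by CPU work.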
P1: Swallowed Errors
_synthesize_with_llm() catches all exceptions silently, so rate limit errors surface only as a generic "internal error". Opus scored 3.6% because most of its answers failed with masked 429 errors.
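One way to avoid this failure mode is sketched below. It is not the body of `_synthesize_with_llm()`; `RateLimitError`, `SynthesisError`, and `synthesize` are hypothetical names, and a real client library would supply its own 429 exception type.

```python
import logging

logger = logging.getLogger("hive.synthesis")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


class SynthesisError(Exception):
    """Raised when synthesis fails for a reason other than rate limiting."""


def synthesize(call_llm, prompt):
    """Call the LLM, surfacing failures instead of returning a generic string."""
    try:
        return call_llm(prompt)
    except RateLimitError:
        # Let rate limits propagate so the caller can back off and retry.
        logger.warning("rate limited while synthesizing")
        raise
    except Exception as exc:
        # Keep the original exception chained for debugging.
        raise SynthesisError(f"synthesis failed: {exc!r}") from exc


def throttled(_prompt):
    raise RateLimitError("429 Too Many Requests")


try:
    synthesize(throttled, "What is Sarah Chen's birthday?")
except RateLimitError as err:
    outcome = f"retry later: {err}"
print(outcome)  # retry later: 429 Too Many Requests
```

With the error type preserved, an eval harness can count rate-limited questions separately instead of grading them as wrong answers.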
P2: Longest-Answer-Wins (fixed in PR #17)
The v1 hive queried all 100 agents and picked the longest response. Fix: expertise routing via the DHT, filtering of no-info answers, and Jaccard consensus.
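The no-info filter and Jaccard consensus can be sketched as below, assuming word-level token sets and a simple "most similar to the other answers" vote. The marker phrases and function names are hypothetical, and the merged implementation may score answers differently.

```python
# Hypothetical phrases that mark an answer as carrying no information.
NO_INFO_MARKERS = ("i don't know", "no information", "not enough information")


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Size of the word-set intersection over the union; 1.0 if both are empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def consensus(answers):
    """Drop no-info answers, then pick the answer most similar to the rest."""
    informative = [
        a for a in answers
        if a.strip() and not any(m in a.lower() for m in NO_INFO_MARKERS)
    ]
    if not informative:
        return None

    def support(i):
        # Sum of similarities to every other informative answer.
        return sum(
            jaccard(informative[i], other)
            for j, other in enumerate(informative)
            if j != i
        )

    best = max(range(len(informative)), key=support)
    return informative[best]


answers = [
    "March 15",
    "March 15",
    "I don't know.",
    "Sarah Chen was mentioned in many turns and her birthday might be sometime in the spring",
]
print(consensus(answers))  # March 15
```

Under longest-answer-wins the last answer would have been selected; here it has no overlap with the others and receives no support.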
PRs
| Repo | PR | Status | Change |
|---|---|---|---|
| amplihack-memory-lib | #11 | Merged | buffer_pool_size param on CognitiveMemory |
| amplihack-agent-eval | #17 | Merged | DHT, parallel learning, consensus, median-of-3 |
| amplihack-agent-eval | #18 | Open | Single DHT ring fix (routing bug) |
| amplihack | #2876 | Open | DistributedHiveGraph, DHT, bloom, docs |
Release Assets
| Tag | Repo | Contents |
|---|---|---|
| dataset-5000t-seed42-v1.0 | eval | Pre-built single-agent 5000t Kuzu DB |
| federated-100agent-5000t-v1.0 | eval | 100 federated agent Kuzu DBs |
Success Criteria (from #2866)
- Single agent >75% — 94.1%
- 100-agent hive ≥ single agent — smoke test 58.8%, full eval pending
- No OOM with 100 DBs — fixed (12.3s, 4.8GB)
- Parallel learning speedup — 9x (21.6h → 2.4h)
- Gossip convergence >90% — pending
- Variance < 10% stddev — 4.3% (was 31.2%)
Related
- eval: 5000-turn long horizon learning test — single agent + 100-agent hive (20 groups × 5) #2866 — Original eval spec
- PR feat: distributed hive mind — federation, LearningAgent eval, retrieval pipeline #2717 — Distributed hive mind implementation
- Architecture docs