| 1 | +# Architecture |
| 2 | + |
| 3 | +## System Overview |
| 4 | + |
| 5 | +``` |
| 6 | + User prompt              ┌─────────────┐ |
| 7 | + + options ──────────────▶│     CLI     │ |
| 8 | + (attempts, model,        │  (cli.ts)   │ |
| 9 | +  test command)           └──────┬──────┘ |
| 10 | +                                 │ |
| 11 | +                          ┌──────▼──────┐ |
| 12 | +                          │ Orchestrator│ |
| 13 | +                          │  (run.ts)   │ |
| 14 | +                          └──────┬──────┘ |
| 15 | +                                 │ |
| 16 | +         ┌───────────────┬───────┼───────┬───────────────┐ |
| 17 | +         │               │       │       │               │ |
| 18 | +   ┌─────▼─────┐   ┌─────▼─────┐    ...            ┌─────▼─────┐ |
| 19 | +   │ Agent #1  │   │ Agent #2  │                   │ Agent #N  │ |
| 20 | +   │ worktree  │   │ worktree  │                   │ worktree  │ |
| 21 | +   │ claude -p │   │ claude -p │                   │ claude -p │ |
| 22 | +   └─────┬─────┘   └─────┬─────┘                   └─────┬─────┘ |
| 23 | +         │               │                               │ |
| 24 | +         └───────┬───────┴───────────────────────────────┘ |
| 25 | +                 │ |
| 26 | +          ┌──────▼──────┐     ┌──────────────┐ |
| 27 | +          │ Test Runner │────▶│ Pass/Fail    │ |
| 28 | +          │ (per agent) │     │ per agent    │ |
| 29 | +          └──────┬──────┘     └──────────────┘ |
| 30 | +                 │ |
| 31 | +          ┌──────▼──────┐ |
| 32 | +          │ Convergence │ |
| 33 | +          │  Analysis   │ |
| 34 | +          │  + Scoring  │ |
| 35 | +          └──────┬──────┘ |
| 36 | +                 │ |
| 37 | +          ┌──────▼──────┐ |
| 38 | +          │ Recommended │ |
| 39 | +          │    Agent    │ |
| 40 | +          └─────────────┘ |
| 41 | +``` |
| 42 | + |
| 43 | +## Module Responsibilities |
| 44 | + |
| 45 | +### CLI (`src/cli.ts`) |
| 46 | +Entry point. Parses arguments, validates inputs, dispatches to commands. |
| 47 | + |
| 48 | +### Commands (`src/commands/`) |
| 49 | + |
| 50 | +- **`run.ts`** — Orchestrates the ensemble: creates worktrees → spawns agents → runs tests → analyzes convergence → recommends → saves results |
| 51 | +- **`apply.ts`** — Applies a selected agent's diff to the main working tree. Supports `--preview` mode. |
| 52 | +- **`list.ts`** — Displays results from the most recent run |
| 53 | + |
| 54 | +### Runners (`src/runners/`) |
| 55 | + |
| 56 | +- **`claude-code.ts`** — Spawns `claude -p` in headless mode in a worktree. Captures stdout, stderr, timing. Returns an `AgentResult`. |
| 57 | + |
| 58 | +### Scoring (`src/scoring/`) |
| 59 | + |
| 60 | +- **`convergence.ts`** — Groups agents by similarity of their code changes. Uses diff-content comparison (Jaccard similarity + union-find clustering). |
| 61 | +- **`diff-parser.ts`** — Parses unified diffs into structured form for comparison. |
| 62 | +- **`test-runner.ts`** — Executes test commands in each worktree. Validates commands for safety (rejects shell operators). |
| 63 | + |
| 64 | +### Utils (`src/utils/`) |
| 65 | + |
| 66 | +- **`git.ts`** — Git worktree creation/cleanup, diff extraction, branch management. |
| 67 | +- **`display.ts`** — Terminal output formatting with cross-platform color support (picocolors). |
| 68 | + |
| 69 | +## Convergence Algorithm |
| 70 | + |
| 71 | +### Step 1: Diff Parsing |
| 72 | +Each agent's unified diff is parsed into structured `DiffFile` objects containing added/removed lines per file. |
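As a rough illustration of this step, the sketch below parses `git diff` output into per-file added and removed lines. The `DiffFile` field names and the `parseUnifiedDiff` function are assumptions made for the example; the actual shapes in `diff-parser.ts` may differ.

```typescript
// Hypothetical shape: the document only says a DiffFile holds
// added/removed lines per file.
interface DiffFile {
  path: string;
  added: string[];
  removed: string[];
}

// Minimal parser for `git diff` style output (assumes each file section
// starts with a "diff --git" header, as git produces).
function parseUnifiedDiff(diff: string): DiffFile[] {
  const files: DiffFile[] = [];
  let current: DiffFile | null = null;
  let inHunk = false;

  for (const line of diff.split("\n")) {
    if (line.startsWith("diff --git ")) {
      // New file section: header lines follow until the first hunk.
      current = null;
      inHunk = false;
    } else if (!inHunk && line.startsWith("+++ ")) {
      // "+++ b/src/foo.ts" -> "src/foo.ts". A deleted file shows up as
      // "/dev/null" here; a fuller parser would use the "---" path instead.
      const raw = line.slice(4).trim();
      const path = raw.startsWith("b/") ? raw.slice(2) : raw;
      current = { path, added: [], removed: [] };
      files.push(current);
    } else if (line.startsWith("@@")) {
      inHunk = true;
    } else if (inHunk && current) {
      if (line.startsWith("+")) current.added.push(line.slice(1));
      else if (line.startsWith("-")) current.removed.push(line.slice(1));
      // Context lines (leading space) are ignored.
    }
  }
  return files;
}
```

Only the added and removed lines are kept; hunk positions and context lines are dropped, which is all the later similarity step needs.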
| 73 | + |
| 74 | +### Step 2: Pairwise Similarity |
| 75 | +For each pair of agents, Jaccard similarity is computed on the set of added lines: |
| 76 | + |
| 77 | +``` |
| 78 | +similarity(A, B) = |added_lines(A) ∩ added_lines(B)| / |added_lines(A) ∪ added_lines(B)| |
| 79 | +``` |
| 80 | + |
| 81 | +Lines are keyed by `file_path:content` for uniqueness. Similarity = 1 means the two agents added exactly the same set of lines; 0 means they share no added lines. |
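The formula can be sketched as follows. `FileAdditions`, `lineKeys`, and `jaccard` are illustrative names, not necessarily those used in `convergence.ts`, and the handling of two empty diffs is an assumption the document does not specify.

```typescript
// Hypothetical input shape: one entry per changed file.
interface FileAdditions {
  path: string;
  added: string[];
}

// Build the set of "file_path:content" keys for one agent's diff.
function lineKeys(files: FileAdditions[]): Set<string> {
  const keys = new Set<string>();
  for (const file of files) {
    for (const content of file.added) {
      keys.add(`${file.path}:${content}`);
    }
  }
  return keys;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  a.forEach((key) => {
    if (b.has(key)) intersection++;
  });
  const union = a.size + b.size - intersection;
  // Assumption: two empty diffs compare as 0 rather than dividing by zero.
  return union === 0 ? 0 : intersection / union;
}
```

For example, two agents that share two of four distinct added lines score 2 / 4 = 0.5, regardless of the order in which the lines appear.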
| 82 | + |
| 83 | +### Step 3: Clustering |
| 84 | +Single-linkage clustering with a threshold of 0.3. Two agents end up in the same cluster if they are linked by a chain of pairs, each with similarity ≥ 0.3, so two members of the same cluster may themselves score below the threshold. Implemented via union-find for efficiency. |
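A minimal sketch of single-linkage clustering with union-find, assuming the pairwise similarities are already available as a symmetric matrix. The function name `clusterBySimilarity` and the matrix input are choices made for this example.

```typescript
// Groups agent indices so that any pair with similarity >= threshold
// ends up in the same cluster (transitively).
function clusterBySimilarity(sim: number[][], threshold = 0.3): number[][] {
  const n = sim.length;
  const parent: number[] = [];
  for (let i = 0; i < n; i++) parent.push(i);

  // Find the root of x, shortening the path as we go (path halving).
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  };

  // Union every pair that meets the threshold.
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      if (sim[i][j] >= threshold) parent[find(i)] = find(j);
    }
  }

  // Collect members by root.
  const groups: { [root: number]: number[] } = {};
  for (let i = 0; i < n; i++) {
    const root = find(i);
    (groups[root] = groups[root] || []).push(i);
  }
  return Object.keys(groups).map((k) => groups[Number(k)]);
}
```

With similarities 0.5 for agents (0, 1), 0.4 for (1, 2), and 0.1 for (0, 2), all three land in one cluster: agents 0 and 2 are linked through agent 1 even though their direct similarity is below 0.3.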
| 85 | + |
| 86 | +### Step 4: Group Scoring |
| 87 | +Each cluster gets a composite score: |
| 88 | + |
| 89 | +``` |
| 90 | +group_score = (cluster_size / total_agents) * 0.5 + avg_pairwise_similarity * 0.5 |
| 91 | +``` |
| 92 | + |
| 93 | +This combines "how many agents agree" with "how similar their actual changes are." |
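The formula can be sketched as below. The document does not say how a single-agent cluster is scored, since it has no pairs to average; this example assumes an average similarity of 0 in that case. `groupScore` and its parameters are illustrative names.

```typescript
// members: agent indices in the cluster; sim: symmetric similarity matrix.
function groupScore(members: number[], sim: number[][], totalAgents: number): number {
  let sum = 0;
  let pairs = 0;
  for (let i = 0; i < members.length; i++) {
    for (let j = i + 1; j < members.length; j++) {
      sum += sim[members[i]][members[j]];
      pairs++;
    }
  }
  // Assumption: a cluster with no pairs contributes 0 average similarity.
  const avgPairwiseSimilarity = pairs === 0 ? 0 : sum / pairs;
  return (members.length / totalAgents) * 0.5 + avgPairwiseSimilarity * 0.5;
}
```

With 3 of 5 agents in a cluster and pairwise similarities 0.5, 0.4, and 0.1, the average is about 0.33 and the score is 0.6 × 0.5 + 0.33 × 0.5 ≈ 0.47.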
| 94 | + |
| 95 | +### Why Jaccard? |
| 96 | +- Simple, interpretable (0-1 scale) |
| 97 | +- Works on sets of lines without requiring alignment |
| 98 | +- Handles different ordering of the same changes |
| 99 | +- Insensitive to surrounding context (only compares what was added) |
| 100 | + |
| 101 | +### Limitations |
| 102 | +- Treats all added lines equally (a comment change and a logic change have equal weight) |
| 103 | +- Doesn't detect semantic equivalence (two implementations that do the same thing with different lines can score as low as 0) |
| 104 | +- Whitespace-sensitive (reformatted code may appear different) |
| 105 | + |
| 106 | +## Recommendation Scoring |
| 107 | + |
| 108 | +Each agent receives a composite score: |
| 109 | + |
| 110 | +| Signal | Points | Rationale | |
| 111 | +|--------|--------|-----------| |
| 112 | +| Tests pass | +100 | Strongest signal — code works | |
| 113 | +| Convergence group | +0 to +50 | group_score × 50 — consensus is confidence | |
| 114 | +| Smaller diff | +0 to +10 | (1 - normalized_size) × 10 — simpler is better | |
| 115 | + |
| 116 | +The agent with the highest total score is recommended. Ties are broken in favor of the first agent. |
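A sketch of how the table's signals might combine. The `Candidate` shape is hypothetical, and `normalizedSize` is assumed to lie in [0, 1]; the document does not define how diff size is normalized.

```typescript
// Hypothetical per-agent inputs to the recommendation step.
interface Candidate {
  id: number;
  testsPass: boolean;
  groupScore: number;     // 0..1, score of the agent's convergence group
  normalizedSize: number; // assumed 0..1, larger means a bigger diff
}

function totalScore(c: Candidate): number {
  return (c.testsPass ? 100 : 0) + c.groupScore * 50 + (1 - c.normalizedSize) * 10;
}

// Returns the highest-scoring candidate, or undefined for an empty list.
function recommend(candidates: Candidate[]): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const c of candidates) {
    const score = totalScore(c);
    // Strict comparison keeps the earlier agent when scores tie.
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```

An agent that fails tests can reach at most 50 + 10 = 60 points, so under these weights any agent that passes tests outranks it.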
| 117 | + |
| 118 | +### Why these weights? |
| 119 | +- Tests (100) dominate because correctness trumps everything: the other two signals total at most 60, so a passing agent always outranks a failing one |
| 120 | +- Convergence (50) is secondary — agreement without tests is weaker evidence |
| 121 | +- Diff size (10) is a tiebreaker — among equally correct solutions, prefer the simpler one |
| 122 | + |
| 123 | +## Security Model |
| 124 | + |
| 125 | +- **Test command validation**: Commands are checked for shell operators (``;|&`><``) before execution |
| 126 | +- **Agent isolation**: Each agent runs in a separate git worktree with no shared state |
| 127 | +- **Result redaction**: stdout/stderr are stripped from saved JSON files to prevent credential leakage |
| 128 | +- **File permissions**: `.thinktank/` files written with mode 0o600 (owner-only) |
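The test command check in the list above could look roughly like this. `isSafeTestCommand` is a hypothetical name, and rejecting empty commands is an added assumption; the real validation in `test-runner.ts` may differ.

```typescript
// Characters listed in the security model: ; | & ` > <
const SHELL_OPERATORS = /[;|&`><]/;

// Returns true only for non-empty commands with none of the listed operators.
function isSafeTestCommand(command: string): boolean {
  return command.trim().length > 0 && !SHELL_OPERATORS.test(command);
}
```

A command such as `npm test` passes, while `npm test; rm -rf /` or `npm test > out.txt` is rejected. This sketch covers only the characters named above; constructs such as `$(...)` would pass it, which matters if the command is ever handed to a shell.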
| 129 | + |
| 130 | +## Data Flow |
| 131 | + |
| 132 | +``` |
| 133 | +prompt ──▶ N × claude -p ──▶ N × git diff ──▶ pairwise similarity |
| 134 | +                                          ──▶ clustering |
| 135 | +                                          ──▶ scoring |
| 136 | +                                          ──▶ recommendation |
| 137 | +                                          ──▶ .thinktank/latest.json |
| 138 | +``` |