fix: resolve triple-shot second pass caused by merge recommendations (#662)
When the judge recommends "merge" strategy, it populates the
suggested_changes field. LLMs frequently write this as a plain string
instead of []string, causing json.Unmarshal to fail. The parse failure
made VerifyWork return false, the bridge called gate.Fail(), and, with
defaultMaxRetries=2, the task retried, spawning a duplicate judge.
Add FlexibleStringSlice type (mirrors existing FlexibleString) to
tolerate string/array mismatches in all LLM-parsed sentinel file
structs: Evaluation, AttemptEvaluationItem, AdversarialReviewFile.
Also log SetMaxRetries errors instead of silently discarding them, and
consolidate the redundant Team("judge") lookup in startJudge.
- `internal/orchestrator/workflows/tripleshot/teamwire/` — Adapts TripleShot to Orchestration 2.0 teams via `TeamCoordinator` + bridge adapters *(has `AGENTS.md`)*
- `internal/pipeline/` — Plan decomposer and multi-phase team pipeline *(has `AGENTS.md`)*
CHANGELOG.md (2 additions, 0 deletions)
@@ -19,6 +19,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Fixed

- **Triple-Shot Spurious Second Pass** - Fixed duplicate instance creation in triple-shot workflows. Two root causes: (1) TaskQueue's `defaultMaxRetries=2` caused failed attempt/judge tasks to retry, spawning new instances. Fixed by calling `SetMaxRetries(taskID, 0)` after team creation. (2) Judge "merge" recommendations caused `json.Unmarshal` to fail when the LLM wrote `suggested_changes` as a string instead of `[]string`. The evaluation file parse failure triggered a retry, creating a second judge. Fixed by adding `FlexibleStringSlice` type (mirrors existing `FlexibleString`) to tolerate string/array mismatches in all LLM-parsed sentinel file structs (`Evaluation`, `AttemptEvaluationItem`, `AdversarialReviewFile`).

- **Teamwire TUI Freeze** - Fixed TUI freeze when starting a triple-shot in teamwire mode. `coordinator.Start()` was called synchronously in the Bubble Tea `Update()` handler, blocking the event loop while bridges created git worktrees. Moved startup to an async `tea.Cmd` so the UI remains responsive during initialization.

- **Teamwire Channel Safety** - Fixed potential panic from closing `teamwireEventCh` while callbacks may still write to it (nil-guard before close), goroutine leak from re-subscribing after triple-shot completion, and channel overwrite leak when starting multiple sessions. Surfaced session error details in `PhaseFailed` handler instead of generic "Triple-shot failed" message.
> **Living document.** Update this file when you learn something specific to this package.
> Same rules as the root `AGENTS.md` — see its Self-Improvement Protocol.

## Pitfalls

- **LLM output type mismatches in sentinel files** — LLMs frequently write a plain string where the JSON schema expects `[]string` (e.g., `"suggested_changes": "fix the bug"` instead of `"suggested_changes": ["fix the bug"]`). The `Evaluation`, `AttemptEvaluationItem`, and `AdversarialReviewFile` structs use `FlexibleStringSlice` for all `[]string` fields and `FlexibleString` for `Reasoning` to tolerate this. When adding new LLM-parsed fields of type `string` or `[]string`, use these flexible types instead of bare Go types. Without this, `json.Unmarshal` fails, `VerifyWork` returns false, and the bridge retries the task — spawning a duplicate instance.
- **Sentinel file search in subdirectories** — `FindCompletionFile`, `FindEvaluationFile`, and `FindAdversarialReviewFile` all search the worktree root *and* immediate subdirectories. LLM instances sometimes write files relative to their CWD rather than the worktree root. Don't bypass `Find*File` with a direct `filepath.Join(worktree, filename)`.
internal/orchestrator/workflows/tripleshot/teamwire/AGENTS.md (1 addition, 1 deletion)
@@ -32,7 +32,7 @@ TeamCoordinator
- **Two-phase Start** — `Start()` must not hold `tc.mu` when calling `Bridge.Start()`. The bridge's claim loop publishes `BridgeTaskStartedEvent` synchronously, and the handler `onBridgeTaskStarted` acquires `tc.mu`. Holding the lock through `Start()` → bridge claim → event publish → handler → lock = deadlock. The fix: `registerStart()` holds/releases the lock, then `Start()` creates bridges outside it.
- **Event subscription timing** — Subscriptions must happen before `Bridge.Start()` launches the claim loop. Currently done in `registerStart()` (Phase 1, under lock, before Phase 2 bridge creation) — this is the safe window. Don't move subscriptions after Phase 2 begins. For test assertions where you need events, subscribe before calling `Start()`. For production callbacks, use `SetCallbacks` before `Start`.
- **`onTeamCompleted` dispatches to a goroutine** — The handler for `team.completed` dispatches `startJudge()` via `go` to avoid deadlock. The synchronous event bus would block if `startJudge` tried to publish events while the bus's `Publish` goroutine held a lock.
Removed line:
- **Bridge retry vs. completion file status** — When `VerifyWork` returns `success=false` (e.g., completion file has `"failed"` status), the bridge calls `gate.Fail()`. Due to TaskQueue retry logic (`defaultMaxRetries=2`), the task returns to Pending and gets re-claimed by the bridge. Each re-claim creates a new instance with a new empty worktree. Tests that depend on failure being final must account for this retry cycle or test handler methods directly.

Added line:
- **Retries disabled for tripleshot tasks** — `registerStart()` and `startJudge()` call `SetMaxRetries(taskID, 0)` to disable TaskQueue's default retry logic (`defaultMaxRetries=2`). Without this, failed attempt/judge tasks would return to Pending and spawn duplicate instances, appearing as a spurious "second pass." The triple-shot workflow has its own redundancy (3 independent attempts), so retrying individual tasks is counterproductive.
- **Every `onJudgeCompleted` failure path must publish `TripleShotJudgeCompletedEvent`** — Use the `failJudge()` helper, which sets session error, transitions to `PhaseFailed`, fires callbacks, and publishes the event. Forgetting the event on one path breaks downstream listeners.
- **Session mutation lock discipline** — `tsManager.Session()` returns a raw `*Session` pointer; the `tsManager.mu` RLock only protects the pointer swap, not field access. All session field reads *and* mutations (`JudgeID`, `CompletedAt`, `Error`, `Attempts[i].*`) must hold `tc.mu`. `GetWinningBranch()` also holds `tc.mu` for reads. The lock order `tc.mu → tsManager.mu` is safe (no reverse path exists). Functions like `failJudge` and `startJudge` error paths acquire `tc.mu` for mutations, then release before `notifyCallbacks`/`bus.Publish` to avoid deadlock.
- **`startJudge` snapshot-then-I/O pattern** — `startJudge()` must snapshot attempt data (Status, InstanceID) under `tc.mu` before releasing the lock for I/O (GetInstance, ParseCompletionFile). After I/O completes, it re-acquires `tc.mu` to write results back (WorktreePath, Branch) and build the judge prompt. Without the snapshot, `onBridgeTaskCompleted` can write `Attempts[i].Status` concurrently, causing a data race.