
Commit a74ff90

feat: wire dormant orchestration 2.0 components into production flow (#669)
Connect six previously-dormant Orch 2.0 components to the execution pipeline:

- File Lock Registry: the bridge claims file locks before instance creation and releases them on all exit paths; uses gate.Release (not Fail) for lock conflicts to avoid burning retries under concurrent scaling
- Context Propagation: injects prior discoveries into task prompts and shares completion info for cross-instance awareness
- Mailbox Event Publishing: adds a WithBus functional option so all inter-instance messages publish MailboxMessageEvent to the event bus
- Adaptive Lead Observability: logs scaling signal recommendations in the pipeline executor
- Approval Auto-Approve: immediately approves gated tasks in the bridge to prevent stuck states while preserving the gate infrastructure
- Debate Protocol: identifies conflicting task outcomes between the execution and review phases and records structured debate sessions for reviewer context (opt-in via the WithDebate pipeline option)
1 parent 842d514 commit a74ff90

17 files changed, +793 -23 lines


AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -340,6 +340,7 @@ These are real issues agents have encountered in this codebase. Package-specific
 - **Release locks before blocking on Stop()** — When stopping a component that holds a mutex, copy shared state (e.g., a slice of bridges) under the lock, release the lock, then perform blocking cleanup. Holding a lock while calling `bridge.Stop()` (which calls `wg.Wait()`) blocks goroutines that need the same lock. See `PipelineExecutor.Stop()` in `bridgewire/executor.go`.
 - **Two-phase event publishing for cascading state changes** — When an event handler (`onTeamCompleted`) modifies state that triggers further events of the same type, use a two-phase approach: (1) collect state changes under the lock, (2) publish events outside the lock. Repeat until no new transitions occur. Publishing `TeamCompletedEvent` from within the `onTeamCompleted` handler would re-enter the handler via the synchronous bus, deadlocking on `m.mu`. See `team.Manager.checkBlockedTeamsLocked`.
 - **Semaphore slot lifecycle in bridge** — When the bridge acquires a semaphore slot before `ClaimNext`, it must release on every non-monitor path (claim error, nil task, create/start failure). The monitor goroutine takes ownership of the slot via `defer b.sem.Release()`. Missing a release on any early-return path causes a permanent slot leak that eventually deadlocks the claim loop.
+- **Release vs Fail for scheduling conflicts** — When a task fails due to a scheduling conflict (file lock contention), use `gate.Release()` to return it to pending instead of `gate.Fail()`. `Fail` decrements the retry counter; with scaling enabled, multiple tasks competing for the same resource can exhaust all retries and permanently fail. `Release` puts the task back without consuming retries. Always pair Release with `waitForWake` to prevent hot retry loops.
 
 ---
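
The two-phase publishing gotcha above describes the pattern without showing it. Below is a minimal sketch under illustrative names; the types and methods are stand-ins, not the actual `team.Manager` API (the real implementation lives in `team.Manager.checkBlockedTeamsLocked`).

```go
package example

import "sync"

// TeamCompleted stands in for the real completion event type.
type TeamCompleted struct{ TeamID string }

// Bus stands in for the synchronous event bus: Publish runs handlers inline.
type Bus interface{ Publish(e TeamCompleted) }

type Manager struct {
	mu      sync.Mutex
	bus     Bus
	blocked map[string]bool // teams waiting on dependencies
}

// onTeamCompleted may unblock further teams whose completion must also be
// published. It never publishes while holding m.mu: the synchronous bus
// would re-enter this handler and deadlock on the lock.
func (m *Manager) onTeamCompleted(e TeamCompleted) {
	for {
		// Phase 1: collect state transitions under the lock.
		m.mu.Lock()
		var done []string
		for id := range m.blocked {
			if m.readyLocked(id) {
				delete(m.blocked, id)
				done = append(done, id)
			}
		}
		m.mu.Unlock()

		// Repeat until no new transitions occur.
		if len(done) == 0 {
			return
		}

		// Phase 2: publish outside the lock; handlers run inline and may
		// themselves trigger further transitions, picked up on the next pass.
		for _, id := range done {
			m.bus.Publish(TeamCompleted{TeamID: id})
		}
	}
}

// readyLocked reports whether a blocked team can transition; placeholder logic.
func (m *Manager) readyLocked(id string) bool { return false }
```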

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- **Wire Dormant Orchestration 2.0 Components** - Connected six previously-dormant Orch 2.0 components to the production execution flow: (1) **File Lock Registry** in Bridge prevents concurrent file edits by claiming locks before instance creation and releasing on all exit paths, using Release instead of Fail for lock conflicts to avoid burning retries; (2) **Context Propagation** injects prior discoveries into task prompts and shares completion info for cross-instance awareness; (3) **Mailbox Event Publishing** makes all inter-instance messages visible to the event bus via `MailboxMessageEvent` using a `WithBus` functional option; (4) **Adaptive Lead Observability** logs scaling signal recommendations in the pipeline executor; (5) **Approval Auto-Approve** immediately approves gated tasks to prevent stuck states while preserving gate infrastructure for future interactive use; (6) **Debate Protocol Integration** identifies conflicting task outcomes between execution and review phases and records structured debate sessions for reviewer context (opt-in via `WithDebate()`).
+
 - **Orchestration 2.0 Default Execution** - Made Orch 2.0 the default for both UltraPlan and TripleShot. UltraPlan flips `UsePipeline` default to `true`. TripleShot uses `teamwire.TeamCoordinator` with callback-driven execution (replacing file polling), falling back to legacy for adversarial mode or `tripleshot.use_legacy` config. Added `tripleshot.Runner` interface for dual-coordinator coexistence, channel bridge for teamwire callbacks into Bubble Tea, and `NewTripleShotAdapters()` factory to avoid import cycles.
 
 - **Pipeline Execution Path** - Wired the Orchestration 2.0 pipeline stack into `Coordinator.StartExecution()`. Added `ExecutionRunner` interface in `orchestrator` (implemented by `bridgewire.PipelineRunner`) with factory-based injection to avoid import cycles. When `UsePipeline` config is enabled, the Coordinator delegates execution to the pipeline backend instead of the legacy `ExecutionOrchestrator`. Subscribes to `pipeline.completed` events for synthesis/failure handling. Guards legacy-only methods (`RetryFailedTasks`, `RetriggerGroup`, `ResumeWithPartialWork`) when pipeline is active.

internal/bridge/AGENTS.md

Lines changed: 7 additions & 1 deletion
@@ -11,9 +11,12 @@ The bridge package connects team Hubs (Orchestration 2.0's task pipeline) to rea
 
 **Core Flow:**
 ```
-Gate.ClaimNext() → InstanceFactory.CreateInstance() → StartInstance()
+Gate.ClaimNext() → FileLockRegistry.ClaimMultiple() → ContextPropagation
+  → InstanceFactory.CreateInstance() → StartInstance()
+  → Gate.MarkRunning() → (auto-approve if gated)
   → monitor loop (poll CompletionChecker)
   → Gate.Complete/Fail() + SessionRecorder
+  → FileLockRegistry.ReleaseAll() + ContextPropagation.ShareDiscovery()
 ```
 
 **Interfaces (Ports):**
@@ -43,6 +46,9 @@ These interfaces are implemented by adapters in `internal/orchestrator/bridgewir
 - **Retry limit on completion check errors** — The monitor gives up after `maxCheckErrors` (10) consecutive `CheckCompletion` failures and fails the task. Without this, a bad worktree path would cause indefinite retries.
 - **TaskQueue retry interacts with bridge claim loop** — `TaskQueue.Fail()` has retry logic (`defaultMaxRetries=2`). When the bridge monitor calls `gate.Fail()`, the task may return to `TaskPending` (not permanently failed), and the claim loop re-claims it. Tests that assert on `Running()` after failure must either disable retries via `SetMaxRetries(taskID, 0)` or account for the re-claim cycle.
 - **Always log gate.Fail errors** — `gate.Fail()` can fail if the task has already transitioned. Always check and log the return error rather than discarding with `_ =`.
+- **File lock conflicts use Release, not Fail** — When `ClaimMultiple` returns `ErrAlreadyClaimed`, use `gate.Release` to return the task to pending without burning retries. Using `gate.Fail` would consume retry attempts, and with scaling enabled (semaphore > 1), multiple tasks competing for the same file lock would exhaust retries and permanently fail. After releasing, call `waitForWake` to avoid a hot retry loop.
+- **Record completion/failure before file lock release** — `recorder.RecordCompletion`/`RecordFailure` must be called immediately after `gate.Complete`/`gate.Fail`, before `reg.ReleaseAll` and `shareCompletion`. The gate transition triggers a synchronous event cascade that can complete the pipeline before the monitor goroutine reaches subsequent lines. If the recorder call comes after file lock I/O, tests (and observers) see the pipeline complete before the recorder fires.
+- **Scaling monitor increases semaphore concurrency** — The hub's `ScalingMonitor` reacts to `QueueDepthChangedEvent` and may increase the bridge's semaphore limit via the `OnDecision` callback. Code that assumes semaphore=1 (sequential task execution) is incorrect when scaling is active. File lock claims are the safety net for concurrent access to the same files.
 
 ## Testing
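
A minimal self-contained sketch of the Release-vs-Fail decision described in the gotchas above. `Gate` and `ErrAlreadyClaimed` are stand-ins for the real `approval` and `filelock` types; the production version is in the `internal/bridge/bridge.go` diff below.

```go
package example

import "errors"

// ErrAlreadyClaimed stands in for filelock.ErrAlreadyClaimed.
var ErrAlreadyClaimed = errors.New("file already claimed")

// Gate is the subset of the approval gate the claim loop needs here.
type Gate interface {
	Release(taskID, reason string) error // return task to pending; retries untouched
	Fail(taskID, reason string) error    // consumes a retry attempt
}

// handleClaimError routes a file-lock error: Release for scheduling conflicts,
// Fail for genuine errors. The caller still releases its semaphore slot and
// blocks on waitForWake before re-entering the claim loop.
func handleClaimError(gate Gate, taskID string, err error) error {
	if errors.Is(err, ErrAlreadyClaimed) {
		// Conflict: another task holds the lock. Re-queue without burning a retry.
		return gate.Release(taskID, "file lock conflict")
	}
	// Anything else counts against the task's retries.
	return gate.Fail(taskID, "file lock: "+err.Error())
}
```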

internal/bridge/bridge.go

Lines changed: 117 additions & 4 deletions
@@ -2,13 +2,16 @@ package bridge
 
 import (
     "context"
+    "errors"
     "fmt"
     "strings"
     "sync"
     "time"
 
     "github.com/Iron-Ham/claudio/internal/event"
+    "github.com/Iron-Ham/claudio/internal/filelock"
     "github.com/Iron-Ham/claudio/internal/logging"
+    "github.com/Iron-Ham/claudio/internal/mailbox"
     "github.com/Iron-Ham/claudio/internal/team"
 )
 
@@ -187,11 +190,45 @@ func (b *Bridge) claimLoop() {
             continue
         }
 
-        // Build a prompt and create an instance.
-        prompt := BuildTaskPrompt(task.Title, task.Description, task.Files)
+        hub := b.team.Hub()
+
+        // Claim file locks to prevent concurrent edits.
+        if len(task.Files) > 0 {
+            if err := hub.FileLockRegistry().ClaimMultiple(task.ID, task.Files); err != nil {
+                b.sem.Release()
+                if errors.Is(err, filelock.ErrAlreadyClaimed) {
+                    // File is held by another task — release back to the
+                    // queue without burning a retry. The task will be
+                    // re-claimed once the lock holder finishes.
+                    b.logger.Debug("bridge: file lock conflict, releasing task",
+                        "team", b.team.Spec().ID, "task", task.ID, "error", err)
+                    if relErr := gate.Release(task.ID, "file lock conflict"); relErr != nil {
+                        b.logger.Error("bridge: gate.Release failed",
+                            "task", task.ID, "error", relErr)
+                    }
+                } else {
+                    b.logger.Error("bridge: file lock claim failed",
+                        "team", b.team.Spec().ID, "task", task.ID, "error", err)
+                    if failErr := gate.Fail(task.ID, fmt.Sprintf("file lock: %v", err)); failErr != nil {
+                        b.logger.Error("bridge: gate.Fail also failed",
+                            "task", task.ID, "error", failErr)
+                    }
+                }
+                b.waitForWake(wake)
+                continue
+            }
+        }
+
+        // Retrieve prior discoveries for context injection.
+        prompt := BuildTaskPromptWithContext(
+            task.Title, task.Description, task.Files,
+            b.getInstanceContext(task.ID),
+        )
+
         inst, err := b.factory.CreateInstance(prompt)
         if err != nil {
             b.sem.Release()
+            hub.FileLockRegistry().ReleaseAll(task.ID) //nolint:errcheck // best-effort cleanup
             b.logger.Error("bridge: failed to create instance",
                 "team", b.team.Spec().ID, "task", task.ID, "error", err)
             if failErr := gate.Fail(task.ID, fmt.Sprintf("create instance: %v", err)); failErr != nil {
@@ -203,6 +240,7 @@ func (b *Bridge) claimLoop() {
 
         if err := b.factory.StartInstance(inst); err != nil {
             b.sem.Release()
+            hub.FileLockRegistry().ReleaseAll(task.ID) //nolint:errcheck // best-effort cleanup
             b.logger.Error("bridge: failed to start instance",
                 "team", b.team.Spec().ID, "task", task.ID, "error", err)
             if failErr := gate.Fail(task.ID, fmt.Sprintf("start instance: %v", err)); failErr != nil {
@@ -215,6 +253,7 @@ func (b *Bridge) claimLoop() {
         // Transition the task to running.
         if err := gate.MarkRunning(task.ID); err != nil {
             b.sem.Release()
+            hub.FileLockRegistry().ReleaseAll(task.ID) //nolint:errcheck // best-effort cleanup
             b.logger.Error("bridge: failed to mark running",
                 "team", b.team.Spec().ID, "task", task.ID, "error", err)
             if failErr := gate.Fail(task.ID, fmt.Sprintf("mark running: %v", err)); failErr != nil {
@@ -224,6 +263,23 @@ func (b *Bridge) claimLoop() {
             continue
         }
 
+        // Auto-approve gated tasks to prevent stuck states.
+        if gate.IsAwaitingApproval(task.ID) {
+            if approveErr := gate.Approve(task.ID); approveErr != nil {
+                b.sem.Release()
+                hub.FileLockRegistry().ReleaseAll(task.ID) //nolint:errcheck // best-effort cleanup
+                b.logger.Error("bridge: failed to auto-approve gated task",
+                    "team", b.team.Spec().ID, "task", task.ID, "error", approveErr)
+                if failErr := gate.Fail(task.ID, fmt.Sprintf("auto-approve: %v", approveErr)); failErr != nil {
+                    b.logger.Error("bridge: gate.Fail also failed",
+                        "task", task.ID, "error", failErr)
+                }
+                continue
+            }
+            b.logger.Debug("bridge: auto-approved gated task",
+                "team", b.team.Spec().ID, "task", task.ID)
+        }
+
         // Record assignment and publish event.
         b.recorder.AssignTask(task.ID, inst.ID())
 
@@ -267,6 +323,9 @@ func (b *Bridge) monitorInstance(taskID string, inst Instance) {
 
     consecutiveErrors := 0
 
+    hub := b.team.Hub()
+    reg := hub.FileLockRegistry()
+
     for {
         select {
         case <-b.ctx.Done():
@@ -275,6 +334,7 @@ func (b *Bridge) monitorInstance(taskID string, inst Instance) {
             b.mu.Lock()
             delete(b.running, taskID)
             b.mu.Unlock()
+            reg.ReleaseAll(taskID) //nolint:errcheck // best-effort cleanup
             return
         case <-ticker.C:
         }
@@ -288,7 +348,7 @@ func (b *Bridge) monitorInstance(taskID string, inst Instance) {
             if consecutiveErrors >= maxCheckErrors {
                 b.logger.Error("bridge: max check errors reached, failing task",
                     "task", taskID, "limit", maxCheckErrors)
-                gate := b.team.Hub().Gate()
+                gate := hub.Gate()
                 reason := fmt.Sprintf("completion check failed %d times: %v", maxCheckErrors, err)
                 if failErr := gate.Fail(taskID, reason); failErr != nil {
                     b.logger.Error("bridge: gate.Fail failed after check errors",
@@ -300,6 +360,7 @@ func (b *Bridge) monitorInstance(taskID string, inst Instance) {
                 delete(b.running, taskID)
                 b.mu.Unlock()
                 b.recorder.RecordFailure(taskID, reason)
+                reg.ReleaseAll(taskID) //nolint:errcheck // best-effort cleanup
                 return
             }
             continue
@@ -315,7 +376,7 @@ func (b *Bridge) monitorInstance(taskID string, inst Instance) {
                 taskID, inst.ID(), inst.WorktreePath(), inst.Branch(),
             )
 
-            gate := b.team.Hub().Gate()
+            gate := hub.Gate()
             teamID := b.team.Spec().ID
 
             // Clean up running map before recording/publishing so observers see
@@ -329,7 +390,15 @@ func (b *Bridge) monitorInstance(taskID string, inst Instance) {
                     b.logger.Error("bridge: failed to complete task",
                         "task", taskID, "error", completeErr)
                 }
+                // Record completion immediately after gate transition so that
+                // observers who react to the synchronous event cascade see the
+                // recorder state before any I/O-heavy cleanup runs.
                 b.recorder.RecordCompletion(taskID, commitCount)
+                reg.ReleaseAll(taskID) //nolint:errcheck // best-effort cleanup
+
+                // Share completion as a discovery for context propagation.
+                b.shareCompletion(taskID, inst)
+
                 b.bus.Publish(event.NewBridgeTaskCompletedEvent(
                     teamID, taskID, inst.ID(), true, commitCount, "",
                 ))
@@ -342,7 +411,10 @@ func (b *Bridge) monitorInstance(taskID string, inst Instance) {
                     b.logger.Error("bridge: failed to fail task",
                         "task", taskID, "error", failErr)
                 }
+                // Record failure immediately after gate transition (same
+                // reasoning as the success path above).
                 b.recorder.RecordFailure(taskID, reason)
+                reg.ReleaseAll(taskID) //nolint:errcheck // best-effort cleanup
                 b.bus.Publish(event.NewBridgeTaskCompletedEvent(
                     teamID, taskID, inst.ID(), false, commitCount, reason,
                 ))
@@ -392,3 +464,44 @@ func BuildTaskPrompt(title, description string, files []string) string {
 
     return sb.String()
 }
+
+// BuildTaskPromptWithContext builds a task prompt and appends prior discoveries
+// from context propagation. If priorContext is empty, it returns the same
+// result as BuildTaskPrompt.
+func BuildTaskPromptWithContext(title, description string, files []string, priorContext string) string {
+    prompt := BuildTaskPrompt(title, description, files)
+    if priorContext == "" {
+        return prompt
+    }
+    return prompt + "\n\n## Prior Discoveries\n" + priorContext
+}
+
+// maxContextMessages limits the number of prior messages injected into an
+// instance's prompt to prevent unbounded context growth in large sessions.
+const maxContextMessages = 50
+
+// getInstanceContext retrieves prior discoveries from the context propagator.
+// Returns an empty string if no relevant context exists or on error.
+func (b *Bridge) getInstanceContext(taskID string) string {
+    ctx, err := b.team.Hub().Propagator().GetContextForInstance(taskID, mailbox.FilterOptions{
+        Types:       []mailbox.MessageType{mailbox.MessageDiscovery, mailbox.MessageWarning},
+        MaxMessages: maxContextMessages,
+    })
+    if err != nil {
+        b.logger.Warn("bridge: failed to get instance context",
+            "task", taskID, "error", err)
+        return ""
+    }
+    return ctx
+}
+
+// shareCompletion broadcasts a completion discovery so future instances have
+// awareness of what has been done. Only called on success paths.
+func (b *Bridge) shareCompletion(taskID string, inst Instance) {
+    body := fmt.Sprintf("Task completed: %s (instance: %s, worktree: %s)",
+        taskID, inst.ID(), inst.WorktreePath())
+    if err := b.team.Hub().Propagator().ShareDiscovery(taskID, body, nil); err != nil {
+        b.logger.Warn("bridge: failed to share completion discovery",
+            "task", taskID, "error", err)
+    }
+}

internal/coordination/hub.go

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ func NewHub(cfg Config, opts ...Option) (*Hub, error) {
         policy = scaling.NewPolicy(policyOpts...)
     }
 
-    mb := mailbox.NewMailbox(cfg.SessionDir)
+    mb := mailbox.NewMailbox(cfg.SessionDir, mailbox.WithBus(cfg.Bus))
     queue := taskqueue.NewFromPlan(cfg.Plan)
     eq := taskqueue.NewEventQueue(queue, cfg.Bus)
     gate := approval.NewGate(eq, cfg.Bus, lookup)

internal/mailbox/AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ See `doc.go` for package overview and API usage.
 - **O_APPEND atomicity** — File writes use `O_APPEND` which is atomic for writes smaller than `PIPE_BUF` (4096 bytes on most systems), but is not crash-safe without `fsync`. This is an accepted trade-off — messages may be lost on hard crash but won't be corrupted or interleaved.
 - **Message ID uniqueness** — `time.UnixNano()` alone is not unique under concurrent access. IDs are generated using an atomic counter combined with PID and timestamp. If you modify ID generation, ensure uniqueness under parallel `Send()` calls.
 - **Store mutex scope** — The `Store` holds a `sync.Mutex` for in-process thread safety. Any method that reads or writes the JSONL file must hold the lock for the entire operation, including the JSON marshal/unmarshal step — not just the file I/O.
+- **WithBus event publishing is synchronous** — When a `Mailbox` is created with `WithBus(bus)`, every successful `Send()` publishes a `MailboxMessageEvent` on the event bus synchronously. Since `event.Bus.Publish` runs handlers inline, callers of `Send` should be aware that handlers may execute significant work in their goroutine. The Hub passes its bus to `NewMailbox` automatically.
 
 ## File Layout
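
The commit only shows the call site (`mailbox.NewMailbox(cfg.SessionDir, mailbox.WithBus(cfg.Bus))` in `hub.go`), not the option itself. A plausible sketch of such a functional option follows; everything except the `WithBus` call shape and the `MailboxMessageEvent` name is an assumption, not the real `mailbox` package internals.

```go
package mailbox

// Publisher is assumed as the minimal bus surface the mailbox needs;
// the real event.Bus type may differ.
type Publisher interface {
	Publish(e any)
}

// MailboxMessageEvent is the event named in the commit; its fields here are illustrative.
type MailboxMessageEvent struct {
	From, To, Body string
}

// Option mutates a Mailbox during construction (functional-option pattern).
type Option func(*Mailbox)

// WithBus attaches an event bus; when set, Send publishes a
// MailboxMessageEvent synchronously after a successful write.
func WithBus(bus Publisher) Option {
	return func(m *Mailbox) { m.bus = bus }
}

type Mailbox struct {
	dir string
	bus Publisher // nil when no bus was provided
}

func NewMailbox(dir string, opts ...Option) *Mailbox {
	m := &Mailbox{dir: dir}
	for _, opt := range opts {
		opt(m)
	}
	return m
}

// Send appends the message (persistence elided in this sketch) and, if a bus
// is configured, publishes the event inline; handlers run in the caller's goroutine.
func (m *Mailbox) Send(from, to, body string) error {
	// ... append to the JSONL store here ...
	if m.bus != nil {
		m.bus.Publish(MailboxMessageEvent{From: from, To: to, Body: body})
	}
	return nil
}
```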
