beyond5959
diff --git a/PROGRESS.md b/PROGRESS.md
Lines changed: 33 additions & 16 deletions
diff --git a/docs/ACCEPTANCE.md b/docs/ACCEPTANCE.md
Lines changed: 29 additions & 6 deletions
@@ -11,7 +11,33 @@ This file is the source of milestone progress, validation commands, and next act
 
 - `Post-M8` ACP multi-agent readiness and maintenance.
 
-## Latest Update (2026-03-14)
+## Latest Update (2026-03-16)
+
+- `Post-M8` ACP tool-call streaming completed:
+  - extended shared ACP `session/update` parsing to preserve structured `tool_call` and `tool_call_update` payloads, including `toolCallId`, status/title/kind, content blocks, locations, and raw input/output payloads.
+  - added first-class turn callbacks plus HTTP/SSE/history persistence for those tool-call events instead of dropping them at the provider boundary.
+  - updated the Web UI stream state and history reconstruction to merge tool-call events by `toolCallId` and render live/persisted tool-call cards alongside plan/reasoning/message output.
+  - validation:
+    - pass: `cd internal/webui/web && npm run build`
+    - pass: `go test ./...`
+
+- `Post-M8` Web UI fresh-session reset fix completed:
+  - explicit `New session` now allocates a client-side fresh-session scope even when the active thread already has no persisted `sessionId`, so repeated `New session` clicks no longer reuse the same anonymous chat buffer.
+  - empty-session history replay now drops cancelled turns that never emitted `session_bound` and never produced visible response text, preventing stale cancelled placeholders from reappearing after reload.
+  - validation:
+    - pass: `cd internal/webui/web && npm run build`
+    - pass: `go test ./...`
+
+- `Post-M8` deferred thread config apply completed:
+  - changed `POST /v1/threads/{threadId}/config-options` to validate against available config options and persist thread `agentOptions.modelId` / `agentOptions.configOverrides` without mutating the live provider.
+  - narrowed cached provider scope from full `agentOptions` to thread + session/fresh-session identity, so picker edits no longer evict the current session provider by themselves.
+  - added turn-start config sync: right before streaming a new turn, ngent compares persisted thread selections against the cached provider's current model/reasoning state and only then applies changed options.
+  - updated acceptance/spec/ADR docs to describe the new "persist now, apply on next turn" behavior.
+  - validation:
+    - pass: `cd internal/webui/web && npm run build`
+    - pass: `go test ./...`
+
+## Previous Update (2026-03-14)
 
 - `Post-M8` Web UI thinking tense alignment completed:
   - kept the live reasoning toggle label as `Thinking` while deltas are still streaming.
@@ -20,8 +46,6 @@ This file is the source of milestone progress, validation commands, and next act
     - pass: `cd internal/webui/web && npm run build`
     - pass: `go test ./...`
 
-## Previous Update (2026-03-14)
-
 - `Post-M8` Web UI thinking markdown rendering completed:
   - switched finalized `Thinking` content from escaped plain text to the same sanitized markdown renderer used by finalized assistant replies.
   - kept streaming reasoning as plain text so partial markdown does not reflow while deltas are still arriving.
@@ -129,9 +153,6 @@ This file is the source of milestone progress, validation commands, and next act
 - `Post-M8` codex session identity and replay normalization completed:
   - fixed fresh Codex `New session` persistence so ngent no longer stores provisional runtime ids like `session-1` as the thread session binding when a durable `_meta.threadId` is not yet available.
   - deferred initial `session_bound` persistence/emission for fresh Codex sessions until a stable session id can be resolved after the first prompt, then updated in-memory and persisted thread `agentOptions.sessionId` with the durable id.
-  - verified with real local Codex and Playwright against `http://127.0.0.1:8687/`:
-    - `New session` now produces distinct stable Codex session ids and no longer mixes first-session messages into the second-session chat.
-    - switching between the two replayed sessions in the Web UI no longer mixes first-session messages into the second-session chat.
   - validation:
     - pass: `go test ./internal/agents/codex -run 'Test(CodexShouldDeferInitialSessionBinding|NormalizeCodexSessionListResultUsesStableThreadID|CodexSessionMatchesIDAcceptsStableAndRawIDs|CodexStableSessionIDFallsBackToRawSessionID)$' -count=1`
 
@@ -540,17 +561,17 @@ This file is the source of milestone progress, validation commands, and next act
     - model controls are disabled while model lists load and during streaming turns.
   - executed validation:
 
-- `Post-F9` thread session model config switched to ACP `configOptions` + immediate apply:
+- `Post-F9` thread session model config switched to ACP `configOptions`:
   - added thread-scoped config options APIs:
     - `GET /v1/threads/{threadId}/config-options`
     - `POST /v1/threads/{threadId}/config-options`
-  - `POST` now applies model changes through ACP `session/set_config_option` (no separate apply endpoint/action).
-  - provider-side config option support added across all built-in agents:
-    - embedded: `codex`, `claude` (in-session `session/set_config_option` on cached runtime).
-    - stdio: `opencode`, `qwen`, `gemini`, `kimi` (ACP handshake + `session/set_config_option` apply path, then persist selected model for next turns).
+  - `POST` persists selected model/config state directly into sqlite thread metadata (no separate apply endpoint/action).
+  - provider-side config option support added across all built-in agents so persisted thread selections can be synchronized at turn boundaries:
+    - embedded: `codex`, `claude` (cached runtime session sync).
+    - stdio: `opencode`, `qwen`, `gemini`, `kimi` (per-turn ACP handshake plus persisted selection forwarding).
   - Web UI changes:
     - removed thread header `Apply` button.
-    - model dropdown now applies immediately on selection.
+    - model dropdown persists immediately on selection.
     - model source switched from agent-level model catalog to thread-level `configOptions` (`category=model`).
     - model option descriptions are rendered under the selector in the chat header.
   - thread metadata sync:
@@ -716,7 +737,6 @@ This file is the source of milestone progress, validation commands, and next act
   - executed validation:
     - pass: `cd internal/webui/web && npm run build`
     - pass: `go test ./...`
-    - pass: Playwright MCP verified no `/slash-commands` request on thread open, one request after typing `/`, and normal `/` message send when the endpoint returned `[]`
 
 - 2026-03-13: fixed Kimi slash-command loss in the ACP turn pipeline.
   - root cause: real Kimi `kimi acp` emits `available_commands_update` immediately after `session/new` and before `session/prompt`, while the ngent Kimi provider had been installing its `session/update` handler too late and silently dropped that notification.
@@ -727,7 +747,6 @@ This file is the source of milestone progress, validation commands, and next act
     - pass: `go test ./...`
     - pass: real local ngent + Kimi test on `http://127.0.0.1:8788` confirmed `session/update.available_commands_update` was logged between `session/new` and `session/prompt`
     - pass: `GET /v1/threads/{threadId}/slash-commands` returned 8 persisted Kimi commands after the first turn
-    - pass: Playwright MCP confirmed typing `/` in a fresh Kimi thread opened the slash-command picker with the persisted Kimi commands
 
 - 2026-03-13: forced a backend slash-command refresh on each new `/` interaction in the Web UI.
   - root cause: once a thread had already populated the client-side slash-command cache, typing `/` reused that cache and did not issue another `GET /v1/threads/{threadId}/slash-commands`, which made the real network behavior diverge from the expected "query sqlite on slash entry" flow.
@@ -782,7 +801,6 @@ This file is the source of milestone progress, validation commands, and next act
     - pass: `go test ./internal/httpapi -run 'Test(ThreadSlashCommandsPersistAndLoad|ThreadSlashCommandsPersistAcrossRestart|ThreadConfigOptionsBackfillsSlashCommandsWhenCatalogAlreadyStored)$' -count=1`
     - pass: real local ngent + codex test on `http://127.0.0.1:8796`
     - pass: fresh Codex thread returned the 7-command slash snapshot from `GET /v1/threads/{threadId}/slash-commands` immediately after `GET /v1/threads/{threadId}/config-options`, before any turn was sent
-    - pass: Playwright MCP confirmed typing `/` on that fresh Codex thread opened the slash-command picker showing `/review`, `/review-branch`, `/review-commit`, `/init`, `/compact`, `/logout`, and `/mcp`
 
 - 2026-03-13: fixed fresh-thread Qwen slash commands by probing providers on `/slash-commands` cache miss.
   - root cause: a user could type `/` before the thread-opening `config-options` request finished; for Qwen that meant `GET /v1/threads/{threadId}/slash-commands` read sqlite too early and returned `[]` even though the provider emitted `available_commands_update` a moment later.
@@ -792,7 +810,6 @@ This file is the source of milestone progress, validation commands, and next act
     - pass: `go test ./internal/agents/qwen ./internal/httpapi -run 'Test(StreamCapturesSlashCommandsEmittedBeforePrompt|SlashCommandsAfterConfigOptionsInit|ThreadConfigOptionsBackfillsSlashCommandsWhenCatalogAlreadyStored|ThreadSlashCommandsEndpointBackfillsMissingSnapshot)$' -count=1`
     - pass: real local ngent + qwen test on `http://127.0.0.1:8798`
     - pass: fresh Qwen thread returned `/bug`, `/compress`, `/init`, and `/summary` from the very first `GET /v1/threads/{threadId}/slash-commands` before any turn was sent
-    - pass: Playwright MCP confirmed typing `/` on that fresh Qwen thread opened the slash-command picker immediately
 
 - 2026-03-13: unified provider-local ACP slash-command caching across the direct stdio agents.
   - Kimi, Qwen, OpenCode, and Gemini all share the same underlying ACP behavior: `available_commands_update` can arrive during `session/new` in both turn streaming and config-session probes, so probing slash commands only from sqlite is not enough for fresh threads.
 
diff --git a/docs/ACCEPTANCE.md b/docs/ACCEPTANCE.md
@@ -219,7 +219,8 @@ This checklist defines executable acceptance checks for requirements 1-16.
 - Expected:
   - model selector data source is thread-level ACP `configOptions` (`category=model` / `id=model`).
   - reasoning selector data source is thread-level ACP `configOptions` (`category=reasoning`).
-  - selected model changes immediately via ACP `session/set_config_option`.
+  - selected model/reasoning changes are persisted immediately into sqlite thread state without an extra Apply button.
+  - if the cached session/provider is still using older model/reasoning selections, ngent applies the diff only on the next turn, before `session/prompt` is sent.
   - returned and persisted current values stay consistent:
     - `configOptions.model.currentValue` == thread `agentOptions.modelId`
     - non-model current values are mirrored into `thread.agentOptions.configOverrides`
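The "persist now, apply on next turn" behavior described in this hunk can be sketched as a diff between persisted selections and the cached provider state, computed just before `session/prompt`. This is a minimal sketch, not ngent's implementation: `Selections`, `pendingChanges`, `syncBeforePrompt`, and the `apply` callback are hypothetical names, and the callback only stands in for whatever actually sends `session/set_config_option`.

```go
package main

import (
	"fmt"
	"sort"
)

// Selections is a hypothetical flattened view of a thread's persisted
// agentOptions: the model id plus config overrides, keyed by option id.
type Selections map[string]string

// pendingChanges returns, in sorted order, the option ids whose persisted
// value differs from what the cached provider session currently reports.
func pendingChanges(persisted, live Selections) []string {
	var ids []string
	for id, want := range persisted {
		if got, ok := live[id]; !ok || got != want {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// syncBeforePrompt applies only the changed options through apply (assumed
// to wrap session/set_config_option) and records each applied value in live.
func syncBeforePrompt(persisted, live Selections, apply func(id, value string) error) error {
	for _, id := range pendingChanges(persisted, live) {
		if err := apply(id, persisted[id]); err != nil {
			return fmt.Errorf("apply %s: %w", id, err)
		}
		live[id] = persisted[id]
	}
	return nil
}

func main() {
	// The picker already persisted "model-b"; the cached session still runs "model-a".
	persisted := Selections{"model": "model-b", "reasoning": "high"}
	live := Selections{"model": "model-a", "reasoning": "high"}

	err := syncBeforePrompt(persisted, live, func(id, value string) error {
		fmt.Printf("set_config_option %s=%s\n", id, value)
		return nil
	})
	fmt.Println(err, live["model"])
}
```

Because only differing options are forwarded, a thread whose selections already match the cached session sends nothing extra at turn start, which fits the goal that picker edits alone should not disturb the current session provider.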
@@ -313,9 +314,11 @@ This checklist defines executable acceptance checks for requirements 1-16.
     - first-page load on active thread selection.
     - `Show more` pagination when `nextCursor` is present.
   - `New session` action that clears the selected `sessionId`.
+  - repeated `New session` clicks while the thread is still unbound must still open a blank fresh-session view instead of reusing the prior anonymous buffer.
   - selecting an existing session requests provider-owned transcript replay before the next turn.
   - turn SSE emits `session_bound`, and the thread persists `agentOptions.sessionId`.
   - once a thread is session-bound, subsequent prompt building no longer injects prior local turns into the provider prompt.
+  - cancelled turns that never emitted `session_bound` and never produced visible response text do not reappear when the user opens a newer fresh session or reloads the thread.
 - Verification commands (executed 2026-03-13):
   - `go test ./internal/httpapi -run 'TestThreadSessionsListEndpoint|TestTurnSessionBoundPersistsSessionIDAndSkipsContextInjection|TestNewSessionResetSkipsContextInjection' -count=1`
   - `cd internal/webui/web && npm run build`
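The cancelled-turn rule added in this hunk amounts to a small filter over history during empty-session replay. The sketch below assumes a simplified `Turn` record; its field names and the `replayable` / `filterReplay` helpers are hypothetical, and treating whitespace-only text as "not visible" is an assumption rather than something the checklist states.

```go
package main

import (
	"fmt"
	"strings"
)

// Turn is a hypothetical, simplified history record; the field names are
// illustrative and not taken from the ngent codebase.
type Turn struct {
	ID           string
	Cancelled    bool
	SessionBound bool   // a session_bound event was emitted during the turn
	ResponseText string // assistant text produced by the turn, if any
}

// replayable reports whether a turn should be shown when replaying
// empty-session history. Only cancelled turns can be dropped, and only when
// they neither bound a session nor produced visible text.
func replayable(t Turn) bool {
	if !t.Cancelled {
		return true
	}
	// Assumption: whitespace-only output does not count as visible text.
	return t.SessionBound || strings.TrimSpace(t.ResponseText) != ""
}

// filterReplay keeps the turns that pass replayable, preserving order.
func filterReplay(turns []Turn) []Turn {
	kept := make([]Turn, 0, len(turns))
	for _, t := range turns {
		if replayable(t) {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	history := []Turn{
		{ID: "t1", Cancelled: true},                                 // stale placeholder
		{ID: "t2", Cancelled: true, ResponseText: "partial answer"}, // has visible text
		{ID: "t3", Cancelled: true, SessionBound: true},             // bound a session
		{ID: "t4", ResponseText: "done"},                            // normal turn
	}
	for _, t := range filterReplay(history) {
		fmt.Println(t.ID)
	}
}
```

In this example only `t1` is dropped: it was cancelled before `session_bound` and left no visible output, which is the stale placeholder the checklist says must not reappear.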
@@ -330,6 +333,11 @@ This checklist defines executable acceptance checks for requirements 1-16.
 - Additional verification commands (executed 2026-03-13):
   - `go test ./internal/storage ./internal/httpapi -run 'Test(SessionTranscriptCacheCRUD|ThreadSessionHistoryEndpoint|ThreadSessionHistoryEndpointUsesSQLiteCacheAcrossRestart)$' -count=1`
   - `go test ./...`
+- Additional verification commands (executed 2026-03-16 after fresh-session scope reset fix):
+  - `cd internal/webui/web && npm run build`
+  - `go test ./...`
+  - `go run ./cmd/ngent --port 8798 --db-path /tmp/ngent-session-bug.db --debug`
+  - reload the page, reopen the same thread, and confirm the empty cancelled placeholder still does not reappear
 
 ## Requirement 24: ACP Slash Commands Cache and Composer Picker
 
@@ -354,17 +362,14 @@ This checklist defines executable acceptance checks for requirements 1-16.
   - `go test ./...`
 - Additional verification commands (executed 2026-03-13):
   - `go run ./cmd/ngent --port 8787 --db-path /tmp/ngent-kimi-real-3.db --debug`
-  - Playwright MCP: created a Kimi thread in the Web UI, confirmed no `/slash-commands` request on thread open, confirmed one `/slash-commands` request after typing `/`, and confirmed `/` sent to Kimi as a normal message without freezing the page.
 - Additional verification commands (executed 2026-03-13 after Kimi timing fix):
   - `go test ./internal/agents/kimi -run 'TestStream(CapturesSlashCommandsEmittedBeforePrompt|WithFakeProcess|WithFakeProcessModelID)$' -count=1`
   - `go run ./cmd/ngent --port 8788 --db-path /tmp/ngent-kimi-acp-trace.db --debug`
   - real local Kimi thread: confirmed `GET /v1/threads/{threadId}/slash-commands` returned the 8 persisted Kimi commands after the first turn
-  - Playwright MCP: created a fresh Kimi thread in the Web UI and confirmed typing `/` opened the slash-command picker showing `/init`, `/compact`, `/clear`, `/yolo`, `/plan`, `/add-dir`, `/export`, and `/import`
 - Additional verification commands (executed 2026-03-13 after slash-entry refresh fix):
   - `cd internal/webui/web && npm run build`
   - `go test ./...`
   - `go run ./cmd/ngent --port 8789 --db-path /tmp/ngent-slash-refresh.db --debug`
-  - Playwright MCP: opened that Kimi thread, confirmed no `/slash-commands` request occurred while merely loading the thread, confirmed typing `/` triggered `GET /v1/threads/{threadId}/slash-commands`, and confirmed clearing the input and typing `/` again triggered a second request while still opening the slash-command picker
 - Additional verification commands (executed 2026-03-13 after codex embedded timing fix):
   - `go test ./internal/agents/codex -run 'TestStream(CapturesSlashCommandsEmittedBeforePrompt|ReplaysCachedSlashCommandsAfterConfigOptionsInit)$' -count=1`
   - `go run ./cmd/ngent --port 8793 --db-path /tmp/ngent-codex-fix.db --debug`
@@ -388,14 +393,32 @@ This checklist defines executable acceptance checks for requirements 1-16.
   - `go run ./cmd/ngent --port 8796 --db-path /tmp/ngent-codex-slash-fix.db --debug`
   - real local Codex thread: confirmed `GET /v1/threads/{threadId}/config-options` initialized the embedded provider and `GET /v1/threads/{threadId}/slash-commands` then returned the 7-command snapshot before any turn was sent
   - sqlite check: `select agent_id, commands_json from agent_slash_commands where agent_id = 'codex';` returned the persisted codex command list
-  - Playwright MCP: created a fresh Codex thread, confirmed the UI loaded `config-options`, then typed `/` and observed the slash-command picker open with `/review`, `/review-branch`, `/review-commit`, `/init`, `/compact`, `/logout`, and `/mcp`
 - Additional verification commands (executed 2026-03-13 after Qwen slash-command probe fallback):
   - `go test ./internal/agents/qwen ./internal/httpapi -run 'Test(StreamCapturesSlashCommandsEmittedBeforePrompt|SlashCommandsAfterConfigOptionsInit|ThreadConfigOptionsBackfillsSlashCommandsWhenCatalogAlreadyStored|ThreadSlashCommandsEndpointBackfillsMissingSnapshot)$' -count=1`
   - `go run ./cmd/ngent --port 8798 --db-path /tmp/ngent-qwen-slash-fix-v2.db --debug`
   - real local Qwen thread: confirmed the very first `GET /v1/threads/{threadId}/slash-commands` returned `/bug`, `/compress`, `/init`, and `/summary` before any turn was sent
   - sqlite check: `select agent_id, commands_json from agent_slash_commands where agent_id = 'qwen';` returned the persisted qwen command list
-  - Playwright MCP: created a fresh Qwen thread, typed `/`, and observed the slash-command picker open immediately with the 4 Qwen commands
 - Additional verification commands (executed 2026-03-13 after unifying direct ACP provider slash-command caches):
   - `go test ./internal/agents/kimi ./internal/agents/opencode ./internal/agents/gemini ./internal/agents/qwen -run 'Test(StreamCapturesSlashCommandsEmittedBeforePrompt|SlashCommandsAfterConfigOptionsInit|WithFakeProcess|WithFakeProcessModelID)$' -count=1`
   - `go test ./...`
   - Kimi, OpenCode, Gemini, and Qwen now all keep the latest `available_commands_update` snapshot in the same provider-local cache across both `Stream()` and `ConfigOptions()` probes, so `/slash-commands` backfill uses one consistent source for these direct ACP agents
+
+## Requirement 25: ACP Tool-Call Streaming and History
+
+- Operation:
+  - run a turn against an ACP-backed agent that emits `tool_call` followed by `tool_call_update` for the same `toolCallId`.
+  - observe the SSE stream from `POST /v1/threads/{threadId}/turns`.
+  - query `GET /v1/threads/{threadId}/history?includeEvents=true`.
+  - open the same thread in the Web UI during streaming and again after reload/history fetch.
+- Expected:
+  - shared ACP parsing accepts `tool_call` and `tool_call_update` without flattening them into plain text or dropping their structured payload.
+  - SSE emits `tool_call` / `tool_call_update` events with `turnId`, `toolCallId`, and the corresponding structured ACP fields (`status`, `content`, `locations`, `rawInput`, `rawOutput`) when present.
+  - turn history persists those same event types and payloads.
+  - the Web UI merges updates by `toolCallId`, so the same tool-call card progresses from its initial state to its updated/final state both live and after reload.
+  - tool-call cards remain separate from the main assistant text bubble.
+- Verification commands (executed 2026-03-16):
+  - `go test ./internal/agents -run 'TestParseACPUpdateToolCall|TestParseACPUpdateToolCallUpdateKeepsExplicitClears' -count=1`
+  - `go test ./internal/agents -run 'TestNewACPNotificationHandlerRoutesToolCallsToToolCallHandler' -count=1`
+  - `go test ./internal/httpapi -run 'TestTurnsSSEIncludesToolCallUpdatesAndPersistsHistory' -count=1`
+  - `cd internal/webui/web && npm run build`
+  - `go test ./...`
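The "merge updates by `toolCallId`" expectation in Requirement 25 can be illustrated with a short fold over the event stream. This is a sketch under stated assumptions, written in Go for consistency with the backend even though the checklist places the merge in the Web UI: `ToolCallEvent`, `ToolCallCard`, and `mergeToolCalls` are hypothetical names, only two of the structured fields are modeled, and the status strings are illustrative values.

```go
package main

import "fmt"

// ToolCallEvent is a hypothetical, trimmed-down tool_call / tool_call_update
// payload. Pointer fields distinguish "absent in this update" (nil) from
// "explicitly set", including explicitly set to the empty string.
type ToolCallEvent struct {
	Type       string // "tool_call" or "tool_call_update"
	ToolCallID string
	Title      *string
	Status     *string
}

// ToolCallCard is the merged view rendered for one tool call.
type ToolCallCard struct {
	ToolCallID string
	Title      string
	Status     string
}

// mergeToolCalls folds events into one card per toolCallId, keeping cards in
// first-seen order. Fields absent from an update leave earlier values intact.
func mergeToolCalls(events []ToolCallEvent) []ToolCallCard {
	index := map[string]int{} // toolCallId -> position in cards
	var cards []ToolCallCard
	for _, ev := range events {
		i, ok := index[ev.ToolCallID]
		if !ok {
			i = len(cards)
			index[ev.ToolCallID] = i
			cards = append(cards, ToolCallCard{ToolCallID: ev.ToolCallID})
		}
		if ev.Title != nil {
			cards[i].Title = *ev.Title
		}
		if ev.Status != nil {
			cards[i].Status = *ev.Status
		}
	}
	return cards
}

func main() {
	ptr := func(s string) *string { return &s }
	events := []ToolCallEvent{
		{Type: "tool_call", ToolCallID: "call-1", Title: ptr("Read file"), Status: ptr("pending")},
		{Type: "tool_call_update", ToolCallID: "call-1", Status: ptr("completed")},
	}
	for _, c := range mergeToolCalls(events) {
		fmt.Printf("%s %q %s\n", c.ToolCallID, c.Title, c.Status)
	}
}
```

Running the same fold over live SSE events and over persisted history events yields the same card, which is why a card can progress from its initial to its final state both during streaming and after reload.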