# Bounty: Per-Model Chat Sessions with Isolated Metrics

## Background

TT Studio's chat UI (`/chat`) supports multiple conversation threads stored in IndexedDB,
but two behaviors make it feel disconnected from the model management workflow:

1. **No automatic new chat on model open** — clicking "Chat" on a deployed model from the
Models page navigates to `/chat` and resumes whatever conversation the user last had. There
is no automatic new thread started for the freshly-chosen model, so users land in an
unrelated conversation history.

2. **Metrics display the wrong model name** — `modelName` is a single live state variable in
`ChatComponent` passed as a prop down to every rendered message. When the user switches
models, **all previously generated messages instantly re-label their stats panel with the
new model**, even though those messages were generated by the original model. The metric
numbers (TTFT, TPOT, tokens) are stored correctly per message in `ChatMessage.inferenceStats`,
but the model identity label is not — it is never persisted at inference time.

---

## Goal

1. Whenever a model is deployed and the user navigates to the chat page for it, a
   **new, blank conversation thread** scoped to that model opens automatically — no manual
   "New Chat" click required.
2. Each chat thread's metrics are fully isolated: chatting with Model A, then deploying
Model B (which opens its own new chat), then navigating back to Model A's chat must
leave Model A's metrics completely unchanged.

---

## What Success Looks Like

### Feature 1 — Auto-Open New Chat on Model Launch

- [ ] The Models page "Chat" action navigates to `/chat?modelId=<container_id>` (or equivalent
route parameter) instead of bare `/chat`.
- [ ] When `ChatComponent` mounts with a `modelId` query parameter, it calls
`createNewConversation()` automatically and pre-selects that model in the model picker.
- [ ] The new thread appears at the top of `HistoryPanel`, titled with the model display name
(e.g. "Chat with Llama-3.3-70B") or the default "New Chat N" pattern.
- [ ] Navigating to `/chat` without a query param keeps the existing behavior (resume last
thread, no forced new conversation).
- [ ] Pressing "Chat" twice for the same model does not stack duplicate empty threads — if the
current thread is already empty and for the same model, reuse it.

### Feature 2 — Per-Message Model Identity in Metrics

- [ ] `ChatMessage.inferenceStats` (or a sibling field `ChatMessage.modelName`) stores the
model display name at the moment inference completes inside `runInference.ts`.
- [ ] `InferenceStats.tsx` prefers the stored model name from the message over the live
`modelName` prop when rendering historical messages.
- [ ] Switching from Model A to Model B does **not** change the model label shown in stats
panels for messages that were generated by Model A.
- [ ] Newly generated messages correctly display the model that produced them.
- [ ] The fix is backward-compatible: older persisted messages that pre-date this change and
lack a stored model name gracefully fall back to displaying no model label (or a generic
"Unknown model") rather than crashing.

---

## Architecture

### Changes Required

#### Backend — none required for either feature

#### Frontend

| File | Change |
|------|--------|
| `app/frontend/src/components/chatui/types.ts` | Add optional `modelName?: string` to `InferenceStats` (or to `ChatMessage`) |
| `app/frontend/src/components/chatui/runInference.ts` | Pass `modelName` into `runInference` and include it in the `InferenceStats` object before attaching to the message |
| `app/frontend/src/components/chatui/InferenceStats.tsx` | Prefer `stats.modelName` over the `modelName` prop; fall back gracefully if absent |
| `app/frontend/src/components/chatui/ChatComponent.tsx` | Read `?modelId` query param on mount; call `createNewConversation()` + pre-select model when param is present; skip if current thread is already empty for that model |
| `app/frontend/src/components/models/row-cells/ManageCell.tsx` (or equivalent) | Update "Chat" navigation link to include `?modelId=<container_id>` |

---

## Build Phases

### Phase 1 — Store model name per message (Bug 2)

1. Add `modelName?: string` to `InferenceStats` in `types.ts`.
2. Thread `modelName` as a parameter into `runInference()`.
3. In `runInference.ts`, include `modelName` when building the final `InferenceStats`
object (alongside TTFT, TPOT, etc.) before attaching it to the message.
4. In `InferenceStats.tsx`, update `getDisplayModelName()` to check `stats.modelName`
first; fall back to the `modelName` prop for backward compat.
5. Verify: generate two messages with Model A, switch to Model B, generate a third — all
three stats panels should show the correct model.
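Steps 1–3 can be sketched as below. Field names other than `modelName` are illustrative assumptions; the real `InferenceStats` object in `runInference.ts` carries the project's actual metric fields.

```typescript
// Illustrative shape; the real InferenceStats in types.ts has more fields.
interface InferenceStats {
  ttftMs: number;
  tpotMs: number;
  tokensGenerated: number;
  modelName?: string; // step 1: new optional field
}

// Step 3: capture the model identity at the moment inference completes,
// so later model switches cannot re-label this message's stats panel.
function buildInferenceStats(
  ttftMs: number,
  tpotMs: number,
  tokensGenerated: number,
  modelName: string
): InferenceStats {
  return { ttftMs, tpotMs, tokensGenerated, modelName };
}
```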

### Phase 2 — Auto-open new chat on model launch (Feature 1)

6. Update the "Chat" navigation action in the models table to append `?modelId=<container_id>`.
7. In `ChatComponent`, read `modelId` from `useSearchParams()` (React Router) on mount.
8. When `modelId` is present: look up the model display name from the deployed models list,
call `createNewConversation()`, then call `setModelID` / `setModelName` with the resolved
model — all in a single `useEffect` on first mount.
9. Add the duplicate-guard: if `chatThreads[currentThreadIndex].messages.length === 0` and
the current model already matches, skip creating a new thread.
10. Remove the `?modelId` param from the URL after consumption so that a browser refresh
    does not re-trigger the effect (e.g. `setSearchParams({}, { replace: true })`, which
    rewrites the URL without pushing a new history entry).
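The duplicate guard from step 9 is easiest to reason about as a pure predicate. The thread shape and names below are assumptions about the actual `chatThreads` state in `ChatComponent`, not its real types.

```typescript
// Assumed minimal thread shape; the real type lives in ChatComponent's state.
interface ChatThread {
  modelId: string | null;
  messages: unknown[];
}

// Step 9: only create a new thread when the current one has messages or is
// bound to a different model; otherwise reuse the existing empty thread.
function shouldCreateNewThread(
  current: ChatThread | undefined,
  requestedModelId: string
): boolean {
  if (!current) return true;
  return current.messages.length > 0 || current.modelId !== requestedModelId;
}
```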

---

## Complexity Traps to Avoid

- **Don't change `InferenceStats` in a breaking way** — older persisted messages in IndexedDB
will not have `modelName`; the display code must handle `undefined` cleanly (no crash, no
blank white panel).
- **Don't call `createNewConversation` on every render** — gate the `useEffect` with a
`hasInitialized` ref so it fires only once per navigation to `/chat?modelId=`.
- **Model lookup timing** — deployed models may not be loaded when `ChatComponent` mounts;
`modelName` resolution should be async/retry-safe (wait for the model list to populate
before setting the thread title).
- **Thread limit** — `usePersistentState` enforces a max of 20 threads; the auto-create
logic should respect this (existing threads get pruned as per current FIFO logic).
- **IndexedDB schema** — `inferenceStats` is serialized to IndexedDB as-is; adding a new
  field to the interface is backward-compatible (existing records simply won't have it).
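The thread-limit trap can be sketched as a FIFO prune. The constant and function names are illustrative; the real enforcement lives in `usePersistentState.ts`, and newest-first ordering is assumed based on new threads appearing at the top of `HistoryPanel`.

```typescript
const MAX_THREADS = 20; // matches the documented usePersistentState limit

// Prepend the auto-created thread, then drop the oldest threads from the
// tail when the cap is exceeded, mirroring the existing FIFO pruning.
function addThreadWithLimit<T>(threads: T[], newThread: T): T[] {
  const next = [newThread, ...threads];
  return next.length > MAX_THREADS ? next.slice(0, MAX_THREADS) : next;
}
```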

---

## Validation

### Feature 1

```
1. Deploy a model (e.g. Llama-3.3-70B).
2. From the Models page, click "Chat".
3. Verify: navigated to /chat, a brand-new empty thread is active,
the model picker shows Llama-3.3-70B.
4. Type a message and send.
5. Go back to Models page, click "Chat" on the same model again.
6. Verify: a second new thread is created (the previous thread with
messages is preserved in the history panel).
7. Navigate to /chat with no query param.
8. Verify: no new thread is created; last-used thread is resumed.
```

### Feature 2

```
1. Deploy Model A. Navigate to chat → new empty thread opens, model picker shows Model A.
2. Send a message. Observe stats pill → shows "Model A" metrics.
3. Deploy Model B. Navigate to chat → a second new empty thread opens, model picker shows Model B.
4. Send a message. Observe stats pill → shows "Model B" metrics.
5. In the history panel, switch back to Model A's thread.
6. Verify: Model A's stats pill still shows "Model A" metrics — unchanged by Model B deployment or chat.
7. Reload the page. Verify both threads and their metrics survive the reload correctly.
```

---

## Key Files to Read Before Starting

| File | Why |
|------|-----|
| `app/frontend/src/components/chatui/ChatComponent.tsx` | All chat state, `createNewConversation`, model selection, prop drilling |
| `app/frontend/src/components/chatui/runInference.ts` | Where `InferenceStats` is assembled and attached to messages |
| `app/frontend/src/components/chatui/types.ts` | `ChatMessage`, `InferenceStats` interfaces |
| `app/frontend/src/components/chatui/InferenceStats.tsx` | `getDisplayModelName()` — the display-side of the bug |
| `app/frontend/src/components/chatui/MessageActions.tsx` | Prop chain: `modelName` passed from `ChatHistory` to `InferenceStats` |
| `app/frontend/src/components/chatui/usePersistentState.ts` | Thread persistence limits (20 threads, 100 messages) |

---

## Out of Scope

- Multi-window or multi-tab synchronization of chat state.
- Server-side chat history persistence (IndexedDB is the only store for now).
- Changing the 20-thread or 100-message limits.
- Any backend changes.
- Metrics for non-LLM model types (Vision, Speech, Image Gen).