# Bounty: Per-Model Chat Sessions with Isolated Metrics

## Background

TT Studio's chat UI (`/chat`) supports multiple conversation threads stored in IndexedDB,
but two behaviors make it feel disconnected from the model management workflow:

1. **No automatic new chat on model open** — clicking "Chat" on a deployed model from the
Models page navigates to `/chat` and resumes whatever conversation the user last had. There
is no automatic new thread started for the freshly-chosen model, so users land in an
unrelated conversation history.

2. **Metrics display the wrong model name** — `modelName` is a single live state variable in
`ChatComponent` passed as a prop down to every rendered message. When the user switches
models, **all previously generated messages instantly re-label their stats panel with the
new model**, even though those messages were generated by the original model. The metric
numbers (TTFT, TPOT, tokens) are stored correctly per message in `ChatMessage.inferenceStats`,
but the model identity label is not — it is never persisted at inference time.

---

## Goal

1. Whenever a model is deployed and the user navigates to the chat page for it, a
   **new, blank conversation thread** scoped to that model opens automatically — no manual
   "New Chat" click required.
2. Each chat thread's metrics are fully isolated: chatting with Model A, then deploying
Model B (which opens its own new chat), then navigating back to Model A's chat must
leave Model A's metrics completely unchanged.

---

## What Success Looks Like

### Feature 1 — Auto-Open New Chat on Model Launch

- [ ] The Models page "Chat" action navigates to `/chat?modelId=<container_id>` (or equivalent
route parameter) instead of bare `/chat`.
- [ ] When `ChatComponent` mounts with a `modelId` query parameter, it calls
`createNewConversation()` automatically and pre-selects that model in the model picker.
- [ ] The new thread appears at the top of `HistoryPanel`, titled with the model display name
(e.g. "Chat with Llama-3.3-70B") or the default "New Chat N" pattern.
- [ ] Navigating to `/chat` without a query param keeps the existing behavior (resume last
thread, no forced new conversation).
- [ ] Pressing "Chat" twice for the same model does not stack duplicate empty threads — if the
current thread is already empty and for the same model, reuse it.

### Feature 2 — Per-Message Model Identity in Metrics

- [ ] `ChatMessage.inferenceStats` (or a sibling field `ChatMessage.modelName`) stores the
model display name at the moment inference completes inside `runInference.ts`.
- [ ] `InferenceStats.tsx` prefers the stored model name from the message over the live
`modelName` prop when rendering historical messages.
- [ ] Switching from Model A to Model B does **not** change the model label shown in stats
panels for messages that were generated by Model A.
- [ ] Newly generated messages correctly display the model that produced them.
- [ ] The fix is backward-compatible: older persisted messages that pre-date this change and
lack a stored model name gracefully fall back to displaying no model label (or a generic
"Unknown model") rather than crashing.

---

## Architecture

### Changes Required

#### Backend — none required for either feature

#### Frontend

| File | Change |
|------|--------|
| `app/frontend/src/components/chatui/types.ts` | Add optional `modelName?: string` to `InferenceStats` (or to `ChatMessage`) |
| `app/frontend/src/components/chatui/runInference.ts` | Pass `modelName` into `runInference` and include it in the `InferenceStats` object before attaching to the message |
| `app/frontend/src/components/chatui/InferenceStats.tsx` | Prefer `stats.modelName` over the `modelName` prop; fall back gracefully if absent |
| `app/frontend/src/components/chatui/ChatComponent.tsx` | Read `?modelId` query param on mount; call `createNewConversation()` + pre-select model when param is present; skip if current thread is already empty for that model |
| `app/frontend/src/components/models/row-cells/ManageCell.tsx` (or equivalent) | Update "Chat" navigation link to include `?modelId=<container_id>` |

---

## Build Phases

### Phase 1 — Store model name per message (Bug 2)

1. Add `modelName?: string` to `InferenceStats` in `types.ts`.
2. Thread `modelName` as a parameter into `runInference()`.
3. In `runInference.ts`, include `modelName` when building the final `InferenceStats`
object (alongside TTFT, TPOT, etc.) before attaching it to the message.
4. In `InferenceStats.tsx`, update `getDisplayModelName()` to check `stats.modelName`
first; fall back to the `modelName` prop for backward compat.
5. Verify: generate two messages with Model A, switch to Model B, generate a third — all
three stats panels should show the correct model.
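Steps 1–3 can be sketched as below. Field names other than `modelName` are illustrative assumptions; the real `InferenceStats` object in `runInference.ts` carries the project's actual metric fields.

```typescript
// Illustrative shape; the real InferenceStats in types.ts has more fields.
interface InferenceStats {
  ttftMs: number;
  tpotMs: number;
  tokensGenerated: number;
  modelName?: string; // step 1: new optional field
}

// Step 3: capture the model identity at the moment inference completes,
// so later model switches cannot re-label this message's stats panel.
function buildInferenceStats(
  ttftMs: number,
  tpotMs: number,
  tokensGenerated: number,
  modelName: string
): InferenceStats {
  return { ttftMs, tpotMs, tokensGenerated, modelName };
}
```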

### Phase 2 — Auto-open new chat on model launch (Feature 1)

6. Update the "Chat" navigation action in the models table to append `?modelId=<container_id>`.
7. In `ChatComponent`, read `modelId` from `useSearchParams()` (React Router) on mount.
8. When `modelId` is present: look up the model display name from the deployed models list,
call `createNewConversation()`, then call `setModelID` / `setModelName` with the resolved
model — all in a single `useEffect` on first mount.
9. Add the duplicate-guard: if `chatThreads[currentThreadIndex].messages.length === 0` and
the current model already matches, skip creating a new thread.
10. Remove the `?modelId` param from the URL after consumption so that a browser refresh
    does not re-trigger the effect (e.g. `setSearchParams({}, { replace: true })`, which
    rewrites the URL without pushing a new history entry).
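The duplicate guard from step 9 is easiest to reason about as a pure predicate. The thread shape and names below are assumptions about the actual `chatThreads` state in `ChatComponent`, not its real types.

```typescript
// Assumed minimal thread shape; the real type lives in ChatComponent's state.
interface ChatThread {
  modelId: string | null;
  messages: unknown[];
}

// Step 9: only create a new thread when the current one has messages or is
// bound to a different model; otherwise reuse the existing empty thread.
function shouldCreateNewThread(
  current: ChatThread | undefined,
  requestedModelId: string
): boolean {
  if (!current) return true;
  return current.messages.length > 0 || current.modelId !== requestedModelId;
}
```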

---

## Complexity Traps to Avoid

- **Don't change `InferenceStats` in a breaking way** — older persisted messages in IndexedDB
will not have `modelName`; the display code must handle `undefined` cleanly (no crash, no
blank white panel).
- **Don't call `createNewConversation` on every render** — gate the `useEffect` with a
`hasInitialized` ref so it fires only once per navigation to `/chat?modelId=`.
- **Model lookup timing** — deployed models may not be loaded when `ChatComponent` mounts;
`modelName` resolution should be async/retry-safe (wait for the model list to populate
before setting the thread title).
- **Thread limit** — `usePersistentState` enforces a max of 20 threads; the auto-create
logic should respect this (existing threads get pruned as per current FIFO logic).
- **IndexedDB schema** — `inferenceStats` is serialized to IndexedDB as-is; adding a new
  field to the interface is backward-compatible (existing records simply won't have it).
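The thread-limit trap can be sketched as a FIFO prune. The constant and function names are illustrative; the real enforcement lives in `usePersistentState.ts`, and newest-first ordering is assumed based on new threads appearing at the top of `HistoryPanel`.

```typescript
const MAX_THREADS = 20; // matches the documented usePersistentState limit

// Prepend the auto-created thread, then drop the oldest threads from the
// tail when the cap is exceeded, mirroring the existing FIFO pruning.
function addThreadWithLimit<T>(threads: T[], newThread: T): T[] {
  const next = [newThread, ...threads];
  return next.length > MAX_THREADS ? next.slice(0, MAX_THREADS) : next;
}
```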

---

## Validation

### Feature 1

```
1. Deploy a model (e.g. Llama-3.3-70B).
2. From the Models page, click "Chat".
3. Verify: navigated to /chat, a brand-new empty thread is active,
the model picker shows Llama-3.3-70B.
4. Type a message and send.
5. Go back to Models page, click "Chat" on the same model again.
6. Verify: a second new thread is created (the previous thread with
messages is preserved in the history panel).
7. Navigate to /chat with no query param.
8. Verify: no new thread is created; last-used thread is resumed.
```

### Feature 2

```
1. Deploy Model A. Navigate to chat → new empty thread opens, model picker shows Model A.
2. Send a message. Observe stats pill → shows "Model A" metrics.
3. Deploy Model B. Navigate to chat → a second new empty thread opens, model picker shows Model B.
4. Send a message. Observe stats pill → shows "Model B" metrics.
5. In the history panel, switch back to Model A's thread.
6. Verify: Model A's stats pill still shows "Model A" metrics — unchanged by Model B deployment or chat.
7. Reload the page. Verify both threads and their metrics survive the reload correctly.
```

---

## Key Files to Read Before Starting

| File | Why |
|------|-----|
| `app/frontend/src/components/chatui/ChatComponent.tsx` | All chat state, `createNewConversation`, model selection, prop drilling |
| `app/frontend/src/components/chatui/runInference.ts` | Where `InferenceStats` is assembled and attached to messages |
| `app/frontend/src/components/chatui/types.ts` | `ChatMessage`, `InferenceStats` interfaces |
| `app/frontend/src/components/chatui/InferenceStats.tsx` | `getDisplayModelName()` — the display-side of the bug |
| `app/frontend/src/components/chatui/MessageActions.tsx` | Prop chain: `modelName` passed from `ChatHistory` to `InferenceStats` |
| `app/frontend/src/components/chatui/usePersistentState.ts` | Thread persistence limits (20 threads, 100 messages) |

---

## Out of Scope

- Multi-window or multi-tab synchronization of chat state.
- Server-side chat history persistence (IndexedDB is the only store for now).
- Changing the 20-thread or 100-message limits.
- Any backend changes.
- Metrics for non-LLM model types (Vision, Speech, Image Gen).