create bounty for per-model chat sessions and metrics #673
Open: rfatimaTT wants to merge 1 commit into anirud/tt-studio-face-lift from rumeza/model-chat-bounty
# Bounty: Per-Model Chat Sessions with Isolated Metrics

## Background

TT Studio's chat UI (`/chat`) supports multiple conversation threads stored in IndexedDB,
but two behaviors make it feel disconnected from the model management workflow:
1. **No automatic new chat on model open** — clicking "Chat" on a deployed model from the
   Models page navigates to `/chat` and resumes whatever conversation the user last had. There
   is no automatic new thread started for the freshly chosen model, so users land in an
   unrelated conversation history.

2. **Metrics display the wrong model name** — `modelName` is a single live state variable in
   `ChatComponent`, passed as a prop down to every rendered message. When the user switches
   models, **all previously generated messages instantly re-label their stats panel with the
   new model**, even though those messages were generated by the original model. The metric
   numbers (TTFT, TPOT, tokens) are stored correctly per message in `ChatMessage.inferenceStats`,
   but the model identity label is not — it is never persisted at inference time.
---

## Goal

1. Every time a model is deployed and the user navigates to the chat page, a
   **new, blank conversation thread** scoped to that model opens automatically — no manual
   "New Chat" click required.
2. Each chat thread's metrics are fully isolated: chatting with Model A, then deploying
   Model B (which opens its own new chat), then navigating back to Model A's chat must
   leave Model A's metrics completely unchanged.

---
## What Success Looks Like

### Feature 1 — Auto-Open New Chat on Model Launch

- [ ] The Models page "Chat" action navigates to `/chat?modelId=<container_id>` (or an equivalent
  route parameter) instead of bare `/chat`.
- [ ] When `ChatComponent` mounts with a `modelId` query parameter, it calls
  `createNewConversation()` automatically and pre-selects that model in the model picker.
- [ ] The new thread appears at the top of `HistoryPanel`, titled with the model display name
  (e.g. "Chat with Llama-3.3-70B") or the default "New Chat N" pattern.
- [ ] Navigating to `/chat` without a query param keeps the existing behavior (resume the last
  thread, no forced new conversation).
- [ ] Pressing "Chat" twice for the same model does not stack duplicate empty threads — if the
  current thread is already empty and belongs to the same model, reuse it.
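The first two checklist items hinge on round-tripping the container id through the URL. A minimal sketch of that round trip using the standard `URLSearchParams` API (the real code would read the param via React Router's `useSearchParams()`; the container id here is hypothetical):

```typescript
// Build the link the Models page "Chat" action navigates to.
function chatUrlFor(containerId: string): string {
  const params = new URLSearchParams({ modelId: containerId });
  return `/chat?${params.toString()}`;
}

// What ChatComponent reads back on mount (null when the param is absent,
// which must preserve the existing resume-last-thread behavior).
function modelIdFrom(search: string): string | null {
  return new URLSearchParams(search).get("modelId");
}

const url = chatUrlFor("c0ffee42"); // hypothetical container id
console.log(url);                            // "/chat?modelId=c0ffee42"
console.log(modelIdFrom(url.split("?")[1])); // "c0ffee42"
console.log(modelIdFrom(""));                // null
```

Keeping the "no param, no new thread" branch on the `null` return is what preserves the fourth checklist item.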
### Feature 2 — Per-Message Model Identity in Metrics

- [ ] `ChatMessage.inferenceStats` (or a sibling field `ChatMessage.modelName`) stores the
  model display name at the moment inference completes inside `runInference.ts`.
- [ ] `InferenceStats.tsx` prefers the stored model name from the message over the live
  `modelName` prop when rendering historical messages.
- [ ] Switching from Model A to Model B does **not** change the model label shown in stats
  panels for messages that were generated by Model A.
- [ ] Newly generated messages correctly display the model that produced them.
- [ ] The fix is backward-compatible: older persisted messages that pre-date this change and
  lack a stored model name gracefully fall back to displaying no model label (or a generic
  "Unknown model") rather than crashing.
---

## Architecture

### Changes Required

#### Backend — none required for either feature

#### Frontend

| File | Change |
|------|--------|
| `app/frontend/src/components/chatui/types.ts` | Add optional `modelName?: string` to `InferenceStats` (or to `ChatMessage`) |
| `app/frontend/src/components/chatui/runInference.ts` | Pass `modelName` into `runInference` and include it in the `InferenceStats` object before attaching to the message |
| `app/frontend/src/components/chatui/InferenceStats.tsx` | Prefer `stats.modelName` over the `modelName` prop; fall back gracefully if absent |
| `app/frontend/src/components/chatui/ChatComponent.tsx` | Read the `?modelId` query param on mount; call `createNewConversation()` and pre-select the model when the param is present; skip if the current thread is already empty for that model |
| `app/frontend/src/components/models/row-cells/ManageCell.tsx` (or equivalent) | Update the "Chat" navigation link to include `?modelId=<container_id>` |
---

## Build Phases

### Phase 1 — Store model name per message (Bug 2)

1. Add `modelName?: string` to `InferenceStats` in `types.ts`.
2. Thread `modelName` as a parameter into `runInference()`.
3. In `runInference.ts`, include `modelName` when building the final `InferenceStats`
   object (alongside TTFT, TPOT, etc.) before attaching it to the message.
4. In `InferenceStats.tsx`, update `getDisplayModelName()` to check `stats.modelName`
   first; fall back to the `modelName` prop for backward compatibility.
5. Verify: generate two messages with Model A, switch to Model B, generate a third — all
   three stats panels should show the correct model.
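The four code steps above can be sketched end to end. The field and parameter names below are assumptions (the real shapes live in `types.ts` and `runInference.ts`); the point is only to show where the identity is captured (inference time) versus where the fallback happens (render time):

```typescript
// Hypothetical simplified shapes; the real interfaces live in types.ts.
interface InferenceStats {
  ttftMs: number;      // time to first token (assumed field name)
  tpotMs: number;      // time per output token (assumed field name)
  modelName?: string;  // Step 1: new optional field
}

interface ChatMessage {
  text: string;
  inferenceStats?: InferenceStats;
}

// Steps 2-3: capture the model identity at inference time, not render time.
function attachStats(
  message: ChatMessage,
  raw: { ttftMs: number; tpotMs: number },
  modelName: string, // threaded in as a parameter
): ChatMessage {
  return { ...message, inferenceStats: { ...raw, modelName } };
}

// Step 4: prefer the stored name; legacy messages (no stored name)
// fall back to the live prop instead of crashing.
function getDisplayModelName(
  stats: InferenceStats | undefined,
  liveModelName: string,
): string {
  return stats?.modelName ?? liveModelName;
}

// Step 5 in miniature: a Model A message keeps its label after the
// picker switches to Model B.
const msg = attachStats({ text: "hi" }, { ttftMs: 120, tpotMs: 18 }, "Model A");
console.log(getDisplayModelName(msg.inferenceStats, "Model B")); // "Model A"
```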
### Phase 2 — Auto-open new chat on model launch (Feature 1)

6. Update the "Chat" navigation action in the models table to append `?modelId=<container_id>`.
7. In `ChatComponent`, read `modelId` from `useSearchParams()` (React Router) on mount.
8. When `modelId` is present: look up the model display name from the deployed models list,
   call `createNewConversation()`, then call `setModelID` / `setModelName` with the resolved
   model — all in a single `useEffect` on first mount.
9. Add the duplicate guard: if `chatThreads[currentThreadIndex].messages.length === 0` and
   the current model already matches, skip creating a new thread.
10. Remove the `?modelId` param from the URL after consumption so that a browser refresh
    does not re-trigger the effect (use `history.replace`).
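Steps 8-10 are easiest to keep correct if the decision in step 9 is pulled out of the effect into a pure function. A sketch under an assumed thread shape (the real state lives in `ChatComponent.tsx`; `Thread` here is hypothetical):

```typescript
// Hypothetical simplified thread shape.
interface Thread {
  modelId: string | null;
  messages: unknown[];
}

// Step 9's duplicate guard as a pure, testable decision: reuse the
// current thread only when it is empty AND already bound to the
// requested model; otherwise a new thread is needed.
function shouldCreateNewThread(
  current: Thread | undefined,
  modelId: string,
): boolean {
  if (!current) return true;
  const emptyAndSameModel =
    current.messages.length === 0 && current.modelId === modelId;
  return !emptyAndSameModel;
}

console.log(shouldCreateNewThread({ modelId: "A", messages: [] }, "A"));   // false: reuse
console.log(shouldCreateNewThread({ modelId: "A", messages: [] }, "B"));   // true
console.log(shouldCreateNewThread({ modelId: "A", messages: [{}] }, "A")); // true
```

Inside the component this would run once, gated by a `hasInitialized` ref, after which the param is stripped with a replace-style navigation so a refresh does not re-fire the effect (step 10).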
---

## Complexity Traps to Avoid

- **Don't change `InferenceStats` in a breaking way** — older persisted messages in IndexedDB
  will not have `modelName`; the display code must handle `undefined` cleanly (no crash, no
  blank white panel).
- **Don't call `createNewConversation` on every render** — gate the `useEffect` with a
  `hasInitialized` ref so it fires only once per navigation to `/chat?modelId=`.
- **Model lookup timing** — deployed models may not be loaded when `ChatComponent` mounts;
  `modelName` resolution should be async/retry-safe (wait for the model list to populate
  before setting the thread title).
- **Thread limit** — `usePersistentState` enforces a maximum of 20 threads; the auto-create
  logic should respect this (existing threads get pruned per the current FIFO logic).
- **IndexedDB schema** — `inferenceStats` is serialized to IndexedDB as-is; adding a new
  field to the interface is backward-compatible (existing records simply won't have it).
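The thread-limit trap can be made concrete. A sketch of FIFO pruning under the 20-thread cap this spec names (the actual pruning lives in `usePersistentState.ts` and may differ; newest-first ordering is assumed to match "appears at the top of `HistoryPanel`"):

```typescript
const MAX_THREADS = 20; // cap named by this spec (enforced in usePersistentState)

// Prepend the new thread, then drop the oldest threads past the cap.
// Assumes newest-first ordering, so the oldest thread sits at the end.
function addThread<T>(threads: T[], newThread: T): T[] {
  return [newThread, ...threads].slice(0, MAX_THREADS);
}

// With the history already full, auto-creating one more thread
// keeps the count at the cap and silently drops the oldest entry.
const full = Array.from({ length: 20 }, (_, i) => `thread-${i}`); // thread-19 is oldest
const after = addThread(full, "fresh");
console.log(after.length);                // 20
console.log(after[0]);                    // "fresh"
console.log(after.includes("thread-19")); // false: oldest pruned
```

The point for the auto-create logic: a user who launches many models in a row will silently lose their oldest threads, which is existing behavior but worth keeping in mind when deciding whether to reuse an empty thread.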
---

## Validation

### Feature 1

```
1. Deploy a model (e.g. Llama-3.3-70B).
2. From the Models page, click "Chat".
3. Verify: navigated to /chat, a brand-new empty thread is active,
   the model picker shows Llama-3.3-70B.
4. Type a message and send.
5. Go back to the Models page, click "Chat" on the same model again.
6. Verify: a second new thread is created (the previous thread with
   messages is preserved in the history panel).
7. Navigate to /chat with no query param.
8. Verify: no new thread is created; the last-used thread is resumed.
```
### Feature 2

```
1. Deploy Model A. Navigate to chat → a new empty thread opens, model picker shows Model A.
2. Send a message. Observe the stats pill → shows "Model A" metrics.
3. Deploy Model B. Navigate to chat → a second new empty thread opens, model picker shows Model B.
4. Send a message. Observe the stats pill → shows "Model B" metrics.
5. In the history panel, switch back to Model A's thread.
6. Verify: Model A's stats pill still shows "Model A" metrics — unchanged by Model B's deployment or chat.
7. Reload the page. Verify both threads and their metrics survive the reload correctly.
```
---

## Key Files to Read Before Starting

| File | Why |
|------|-----|
| `app/frontend/src/components/chatui/ChatComponent.tsx` | All chat state, `createNewConversation`, model selection, prop drilling |
| `app/frontend/src/components/chatui/runInference.ts` | Where `InferenceStats` is assembled and attached to messages |
| `app/frontend/src/components/chatui/types.ts` | `ChatMessage`, `InferenceStats` interfaces |
| `app/frontend/src/components/chatui/InferenceStats.tsx` | `getDisplayModelName()` — the display side of the bug |
| `app/frontend/src/components/chatui/MessageActions.tsx` | Prop chain: `modelName` passed from `ChatHistory` to `InferenceStats` |
| `app/frontend/src/components/chatui/usePersistentState.ts` | Thread persistence limits (20 threads, 100 messages) |
---

## Out of Scope

- Multi-window or multi-tab synchronization of chat state.
- Server-side chat history persistence (IndexedDB is the only store for now).
- Changing the 20-thread or 100-message limits.
- Any backend changes.
- Metrics for non-LLM model types (Vision, Speech, Image Gen).
Submodule tt-inference-server updated from ac1892 to 692248.
not sure why we had this committed? maybe my facelift branch is off main?
sorry... not sure what you mean :/
this branch is for the bounty, and I pointed its head to your face-lift branch. Is there a different place you wanted this to be contributed? Thanks for clarifying in advance.