diff --git a/docs/blog/v3.12-gpt-oss.md b/docs/blog/v3.12-gpt-oss.md
index 9a09fa13..772f5dcc 100644
--- a/docs/blog/v3.12-gpt-oss.md
+++ b/docs/blog/v3.12-gpt-oss.md
@@ -46,6 +46,30 @@ npx -y node-llama-cpp inspect estimate
 
 :::
 
+## `MXFP4` Quantization
+You might be used to looking for a `Q4_K_M` quantization because of its good balance between quality and size,
+and so be looking for a `Q4_K_M` quantization of `gpt-oss` models.
+You don't have to, because these models are already natively provided in a similar quantization format called `MXFP4`.
+
+Let's break down what `MXFP4` is:
+* `MXFP4` stands for Microscaling FP4 (Floating Point, 4-bit). `Q4_K_M` is also a 4-bit quantization.
+* It's a format that was created and standardized by the Open Compute Project (OCP) in early 2024.
+  OCP is backed by big players like OpenAI, NVIDIA, AMD, Microsoft, and Meta,
+  with the goal of lowering the hardware and compute barriers to running AI models.
+* It's designed to dramatically reduce the memory and compute requirements for training and running AI models,
+  while preserving as much precision as possible.
+
+This format was used to train the `gpt-oss` models, so the most precise format of these models is `MXFP4`.
+
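+If you want to see this for yourself, you can inspect a `gpt-oss` GGUF file and check which of its tensors use the `MXFP4` format.
+Here's a minimal sketch using the CLI `inspect gguf` command, which prints the file's metadata and tensor details
+(the model URI below is a placeholder; use the actual `gpt-oss` GGUF file or URI you intend to run):
+```shell
+npx -y node-llama-cpp inspect gguf <model URI or file path>
+```
+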
+Since this is a 4-bit precision format, its size footprint is similar to `Q4_K_M` quantization,
+but it provides better precision and thus better quality.
+First-class support for `MXFP4` in `llama.cpp` was introduced as part of the `gpt-oss` release.
+
+The bottom line is that you don't have to find a `Q4_K_M` quantization of `gpt-oss` models,
+because the `MXFP4` format is as small, efficient, and fast as `Q4_K_M`,
+while offering better precision and thus better quality.
+
+
 ### Try It Using the CLI
 To quickly try out [`gpt-oss-20b`](https://huggingface.co/giladgd/gpt-oss-20b-GGUF), you can use the [CLI `chat` command](../cli/chat.md):
@@ -54,6 +78,42 @@ npx -y node-llama-cpp chat --ef --prompt "Hi there" hf:giladgd/gpt-oss-20b-GGUF/
 ```
 
+## `thought` Segments
+Since `gpt-oss` models are reasoning models, they generate thoughts as part of their response.
+These thoughts are useful for debugging and understanding the model's reasoning process,
+and can be used to iterate on the system prompt and inputs you provide to the model to improve its responses.
+
+However, OpenAI [emphasizes](https://openai.com/index/chain-of-thought-monitoring/#:~:text=leaving%20CoTs%20unrestricted%20may%20make%20them%20unfit%20to%20be%20shown%20to%20end%2Dusers%2C%20as%20they%20might%20violate%20some%20misuse%20policies)
+that the thoughts generated by these models may not be safe to show to end users, as they are unrestricted
+and might include sensitive information, uncontained language, hallucinations, or other issues.
+Thus, OpenAI recommends not showing these thoughts to users without further filtering, moderation, or summarization.
+
+Check out the [segment streaming example](../guide/chat-session.md#stream-response-segments) to learn how to use segments.
+
+
+## `comment` Segments
+`gpt-oss` models output "preamble" messages as part of their responses;
+these are emitted as a new `comment` segment type.
+
+The model might choose to generate these segments to inform the user about the functions it's about to call.
+For example, when it plans to use multiple functions, it may generate a plan in advance.
+
+These are intended for the user to see, but not as part of the main response.
+
+Check out the [segment streaming example](../guide/chat-session.md#stream-response-segments) to learn how to use segments.
+
+::: info Experiment with `comment` segments
+The [Electron app template](../guide/electron.md) has been updated to properly segment comments in the response.
+
+Try it out by downloading the latest build [from GitHub](https://github.com/withcatai/node-llama-cpp/releases/latest),
+or by [scaffolding a new project](../guide/index.md#scaffold-new-project) based on the Electron template:
+
+```shell
+npm create node-llama-cpp@latest
+```
+:::
+
+
 ## Customizing gpt-oss
 You can adjust `gpt-oss`'s responses by configuring the options of [`HarmonyChatWrapper`](../api/classes/HarmonyChatWrapper.md):
 ```typescript
diff --git a/docs/guide/chat-session.md b/docs/guide/chat-session.md
index a6a1a097..6ddb9152 100644
--- a/docs/guide/chat-session.md
+++ b/docs/guide/chat-session.md
@@ -833,7 +833,8 @@ console.log("AI: " + a1);
 
 ## Stream Response Segments {#stream-response-segments}
 The raw model response is automatically segmented into different types of segments.
-The main response is not segmented, but other kinds of sections, like thoughts (chain of thought), are segmented. 
+The main response is not segmented, but other kinds of sections, +like thoughts (chain of thought) and comments (on relevant models, like [`gpt-oss`](../blog/v3.12-gpt-oss.md#comment-segments)), are segmented. To stream response segments you can use the [`onResponseChunk`](../api/type-aliases/LLamaChatPromptOptions.md#onresponsechunk) option. @@ -862,6 +863,8 @@ const a1 = await session.promptWithMeta(q1, { onResponseChunk(chunk) { const isThoughtSegment = chunk.type === "segment" && chunk.segmentType === "thought"; + const isCommentSegment = chunk.type === "segment" && + chunk.segmentType === "comment"; if (chunk.type === "segment" && chunk.segmentStartTime != null) process.stdout.write(` [segment start: ${chunk.segmentType}] `); @@ -879,6 +882,7 @@ const fullResponse = a1.response return item; else if (item.type === "segment") { const isThoughtSegment = item.segmentType === "thought"; + const isCommentSegment = item.segmentType === "comment"; let res = ""; if (item.startTime != null) diff --git a/src/chatWrappers/HarmonyChatWrapper.ts b/src/chatWrappers/HarmonyChatWrapper.ts index 3dfadf9a..d64795b0 100644 --- a/src/chatWrappers/HarmonyChatWrapper.ts +++ b/src/chatWrappers/HarmonyChatWrapper.ts @@ -1,7 +1,6 @@ import {ChatWrapper, ChatWrapperJinjaMatchConfiguration} from "../ChatWrapper.js"; import { - ChatModelFunctions, ChatModelResponse, ChatWrapperGenerateContextStateOptions, ChatWrapperGeneratedContextState, - ChatWrapperGeneratedPrefixTriggersContextState, ChatWrapperSettings + ChatModelFunctions, ChatModelResponse, ChatWrapperGenerateContextStateOptions, ChatWrapperGeneratedContextState, ChatWrapperSettings } from "../types.js"; import {SpecialToken, LlamaText, SpecialTokensText} from "../utils/LlamaText.js"; import {ChatModelFunctionsDocumentationGenerator} from "./utils/ChatModelFunctionsDocumentationGenerator.js"; @@ -282,23 +281,21 @@ export class HarmonyChatWrapper extends ChatWrapper { ], inject: LlamaText(new SpecialTokensText("<|message|>")) }, - ...( - !hasFunctions ? 
[] : [{ - type: "functionCall", - triggers: [ - LlamaText(new SpecialTokensText("<|channel|>commentary to=")) - ], - replaceTrigger: true, - inject: LlamaText(new SpecialTokensText("<|channel|>commentary")) - }, { - type: "functionCall", - triggers: [ - LlamaText(new SpecialTokensText("<|channel|>analysis to=")) - ], - replaceTrigger: true, - inject: LlamaText(new SpecialTokensText("<|channel|>analysis")) - }] satisfies ChatWrapperGeneratedPrefixTriggersContextState["prefixTriggers"] - ) + { + type: "functionCall", + triggers: [ + LlamaText(new SpecialTokensText("<|channel|>commentary to=")) + ], + replaceTrigger: true, + inject: LlamaText(new SpecialTokensText("<|channel|>commentary")) + }, { + type: "functionCall", + triggers: [ + LlamaText(new SpecialTokensText("<|channel|>analysis to=")) + ], + replaceTrigger: true, + inject: LlamaText(new SpecialTokensText("<|channel|>analysis")) + } ], noPrefixTrigger: { type: "response", @@ -669,6 +666,18 @@ export class HarmonyChatWrapper extends ChatWrapper { {}, {additionalRenderParameters: jinjaParameters} ], + [ + { + _jinjaFlags: { + emptyLastModelResponseIsFinalMessage: true, + useSpecialTokensForFullSystemMessage: true, + useNonFinalFinalMessage: false, + noFinalMessages: false + } + }, + {}, + {additionalRenderParameters: jinjaParameters} + ], [ { _jinjaFlags: { diff --git a/src/chatWrappers/utils/isJinjaTemplateEquivalentToSpecializedChatWrapper.ts b/src/chatWrappers/utils/isJinjaTemplateEquivalentToSpecializedChatWrapper.ts index b9259597..20d9e35b 100644 --- a/src/chatWrappers/utils/isJinjaTemplateEquivalentToSpecializedChatWrapper.ts +++ b/src/chatWrappers/utils/isJinjaTemplateEquivalentToSpecializedChatWrapper.ts @@ -139,13 +139,22 @@ function checkEquivalence( if (!compareContextTexts(jinjaRes.contextText, specializedWrapperRes.contextText, tokenizer)) return false; + const specializedStopGenerationTriggers = [ + ...specializedWrapperRes.stopGenerationTriggers, + ...( + specializedWrapperRes.rerender?.triggers == null + ? 
[] + : specializedWrapperRes.rerender.triggers + ) + ]; + const jinjaHasAllSpecializedStopGenerationTriggers = jinjaRes.stopGenerationTriggers .every((trigger) => { return [trigger, trigger.trimEnd(), trigger.trimStart(), trigger.trimStart().trimEnd()].some((normalizedJinjaTrigger) => { if (normalizedJinjaTrigger.values.length === 0) return true; - const foundSimilarTriggers = specializedWrapperRes.stopGenerationTriggers.some((specializedTrigger) => ( + const foundSimilarTriggers = specializedStopGenerationTriggers.some((specializedTrigger) => ( normalizedJinjaTrigger.includes(specializedTrigger) )); @@ -158,7 +167,7 @@ function checkEquivalence( tokenizer ); - const foundSimilarOrShorterTokenizedTriggers = specializedWrapperRes.stopGenerationTriggers + const foundSimilarOrShorterTokenizedTriggers = specializedStopGenerationTriggers .some((specializedTrigger) => { const resolvedSpecializedTrigger = StopGenerationDetector.resolveLlamaTextTrigger( specializedTrigger, diff --git a/src/evaluator/LlamaChat/LlamaChat.ts b/src/evaluator/LlamaChat/LlamaChat.ts index ba81335b..faf17bca 100644 --- a/src/evaluator/LlamaChat/LlamaChat.ts +++ b/src/evaluator/LlamaChat/LlamaChat.ts @@ -313,8 +313,24 @@ export type LLamaChatGenerateResponseOptions> => { @@ -617,7 +635,6 @@ export class LlamaChat { ); }; const loadContextWindowForFunctionCallingLoop = async () => loadContextWindow(true); - const loadContextWindowForBudgetTriggers = async () => loadContextWindow(false); while (true) { generateResponseState.startTokenLoop(); @@ -638,6 +655,10 @@ export class LlamaChat { } } + const abortRes = generateResponseState.handleAbortTrigger("model"); + if (abortRes != null) + return abortRes; + if (shouldHandlePrefixTriggers) { const handlePrefixTriggersRes = await generateResponseState.handlePrefixTriggers( loadContextWindowForFunctionCallingLoop @@ -646,7 +667,7 @@ export class LlamaChat { return handlePrefixTriggersRes; } - if (generateResponseState.functionEvaluationMode !== false) { + if (generateResponseState.functionEvaluationMode !== false && !generateResponseState.abortOnNonText) { const functionsCallsRes = await generateResponseState.enterFunctionCallingLoop( loadContextWindowForFunctionCallingLoop ); @@ -696,9 +717,9 @@ export class LlamaChat { break; if (await generateResponseState.handleBudgetTriggers()) { - await loadContextWindowForBudgetTriggers(); - await generateResponseState.alignCurrentSequenceStateWithCurrentTokens(); - await generateResponseState.createNewEvaluationIterator(); + generateResponseState.shouldRerender = true; + generateResponseState.skipClosingResponseItemOnRerender = true; + break; } if (generateResponseState.handleShouldRerender() || generateResponseState.updateShouldContextShift()) @@ -1604,6 +1625,7 @@ class GenerateResponseState["maxParallelFunctionCalls"]; private readonly contextShift: LLamaChatGenerateResponseOptions["contextShift"]; private readonly customStopTriggers: LLamaChatGenerateResponseOptions["customStopTriggers"]; + public readonly abortOnNonText: boolean; private readonly minimumOverlapPercentageToPreventContextShift: Exclude["lastEvaluationContextWindow"], undefined>["minimumOverlapPercentageToPreventContextShift"], undefined>; public readonly functionsEnabled: boolean; @@ -1660,6 +1682,8 @@ class GenerateResponseState 0); @@ -1796,7 +1822,7 @@ class GenerateResponseState this.stopGenerationDetector.addStopTrigger(stopTrigger)); - if (this.functions != null && Object.keys(this.functions).length > 0) + if (this.functions != null && 
Object.keys(this.functions).length > 0 && !this.abortOnNonText) this.functionSyntaxStartDetector.addStopTrigger( StopGenerationDetector.resolveLlamaTextTrigger( LlamaText([ @@ -1829,6 +1855,29 @@ class GenerateResponseState= segmentBudget + ) + continue; const prefixDetector = new StopGenerationDetector(); StopGenerationDetector.resolveStopTriggers(trigger.triggers, this.llamaChat.model.tokenizer) @@ -2144,8 +2209,18 @@ class GenerateResponseState= noPrefixTriggerSegmentBudget + ) + this.noPrefixTrigger = undefined; this.rerenderTriggers = rerender?.triggers ?? []; this.rerenderTriggerDetector.clearInProgressStops(); @@ -2206,6 +2281,11 @@ class GenerateResponseState this.disengageInitiallyEngagedFunctionMode.addStopTrigger(stopTrigger)); @@ -2242,6 +2322,16 @@ class GenerateResponseState budget != null && budget !== Infinity; - - const hasBudgetTriggers = this.budgets != null && hasBudget(this.budgets.thoughtTokens); - if (!hasBudgetTriggers) + if (this.budgets == null) return shouldReloadEvaluationState; - if (hasBudget(this.budgets.thoughtTokens) && this.segmentHandler.isSegmentTypeOpen("thought")) { - const usedThoughtTokens = this.segmentHandler.getSegmentTokensCount("thought"); - if (usedThoughtTokens >= this.budgets.thoughtTokens) { - this.segmentHandler.closeSegment("thought"); + for (const segmentType of this.segmentHandler.getOpenSegmentStack().reverse()) { + const budget = this.getSegmentBudget(segmentType); + if (budget == null) + continue; + + const usedSegmentTokens = this.segmentHandler.getSegmentTokensCount(segmentType); + if (usedSegmentTokens >= budget) { + this.segmentHandler.closeSegment(segmentType); shouldReloadEvaluationState = true; } } @@ -3228,8 +3344,31 @@ class GenerateResponseState ( + (budget == null || budget === Infinity) + ? null + : budget + ); + + if (this.budgets == null) + return null; + + if (segmentType === "thought") + return getBudget(this.budgets.thoughtTokens); + else if (segmentType === "comment") + return getBudget(this.budgets.commentTokens); + + void (segmentType satisfies never); + return null; + } + public handleShouldRerender() { this.shouldRerender = this.rerenderTriggerDetector.hasTriggeredStops; + + if (this.abortOnNonText && this.shouldRerender) + this.shouldAbortBecauseOfNonText = true; + return this.shouldRerender; } @@ -3239,7 +3378,7 @@ class GenerateResponseState; } @@ -3452,6 +3593,27 @@ class SegmentHandler + * some text here + * + * some text here + * + * some text here + * + * ``` + * In that example, the top most segment is `segment1`, and the last open segment is `segment2` (which is the next one to close). 
+ * So in that example, this function will return: + * ``` + * ["segment1", "segment2"] + * ``` + */ + public getOpenSegmentStack(): S[] { + return this._segmentsStack.slice(this._ownedSegmentsStackLength); + } + private _processTokens(tokens: Token[], text: string) { const queuedTokenRelease = this._streamRegulator.addChunk({ tokens, diff --git a/src/evaluator/LlamaChatSession/LlamaChatSession.ts b/src/evaluator/LlamaChatSession/LlamaChatSession.ts index 183b6729..e3049f7c 100644 --- a/src/evaluator/LlamaChatSession/LlamaChatSession.ts +++ b/src/evaluator/LlamaChatSession/LlamaChatSession.ts @@ -8,7 +8,7 @@ import {appendUserMessageToChatHistory} from "../../utils/appendUserMessageToCha import {LlamaContextSequence} from "../LlamaContext/LlamaContext.js"; import {LlamaGrammar} from "../LlamaGrammar.js"; import { - LlamaChat, LLamaChatContextShiftOptions, LlamaChatResponse, LlamaChatResponseFunctionCall, LlamaChatResponseChunk, + LlamaChat, LLamaChatContextShiftOptions, LlamaChatResponse, LlamaChatResponseChunk, LlamaChatResponseFunctionCall, LlamaChatResponseFunctionCallParamsChunk } from "../LlamaChat/LlamaChat.js"; import {EvaluationPriority} from "../LlamaContext/types.js"; @@ -16,6 +16,7 @@ import {TokenBias} from "../TokenBias.js"; import {LlamaText, LlamaTextJSON} from "../../utils/LlamaText.js"; import {wrapAbortSignal} from "../../utils/wrapAbortSignal.js"; import {safeEventCallback} from "../../utils/safeEventCallback.js"; +import {GgufArchitectureType} from "../../gguf/types/GgufMetadataTypes.js"; import { LLamaChatPromptCompletionEngineOptions, LlamaChatSessionPromptCompletionEngine } from "./utils/LlamaChatSessionPromptCompletionEngine.js"; @@ -220,7 +221,14 @@ export type LLamaChatPromptOptions { this._ensureNotDisposed(); @@ -862,63 +927,140 @@ export class LlamaChatSession { if (this._chat == null) throw new DisposedError(); - const {completion, lastEvaluation, metadata} = await this._chat.loadChatAndCompleteUserMessage( - asWithLastUserMessageRemoved(this._chatHistory), - { - initialUserPrompt: prompt, - functions, - documentFunctionParams, - grammar, - onTextChunk, - onToken, - signal: abortController.signal, - stopOnAbortSignal: true, - repeatPenalty, - minP, - topK, - topP, - seed, - tokenBias, - customStopTriggers, - maxTokens, - temperature, - trimWhitespaceSuffix, - contextShift: { - ...this._contextShift, - lastEvaluationMetadata: this._lastEvaluation?.contextShiftMetadata - }, - evaluationPriority, - lastEvaluationContextWindow: { - history: asWithLastUserMessageRemoved(this._lastEvaluation?.contextWindow), - minimumOverlapPercentageToPreventContextShift: 0.8 + if (shouldCompleteAsModel) { + const completeAsModelUserPrompt = (typeof completeAsModel == "boolean" || completeAsModel === "auto") + ? defaultCompleteAsModel.userPrompt + : completeAsModel?.userPrompt ?? defaultCompleteAsModel.userPrompt; + const completeAsModelMessagePrefix = (typeof completeAsModel == "boolean" || completeAsModel === "auto") + ? defaultCompleteAsModel.modelPrefix + : completeAsModel?.modelPrefix ?? 
defaultCompleteAsModel.modelPrefix; + + const {response, lastEvaluation, metadata} = await this._chat.generateResponse( + [ + ...asWithLastUserMessageRemoved(this._chatHistory), + {type: "user", text: completeAsModelUserPrompt}, + {type: "model", response: [completeAsModelMessagePrefix + prompt]} + ] as ChatHistoryItem[], + { + abortOnNonText: true, + functions, + documentFunctionParams, + grammar: grammar as undefined, // this is allowed only because `abortOnNonText` is enabled + onTextChunk, + onToken, + signal: abortController.signal, + stopOnAbortSignal: true, + repeatPenalty, + minP, + topK, + topP, + seed, + tokenBias, + customStopTriggers, + maxTokens, + temperature, + trimWhitespaceSuffix, + contextShift: { + ...this._contextShift, + lastEvaluationMetadata: this._lastEvaluation?.contextShiftMetadata + }, + evaluationPriority, + lastEvaluationContextWindow: { + history: this._lastEvaluation?.contextWindow == null + ? undefined + : [ + ...asWithLastUserMessageRemoved(this._lastEvaluation?.contextWindow), + {type: "user", text: completeAsModelUserPrompt}, + {type: "model", response: [completeAsModelMessagePrefix + prompt]} + ] as ChatHistoryItem[], + minimumOverlapPercentageToPreventContextShift: 0.8 + } } - } - ); - this._ensureNotDisposed(); + ); + this._ensureNotDisposed(); - this._lastEvaluation = { - cleanHistory: this._chatHistory, - contextWindow: asWithLastUserMessageRemoved(lastEvaluation.contextWindow), - contextShiftMetadata: lastEvaluation.contextShiftMetadata - }; - this._canUseContextWindowForCompletion = this._chatHistory.at(-1)?.type === "user"; + this._lastEvaluation = { + cleanHistory: this._chatHistory, + contextWindow: asWithLastUserMessageRemoved(asWithLastModelMessageRemoved(lastEvaluation.contextWindow)), + contextShiftMetadata: lastEvaluation.contextShiftMetadata + }; + this._canUseContextWindowForCompletion = this._chatHistory.at(-1)?.type === "user"; - if (!stopOnAbortSignal && metadata.stopReason === "abort" && abortController.signal?.aborted) - throw abortController.signal.reason; + if (!stopOnAbortSignal && metadata.stopReason === "abort" && abortController.signal?.aborted) + throw abortController.signal.reason; + + if (metadata.stopReason === "customStopTrigger") + return { + completion: response, + stopReason: metadata.stopReason, + customStopTrigger: metadata.customStopTrigger, + remainingGenerationAfterStop: metadata.remainingGenerationAfterStop + }; - if (metadata.stopReason === "customStopTrigger") return { - completion: completion, + completion: response, stopReason: metadata.stopReason, - customStopTrigger: metadata.customStopTrigger, remainingGenerationAfterStop: metadata.remainingGenerationAfterStop }; + } else { + const {completion, lastEvaluation, metadata} = await this._chat.loadChatAndCompleteUserMessage( + asWithLastUserMessageRemoved(this._chatHistory), + { + initialUserPrompt: prompt, + functions, + documentFunctionParams, + grammar, + onTextChunk, + onToken, + signal: abortController.signal, + stopOnAbortSignal: true, + repeatPenalty, + minP, + topK, + topP, + seed, + tokenBias, + customStopTriggers, + maxTokens, + temperature, + trimWhitespaceSuffix, + contextShift: { + ...this._contextShift, + lastEvaluationMetadata: this._lastEvaluation?.contextShiftMetadata + }, + evaluationPriority, + lastEvaluationContextWindow: { + history: asWithLastUserMessageRemoved(this._lastEvaluation?.contextWindow), + minimumOverlapPercentageToPreventContextShift: 0.8 + } + } + ); + this._ensureNotDisposed(); - return { - completion: completion, - 
stopReason: metadata.stopReason, - remainingGenerationAfterStop: metadata.remainingGenerationAfterStop - }; + this._lastEvaluation = { + cleanHistory: this._chatHistory, + contextWindow: asWithLastUserMessageRemoved(lastEvaluation.contextWindow), + contextShiftMetadata: lastEvaluation.contextShiftMetadata + }; + this._canUseContextWindowForCompletion = this._chatHistory.at(-1)?.type === "user"; + + if (!stopOnAbortSignal && metadata.stopReason === "abort" && abortController.signal?.aborted) + throw abortController.signal.reason; + + if (metadata.stopReason === "customStopTrigger") + return { + completion: completion, + stopReason: metadata.stopReason, + customStopTrigger: metadata.customStopTrigger, + remainingGenerationAfterStop: metadata.remainingGenerationAfterStop + }; + + return { + completion: completion, + stopReason: metadata.stopReason, + remainingGenerationAfterStop: metadata.remainingGenerationAfterStop + }; + } }); } finally { this._preloadAndCompleteAbortControllers.delete(abortController); @@ -1041,3 +1183,18 @@ function asWithLastUserMessageRemoved(chatHistory?: ChatHistoryItem[]) { return newChatHistory; } + + +function asWithLastModelMessageRemoved(chatHistory: ChatHistoryItem[]): ChatHistoryItem[]; +function asWithLastModelMessageRemoved(chatHistory: ChatHistoryItem[] | undefined): ChatHistoryItem[] | undefined; +function asWithLastModelMessageRemoved(chatHistory?: ChatHistoryItem[]) { + if (chatHistory == null) + return chatHistory; + + const newChatHistory = chatHistory.slice(); + + while (newChatHistory.at(-1)?.type === "model") + newChatHistory.pop(); + + return newChatHistory; +} diff --git a/src/evaluator/LlamaChatSession/utils/LlamaChatSessionPromptCompletionEngine.ts b/src/evaluator/LlamaChatSession/utils/LlamaChatSessionPromptCompletionEngine.ts index 3317ed21..c8191f19 100644 --- a/src/evaluator/LlamaChatSession/utils/LlamaChatSessionPromptCompletionEngine.ts +++ b/src/evaluator/LlamaChatSession/utils/LlamaChatSessionPromptCompletionEngine.ts @@ -33,7 +33,8 @@ export type LLamaChatPromptCompletionEngineOptions = { customStopTriggers?: LLamaChatCompletePromptOptions["customStopTriggers"], grammar?: LLamaChatCompletePromptOptions["grammar"], functions?: LLamaChatCompletePromptOptions["functions"], - documentFunctionParams?: LLamaChatCompletePromptOptions["documentFunctionParams"] + documentFunctionParams?: LLamaChatCompletePromptOptions["documentFunctionParams"], + completeAsModel?: LLamaChatCompletePromptOptions["completeAsModel"] }; export const defaultMaxPreloadTokens = (sequence: LlamaContextSequence) => { diff --git a/src/types.ts b/src/types.ts index 85940365..0d243434 100644 --- a/src/types.ts +++ b/src/types.ts @@ -183,6 +183,9 @@ export type ChatWrapperGeneratedPrefixTriggersContextState = { /** * Open a segment of the specified type. + * + * If the budget for this segment has exceeded, this trigger will be ignored, + * so ensure to have a fallback for a response. */ type: "segment", @@ -231,6 +234,8 @@ export type ChatWrapperGeneratedPrefixTriggersContextState = { } | { /** * Open a segment of the specified type. + * + * If the budget for this segment has exceeded, this action will be ignored. 
*/ type: "segment", @@ -333,7 +338,9 @@ export type ChatModelFunctionCall = { startsNewChunk?: boolean }; -export const allSegmentTypes = ["thought"] as const satisfies ChatModelSegmentType[]; +export const allSegmentTypes = ["thought", "comment"] as const satisfies readonly ChatModelSegmentType[]; +void (null as Exclude satisfies never); + export type ChatModelSegmentType = "thought" | "comment"; export type ChatModelSegment = { type: "segment", diff --git a/templates/electron-typescript-react/electron/state/llmState.ts b/templates/electron-typescript-react/electron/state/llmState.ts index 060e89bb..21558b52 100644 --- a/templates/electron-typescript-react/electron/state/llmState.ts +++ b/templates/electron-typescript-react/electron/state/llmState.ts @@ -408,7 +408,8 @@ export const llmFunctions = { simplifiedChat: getSimplifiedChatHistory(false), draftPrompt: { ...llmState.state.chatSession.draftPrompt, - completion: chatSessionCompletionEngine?.complete(llmState.state.chatSession.draftPrompt.prompt) ?? "" + completion: + chatSessionCompletionEngine?.complete(llmState.state.chatSession.draftPrompt.prompt)?.trimStart() ?? "" } } }; @@ -428,6 +429,7 @@ export const llmFunctions = { autoDisposeSequence: false }); chatSessionCompletionEngine = chatSession.createPromptCompletionEngine({ + functions: modelFunctions, // these won't be called, but are used to avoid redundant context shifts onGeneration(prompt, completion) { if (llmState.state.chatSession.draftPrompt.prompt === prompt) { llmState.state = { @@ -436,7 +438,7 @@ export const llmFunctions = { ...llmState.state.chatSession, draftPrompt: { prompt, - completion + completion: completion.trimStart() } } }; @@ -454,7 +456,7 @@ export const llmFunctions = { simplifiedChat: [], draftPrompt: { prompt: llmState.state.chatSession.draftPrompt.prompt, - completion: chatSessionCompletionEngine.complete(llmState.state.chatSession.draftPrompt.prompt) ?? "" + completion: chatSessionCompletionEngine.complete(llmState.state.chatSession.draftPrompt.prompt)?.trimStart() ?? "" } } }; @@ -483,7 +485,7 @@ export const llmFunctions = { ...llmState.state.chatSession, draftPrompt: { prompt: prompt, - completion: chatSessionCompletionEngine.complete(prompt) ?? "" + completion: chatSessionCompletionEngine.complete(prompt)?.trimStart() ?? "" } } }; diff --git a/templates/electron-typescript-react/package.json b/templates/electron-typescript-react/package.json index bac000cd..42611d6a 100644 --- a/templates/electron-typescript-react/package.json +++ b/templates/electron-typescript-react/package.json @@ -13,6 +13,7 @@ "_postinstall": "npm run models:pull", "models:pull": "node-llama-cpp pull --dir ./models \"{{modelUriOrUrl|escape|escape}}\"", "start": "vite dev", + "start:inspect": "ENABLE_INSPECT=true vite dev", "start:build": "electron ./dist-electron", "prebuild": "rimraf ./dist ./dist-electron ./release", "build": "tsc && vite build && electron-builder --config ./electron-builder.ts", diff --git a/templates/electron-typescript-react/src/App/App.tsx b/templates/electron-typescript-react/src/App/App.tsx index cda01e11..8a70000c 100644 --- a/templates/electron-typescript-react/src/App/App.tsx +++ b/templates/electron-typescript-react/src/App/App.tsx @@ -173,10 +173,10 @@ export function App() {
-
Get Llama 3.1 8B
+
gpt-oss 20B
{ const isLastMessage = responseIndex === modelMessage.message.length - 1; - if (message.type === "segment" && message.segmentType === "thought") { - return ; + if (message.type === "segment") { + if (message.segmentType === "thought") + return ; + else if (message.segmentType === "comment") + return ; + else + // ensure we handle all segment types or TypeScript will complain + void (message.segmentType satisfies never); } return .message.model > .responseComment { + padding: 4px 16px 0px 16px; + position: relative; + margin: 4px -8px; + border-radius: 12px; + + transition: margin-bottom 0.3s var(--transition-easing), background-color 0.5s var(--transition-easing); + + &.active { + margin-bottom: 8px; + + > .header > .opener > .summary { + opacity: 1; + + > .title { + opacity: 0.6; + font-weight: bold; + --generating-animation-mask-transparency-color: rgb(0 0 0 / 48%); + + animation-play-state: running; + } + + > .chevron { + opacity: 0.48; + } + } + } + + &.open { + margin-bottom: 20px; + background-color: var(--model-comment-block-background-color); + + > .header > .opener > .summary > .chevron { + transform: rotate(90deg); + margin-inline-end: -2px; + } + } + + > .header { + display: flex; + flex-direction: row; + + > .opener { + border: none; + background-color: var(--model-comment-block-button-background-color); + display: flex; + flex-direction: column; + padding: 8px 12px; + margin: 0px 0px 0px -12px; + border-radius: 12px; + user-select: none; + outline: solid 2px transparent; + outline-offset: 4px; + align-self: flex-start; + max-width: 100%; + opacity: 0.64; + + transition: opacity 0.3s var(--transition-easing); + + &:focus-visible { + outline: solid 2px Highlight; + outline-offset: 0px; + } + + &:hover { + opacity: 0.82; + } + + > .summary { + display: flex; + flex-direction: row; + align-items: center; + + > .title { + --generating-animation-mask-transparency-color: rgb(0 0 0 / 100%); + transition: font-weight 0.3s var(--transition-easing), opacity 0.3s var(--transition-easing), --generating-animation-mask-transparency-color 0.3s var(--transition-easing), margin-bottom 0.3s var(--transition-easing); + mask: linear-gradient( + to right, + var(--generating-animation-mask-transparency-color) 34%, + black, + var(--generating-animation-mask-transparency-color) 66% + ) content-box 0 0 / 300% 100% no-repeat; + animation: generating-animation 2s infinite ease-in-out; + animation-play-state: paused; + white-space: nowrap; + } + + > .chevron { + flex-shrink: 0; + + width: 20px; + height: 20px; + margin: -4px; + margin-inline-start: 0px; + margin-inline-end: -6px; + opacity: 0.64; + + transform-origin: 56% 56%; + transition: transform 0.2s var(--transition-easing), margin-inline-end 0.2s var(--transition-easing), opacity 0.3s var(--transition-easing); + } + } + } + + > .excerpt { + white-space: nowrap; + overflow: hidden; + display: flex; + justify-content: end; + align-self: center; + width: calc-size(fit-content, min(360px, size + 8px)); + opacity: 0.24; + mask: linear-gradient(to right, transparent, black 64px); + margin-inline-start: 4px; + + user-select: none; + + interpolate-size: allow-keywords; + transition: margin-inline-start 0.2s var(--transition-easing), width 0.5s var(--transition-easing), opacity 0.3s 0.2s var(--transition-easing); + + &.hide { + width: calc-size(fit-content, max(0px, min(360px, size + 8px) - 24px)); + opacity: 0; + margin-inline-start: 0px; /* this is to offset the chevron right margin on open */ + transition-delay: 0s, 0s; + } + } + } + + > .comment { + 
margin-top: 16px; + padding-bottom: 12px; + display: flex; + flex-direction: column; + interpolate-size: allow-keywords; + transition: height 0.5s var(--transition-easing), margin-top 0.5s var(--transition-easing), padding-bottom 0.5s var(--transition-easing), margin-bottom 0.5s var(--transition-easing), opacity 0.3s 0.2s var(--transition-easing); + + &.hide { + margin-top: 32px; + height: 0px; + margin-bottom: -12px; + padding-bottom: 0px; + opacity: 0; + transition-delay: 0s, 0s, 0s, 0s; + } + + > .content { + opacity: 0.64; + justify-self: flex-start; + position: relative; + overflow: hidden; + max-height: 100%; + } + } +} + +@keyframes generating-animation { + 0% { + mask-position: 100% 100%; + } + + 100% { + mask-position: 0 100%; + } +} + +@property --generating-animation-mask-transparency-color { + syntax: ""; + inherits: false; + initial-value: transparent; +} diff --git a/templates/electron-typescript-react/src/App/components/ChatHistory/components/ModelResponseComment/ModelResponseComment.tsx b/templates/electron-typescript-react/src/App/components/ChatHistory/components/ModelResponseComment/ModelResponseComment.tsx new file mode 100644 index 00000000..1ca2d073 --- /dev/null +++ b/templates/electron-typescript-react/src/App/components/ChatHistory/components/ModelResponseComment/ModelResponseComment.tsx @@ -0,0 +1,51 @@ +import classNames from "classnames"; +import {useCallback, useMemo, useState} from "react"; +import {MessageMarkdown} from "../../../MessageMarkdown/MessageMarkdown.js"; +import {RightChevronIconSVG} from "../../../../../icons/RightChevronIconSVG.js"; +import {MarkdownContent} from "../../../MarkdownContent/MarkdownContent.js"; + +import "./ModelResponseComment.css"; + +const excerptLength = 1024; + +export function ModelResponseComment({text, active}: ModelResponseCommentProps) { + const [isOpen, setIsOpen] = useState(false); + + const toggleIsOpen = useCallback(() => { + setIsOpen((isOpen) => !isOpen); + }, []); + + const title = useMemo(() => { + if (active) + return "Generating comment"; + + return "Generated comment"; + }, [active]); + + return
+
+ + + {text.slice(-excerptLength)} + +
+
+ {text} +
+
; +} + +type ModelResponseCommentProps = { + text: string, + active: boolean +}; diff --git a/templates/electron-typescript-react/src/App/components/ChatHistory/components/ModelResponseThought/ModelResponseThought.css b/templates/electron-typescript-react/src/App/components/ChatHistory/components/ModelResponseThought/ModelResponseThought.css index f7d196a2..127e1216 100644 --- a/templates/electron-typescript-react/src/App/components/ChatHistory/components/ModelResponseThought/ModelResponseThought.css +++ b/templates/electron-typescript-react/src/App/components/ChatHistory/components/ModelResponseThought/ModelResponseThought.css @@ -13,7 +13,7 @@ > .title { opacity: 0.6; font-weight: bold; - --animation-mask-transparency-color: rgb(0 0 0 / 48%); + --thinking-animation-mask-transparency-color: rgb(0 0 0 / 48%); animation-play-state: running; } @@ -65,13 +65,13 @@ transition: opacity 0.3s var(--transition-easing); > .title { - --animation-mask-transparency-color: rgb(0 0 0 / 100%); - transition: font-weight 0.3s var(--transition-easing), opacity 0.3s var(--transition-easing), --animation-mask-transparency-color 0.3s var(--transition-easing), margin-bottom 0.3s var(--transition-easing); + --thinking-animation-mask-transparency-color: rgb(0 0 0 / 100%); + transition: font-weight 0.3s var(--transition-easing), opacity 0.3s var(--transition-easing), --thinking-animation-mask-transparency-color 0.3s var(--transition-easing), margin-bottom 0.3s var(--transition-easing); mask: linear-gradient( to right, - var(--animation-mask-transparency-color) 34%, + var(--thinking-animation-mask-transparency-color) 34%, black, - var(--animation-mask-transparency-color) 66% + var(--thinking-animation-mask-transparency-color) 66% ) content-box 0 0 / 300% 100% no-repeat; animation: thinking-animation 2s infinite ease-in-out; animation-play-state: paused; @@ -98,7 +98,7 @@ justify-content: end; justify-self: flex-start; mask: linear-gradient(to right, transparent, black 48px); - max-width: 360px; + max-width: calc-size(fit-content, min(360px, size + 8px)); opacity: 0.24; font-size: 14px; margin-top: 2px; @@ -156,7 +156,7 @@ } } -@property --animation-mask-transparency-color { +@property --thinking-animation-mask-transparency-color { syntax: ""; inherits: false; initial-value: transparent; diff --git a/templates/electron-typescript-react/src/App/components/MarkdownContent/MarkdownContent.tsx b/templates/electron-typescript-react/src/App/components/MarkdownContent/MarkdownContent.tsx index 95e0e815..d0bf4a02 100644 --- a/templates/electron-typescript-react/src/App/components/MarkdownContent/MarkdownContent.tsx +++ b/templates/electron-typescript-react/src/App/components/MarkdownContent/MarkdownContent.tsx @@ -26,7 +26,7 @@ export function MarkdownContent({children, inline = false, dir, className}: Mark return; if (inline) - divRef.current.innerHTML = md.renderInline(children ?? ""); + divRef.current.innerHTML = md.renderInline(children ?? "").replaceAll("
", ""); else divRef.current.innerHTML = md.render(children ?? ""); }, [inline, children]); diff --git a/templates/electron-typescript-react/src/index.css b/templates/electron-typescript-react/src/index.css index 1b80a733..8b01d098 100644 --- a/templates/electron-typescript-react/src/index.css +++ b/templates/electron-typescript-react/src/index.css @@ -55,6 +55,9 @@ --message-hr-color: light-dark(rgba(0 0 0 / 16%), rgba(255 255 255 / 16%)); --message-blockquote-border-color: light-dark(rgba(0 0 0 / 8%), rgba(255 255 255 / 8%)); + + --model-comment-block-background-color: light-dark(rgba(0 0 0 / 6%), rgba(0 0 0 / 24%)); + --model-comment-block-button-background-color: light-dark(rgba(0 0 0 / 6%), rgba(0 0 0 / 24%)); } body { diff --git a/templates/electron-typescript-react/vite.config.ts b/templates/electron-typescript-react/vite.config.ts index 9d6ac218..55ecce8b 100644 --- a/templates/electron-typescript-react/vite.config.ts +++ b/templates/electron-typescript-react/vite.config.ts @@ -34,6 +34,12 @@ export default defineConfig({ main: { // Shortcut of `build.lib.entry`. entry: path.join(__dirname, "electron/index.ts"), + onstart({startup}) { + if (process.env["ENABLE_INSPECT"] === "true") + return startup([".", "--inspect"]); + + return startup(["."]); + }, vite: { build: { target: "es2022", diff --git a/test/standalone/chatWrappers/utils/jinjaTemplates.ts b/test/standalone/chatWrappers/utils/jinjaTemplates.ts index a896d9b7..e533f9ee 100644 --- a/test/standalone/chatWrappers/utils/jinjaTemplates.ts +++ b/test/standalone/chatWrappers/utils/jinjaTemplates.ts @@ -952,3 +952,352 @@ export const harmonyJinjaTemplate3 = ` <|start|>assistant {%- endif -%} `.slice(1, -1); + + +export const harmonyJinjaTemplate4 = ` +{# Chat template fixes by Unsloth #} +{#- + In addition to the normal inputs of \`messages\` and \`tools\`, this template also accepts the + following kwargs: + - "builtin_tools": A list, can contain "browser" and/or "python". + - "model_identity": A string that optionally describes the model identity. + - "reasoning_effort": A string that describes the reasoning effort, defaults to "medium". 
+ #} + +{#- Tool Definition Rendering ============================================== #} +{%- macro render_typescript_type(param_spec, required_params, is_nullable=false) -%} + {%- if param_spec.type == "array" -%} + {%- if param_spec['items'] -%} + {%- if param_spec['items']['type'] == "string" -%} + {{- "string[]" }} + {%- elif param_spec['items']['type'] == "number" -%} + {{- "number[]" }} + {%- elif param_spec['items']['type'] == "integer" -%} + {{- "number[]" }} + {%- elif param_spec['items']['type'] == "boolean" -%} + {{- "boolean[]" }} + {%- else -%} + {%- set inner_type = render_typescript_type(param_spec['items'], required_params) -%} + {%- if inner_type == "object | object" or inner_type|length > 50 -%} + {{- "any[]" }} + {%- else -%} + {{- inner_type + "[]" }} + {%- endif -%} + {%- endif -%} + {%- if param_spec.nullable -%} + {{- " | null" }} + {%- endif -%} + {%- else -%} + {{- "any[]" }} + {%- if param_spec.nullable -%} + {{- " | null" }} + {%- endif -%} + {%- endif -%} + {%- elif param_spec.type is defined and param_spec.type is iterable and param_spec.type is not string and param_spec.type is not mapping and param_spec.type[0] is defined -%} + {#- Handle array of types like ["object", "object"] from Union[dict, list] #} + {%- if param_spec.type | length > 1 -%} + {{- param_spec.type | join(" | ") }} + {%- else -%} + {{- param_spec.type[0] }} + {%- endif -%} + {%- elif param_spec.oneOf -%} + {#- Handle oneOf schemas - check for complex unions and fallback to any #} + {%- set has_object_variants = false -%} + {%- for variant in param_spec.oneOf -%} + {%- if variant.type == "object" -%} + {%- set has_object_variants = true -%} + {%- endif -%} + {%- endfor -%} + {%- if has_object_variants and param_spec.oneOf|length > 1 -%} + {{- "any" }} + {%- else -%} + {%- for variant in param_spec.oneOf -%} + {{- render_typescript_type(variant, required_params) -}} + {%- if variant.description %} + {{- "// " + variant.description }} + {%- endif -%} + {%- if variant.default is defined %} + {{ "// default: " + variant.default|tojson }} + {%- endif -%} + {%- if not loop.last %} + {{- " | " }} + {% endif -%} + {%- endfor -%} + {%- endif -%} + {%- elif param_spec.type == "string" -%} + {%- if param_spec.enum -%} + {{- '"' + param_spec.enum|join('" | "') + '"' -}} + {%- else -%} + {{- "string" }} + {%- if param_spec.nullable %} + {{- " | null" }} + {%- endif -%} + {%- endif -%} + {%- elif param_spec.type == "number" -%} + {{- "number" }} + {%- elif param_spec.type == "integer" -%} + {{- "number" }} + {%- elif param_spec.type == "boolean" -%} + {{- "boolean" }} + + {%- elif param_spec.type == "object" -%} + {%- if param_spec.properties -%} + {{- "{\\n" }} + {%- for prop_name, prop_spec in param_spec.properties.items() -%} + {{- prop_name -}} + {%- if prop_name not in (param_spec.required or []) -%} + {{- "?" 
}} + {%- endif -%} + {{- ": " }} + {{ render_typescript_type(prop_spec, param_spec.required or []) }} + {%- if not loop.last -%} + {{-", " }} + {%- endif -%} + {%- endfor -%} + {{- "}" }} + {%- else -%} + {{- "object" }} + {%- endif -%} + {%- else -%} + {{- "any" }} + {%- endif -%} +{%- endmacro -%} + +{%- macro render_tool_namespace(namespace_name, tools) -%} + {{- "## " + namespace_name + "\\n\\n" }} + {{- "namespace " + namespace_name + " {\\n\\n" }} + {%- for tool in tools %} + {%- set tool = tool.function %} + {{- "// " + tool.description + "\\n" }} + {{- "type "+ tool.name + " = " }} + {%- if tool.parameters and tool.parameters.properties %} + {{- "(_: {\\n" }} + {%- for param_name, param_spec in tool.parameters.properties.items() %} + {%- if param_spec.description %} + {{- "// " + param_spec.description + "\\n" }} + {%- endif %} + {{- param_name }} + {%- if param_name not in (tool.parameters.required or []) -%} + {{- "?" }} + {%- endif -%} + {{- ": " }} + {{- render_typescript_type(param_spec, tool.parameters.required or []) }} + {%- if param_spec.default is defined -%} + {%- if param_spec.enum %} + {{- ", // default: " + param_spec.default }} + {%- elif param_spec.oneOf %} + {{- "// default: " + param_spec.default }} + {%- else %} + {{- ", // default: " + param_spec.default|tojson }} + {%- endif -%} + {%- endif -%} + {%- if not loop.last %} + {{- ",\\n" }} + {%- else %} + {{- ",\\n" }} + {%- endif -%} + {%- endfor %} + {{- "}) => any;\\n\\n" }} + {%- else -%} + {{- "() => any;\\n\\n" }} + {%- endif -%} + {%- endfor %} + {{- "} // namespace " + namespace_name }} +{%- endmacro -%} + +{%- macro render_builtin_tools(browser_tool, python_tool) -%} + {%- if browser_tool %} + {{- "## browser\\n\\n" }} + {{- "// Tool for browsing.\\n" }} + {{- "// The \`cursor\` appears in brackets before each browsing display: \`[{cursor}]\`.\\n" }} + {{- "// Cite information from the tool using the following format:\\n" }} + {{- "// \`【{cursor}†L{line_start}(-L{line_end})?】\`, for example: \`【6†L9-L11】\` or \`【8†L3】\`.\\n" }} + {{- "// Do not quote more than 10 words directly from the tool output.\\n" }} + {{- "// sources=web (default: web)\\n" }} + {{- "namespace browser {\\n\\n" }} + {{- "// Searches for information related to \`query\` and displays \`topn\` results.\\n" }} + {{- "type search = (_: {\\n" }} + {{- "query: string,\\n" }} + {{- "topn?: number, // default: 10\\n" }} + {{- "source?: string,\\n" }} + {{- "}) => any;\\n\\n" }} + {{- "// Opens the link \`id\` from the page indicated by \`cursor\` starting at line number \`loc\`, showing \`num_lines\` lines.\\n" }} + {{- "// Valid link ids are displayed with the formatting: \`【{id}†.*】\`.\\n" }} + {{- "// If \`cursor\` is not provided, the most recent page is implied.\\n" }} + {{- "// If \`id\` is a string, it is treated as a fully qualified URL associated with \`source\`.\\n" }} + {{- "// If \`loc\` is not provided, the viewport will be positioned at the beginning of the document or centered on the most relevant passage, if available.\\n" }} + {{- "// Use this function without \`id\` to scroll to a new location of an opened page.\\n" }} + {{- "type open = (_: {\\n" }} + {{- "id?: number | string, // default: -1\\n" }} + {{- "cursor?: number, // default: -1\\n" }} + {{- "loc?: number, // default: -1\\n" }} + {{- "num_lines?: number, // default: -1\\n" }} + {{- "view_source?: boolean, // default: false\\n" }} + {{- "source?: string,\\n" }} + {{- "}) => any;\\n\\n" }} + {{- "// Finds exact matches of \`pattern\` in the current page, or the page 
given by \`cursor\`.\\n" }} + {{- "type find = (_: {\\n" }} + {{- "pattern: string,\\n" }} + {{- "cursor?: number, // default: -1\\n" }} + {{- "}) => any;\\n\\n" }} + {{- "} // namespace browser\\n\\n" }} + {%- endif -%} + + {%- if python_tool %} + {{- "## python\\n\\n" }} + {{- "Use this tool to execute Python code in your chain of thought. The code will not be shown to the user. This tool should be used for internal reasoning, but not for code that is intended to be visible to the user (e.g. when creating plots, tables, or files).\\n\\n" }} + {{- "When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 120.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is UNKNOWN. Depends on the cluster.\\n\\n" }} + {%- endif -%} +{%- endmacro -%} + +{#- System Message Construction ============================================ #} +{%- macro build_system_message() -%} + {%- if model_identity is not defined %} + {%- set model_identity = "You are ChatGPT, a large language model trained by OpenAI." %} + {%- endif %} + {{- model_identity + "\\n" }} + {{- "Knowledge cutoff: 2024-06\\n" }} + {{- "Current date: " + strftime_now("%Y-%m-%d") + "\\n\\n" }} + {%- if reasoning_effort is not defined %} + {%- set reasoning_effort = "medium" %} + {%- endif %} + {{- "Reasoning: " + reasoning_effort + "\\n\\n" }} + {%- if builtin_tools is defined and builtin_tools is not none %} + {{- "# Tools\\n\\n" }} + {%- set available_builtin_tools = namespace(browser=false, python=false) %} + {%- for tool in builtin_tools %} + {%- if tool == "browser" %} + {%- set available_builtin_tools.browser = true %} + {%- elif tool == "python" %} + {%- set available_builtin_tools.python = true %} + {%- endif %} + {%- endfor %} + {{- render_builtin_tools(available_builtin_tools.browser, available_builtin_tools.python) }} + {%- endif -%} + {{- "# Valid channels: analysis, commentary, final. Channel must be included for every message." }} + {%- if tools -%} + {{- "\\nCalls to these tools must go to the commentary channel: 'functions'." 
}} + {%- endif -%} +{%- endmacro -%} + +{#- Main Template Logic ================================================= #} +{#- Set defaults #} + +{#- Render system message #} +{{- "<|start|>system<|message|>" }} +{{- build_system_message() }} +{{- "<|end|>" }} + +{#- Extract developer message #} +{%- if developer_instructions is defined and developer_instructions is not none %} + {%- set developer_message = developer_instructions %} + {%- set loop_messages = messages %} +{%- elif messages[0].role == "developer" or messages[0].role == "system" %} + {%- set developer_message = messages[0].content %} + {%- set loop_messages = messages[1:] %} +{%- else %} + {%- set developer_message = "" %} + {%- set loop_messages = messages %} +{%- endif %} + +{#- Render developer message #} +{%- if developer_message or tools %} + {{- "<|start|>developer<|message|>" }} + {%- if developer_message %} + {{- "# Instructions\\n\\n" }} + {{- developer_message }} + {%- endif %} + {%- if tools -%} + {%- if developer_message %} + {{- "\\n\\n" }} + {%- endif %} + {{- "# Tools\\n\\n" }} + {{- render_tool_namespace("functions", tools) }} + {%- endif -%} + {{- "<|end|>" }} +{%- endif %} + +{#- Render messages #} +{%- set last_tool_call = namespace(name=none) %} +{%- for message in loop_messages -%} + {#- At this point only assistant/user/tool messages should remain #} + {%- if message.role == 'assistant' -%} + {#- Checks to ensure the messages are being passed in the format we expect #} + {%- if "thinking" in message %} + {%- if "<|channel|>analysis<|message|>" in message.thinking or "<|channel|>final<|message|>" in message.thinking %} + {{- raise_exception("You have passed a message containing <|channel|> tags in the thinking field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }} + {%- endif %} + {%- endif %} + {%- if "tool_calls" in message %} + {#- We need very careful handling here - we want to drop the tool call analysis message if the model #} + {#- has output a later <|final|> message, but otherwise we want to retain it. This is the only case #} + {#- when we render CoT/analysis messages in inference. #} + {%- set future_final_message = namespace(found=false) %} + {%- for future_message in loop_messages[loop.index:] %} + {%- if future_message.role == 'assistant' and "tool_calls" not in future_message %} + {%- set future_final_message.found = true %} + {%- endif %} + {%- endfor %} + {#- We assume max 1 tool call per message, and so we infer the tool call name #} + {#- in "tool" messages from the most recent assistant tool call name #} + {%- set tool_call = message.tool_calls[0] %} + {%- if tool_call.function %} + {%- set tool_call = tool_call.function %} + {%- endif %} + {%- if message.content and message.thinking %} + {{- raise_exception("Cannot pass both content and thinking in an assistant message with tool calls! Put the analysis message in one or the other, but not both.") }} + {%- elif message.content and not future_final_message.found %} + {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }} + {%- elif message.thinking and not future_final_message.found %} + {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }} + {%- endif %} + {{- "<|start|>assistant to=" }} + {{- "functions." 
+ tool_call.name + "<|channel|>commentary " }} + {{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }} + {%- if tool_call.arguments is string %} + {{- tool_call.arguments }} + {%- else %} + {{- tool_call.arguments|tojson }} + {%- endif %} + {{- "<|call|>" }} + {%- set last_tool_call.name = tool_call.name %} + {%- elif loop.last and not add_generation_prompt %} + {#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #} + {#- This is a situation that should only occur in training, never in inference. #} + {%- if "thinking" in message %} + {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }} + {%- endif %} + {#- <|return|> indicates the end of generation, but <|end|> does not #} + {#- <|return|> should never be an input to the model, but we include it as the final token #} + {#- when training, so the model learns to emit it. #} + {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }} + {%- elif "thinking" in message %} + {#- CoT is dropped during all previous turns, so we never render it for inference #} + {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }} + {%- set last_tool_call.name = none %} + {%- else %} + {#- CoT is dropped during all previous turns, so we never render it for inference #} + {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }} + {%- set last_tool_call.name = none %} + {%- endif %} + {%- elif message.role == 'tool' -%} + {%- if last_tool_call.name is none %} + {{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }} + {%- endif %} + {{- "<|start|>functions." + last_tool_call.name }} + {%- if message.content is string %} + {{- " to=assistant<|channel|>commentary<|message|>" + message.content + "<|end|>" }} + {%- else %} + {{- " to=assistant<|channel|>commentary<|message|>" + message.content|tojson + "<|end|>" }} + {%- endif %} + {%- elif message.role == 'user' -%} + {{- "<|start|>user<|message|>" + message.content + "<|end|>" }} + {%- endif -%} +{%- endfor -%} + +{#- Generation prompt #} +{%- if add_generation_prompt -%} +<|start|>assistant +{%- endif -%} +{# Copyright 2025-present Unsloth. Apache 2.0 License. Unsloth chat template fixes. 
Edited from ggml-org & OpenAI #} +`.slice(1, -1); diff --git a/test/standalone/chatWrappers/utils/resolveChatWrapper.test.ts b/test/standalone/chatWrappers/utils/resolveChatWrapper.test.ts index 26de4640..745eb4a4 100644 --- a/test/standalone/chatWrappers/utils/resolveChatWrapper.test.ts +++ b/test/standalone/chatWrappers/utils/resolveChatWrapper.test.ts @@ -3,7 +3,7 @@ import { AlpacaChatWrapper, ChatMLChatWrapper, DeepSeekChatWrapper, FalconChatWrapper, FunctionaryChatWrapper, GemmaChatWrapper, GeneralChatWrapper, Llama2ChatWrapper, Llama3_1ChatWrapper, MistralChatWrapper, QwenChatWrapper, resolveChatWrapper, HarmonyChatWrapper } from "../../../../src/index.js"; -import {harmonyJinjaTemplate, harmonyJinjaTemplate2, harmonyJinjaTemplate3} from "./jinjaTemplates.js"; +import {harmonyJinjaTemplate, harmonyJinjaTemplate2, harmonyJinjaTemplate3, harmonyJinjaTemplate4} from "./jinjaTemplates.js"; const alpacaJinjaTemplate = ` @@ -755,4 +755,16 @@ describe("resolveChatWrapper", () => { }); expect(chatWrapper).to.be.instanceof(HarmonyChatWrapper); }); + + test("should resolve to specialized HarmonyChatWrapper 4", {timeout: 1000 * 60 * 60 * 2}, async () => { + const chatWrapper = resolveChatWrapper({ + customWrapperSettings: { + jinjaTemplate: { + template: harmonyJinjaTemplate4 + } + }, + fallbackToOtherWrappersOnJinjaError: false + }); + expect(chatWrapper).to.be.instanceof(HarmonyChatWrapper); + }); });