fix: ollama performance #12526
Conversation
Pull request overview
This pull request addresses Ollama performance issues related to reasoning/thinking functionality by:
- Patching ollama-ai-provider-v2 to support reasoning effort levels ('low', 'medium', 'high') in addition to boolean values for the `think` parameter, since the current dependency version only accepts booleans
- Adding middleware to reorder reasoning and text chunks in streaming responses, ensuring reasoning content appears before response text for better UX
- Updating provider options to properly handle the new reasoning effort values

Changes:

- Modified `buildOllamaProviderOptions` to use the new `OllamaProviderOptions` type and set the `think` parameter based on reasoning effort for gpt-oss models (see the sketch after this list)
- Created new middleware `ollamaReasoningOrderMiddleware` to buffer and reorder stream chunks so reasoning appears before text
- Updated the patch for ollama-ai-provider-v2@1.5.5 to backport support for string reasoning effort values and export `OllamaProviderOptions`
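As a rough illustration of the mapping described above, here is a minimal sketch. It assumes the patched `OllamaProviderOptions` export and simplifies whatever `buildOllamaProviderOptions` actually does in `options.ts`; the signature and the gpt-oss check are illustrative only.

```ts
import type { OllamaProviderOptions } from 'ollama-ai-provider-v2'

type ReasoningEffort = 'none' | 'low' | 'medium' | 'high' | undefined

// Simplified sketch: gpt-oss models receive the effort level directly,
// while other models only get a boolean on/off flag.
function buildOllamaProviderOptions(modelId: string, reasoningEffort: ReasoningEffort): OllamaProviderOptions {
  const providerOptions: OllamaProviderOptions = {}

  if (modelId.includes('gpt-oss')) {
    // The backported patch lets `think` carry 'low' | 'medium' | 'high'.
    providerOptions.think = reasoningEffort && reasoningEffort !== 'none' ? reasoningEffort : false
  } else {
    // Other models: only a boolean toggle is meaningful.
    providerOptions.think = !!reasoningEffort && reasoningEffort !== 'none'
  }

  return providerOptions
}
```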
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/renderer/src/aiCore/utils/options.ts | Changed import from OllamaCompletionProviderOptions to OllamaProviderOptions and updated logic for setting the think parameter based on reasoning effort |
| src/renderer/src/aiCore/middleware/ollamaReasoningOrderMiddleware.ts | New middleware that buffers text chunks and reorders them to appear after reasoning chunks in streaming responses |
| src/renderer/src/aiCore/middleware/AiSdkMiddlewareBuilder.ts | Integrated the new Ollama reasoning order middleware for cases when reasoning is enabled |
| patches/ollama-ai-provider-v2@1.5.5.patch | Patched the ollama-ai-provider-v2 library to support 'low', 'medium', 'high' values for the think parameter and export OllamaProviderOptions |
| pnpm-lock.yaml | Updated patch hash for ollama-ai-provider-v2 to reflect the new patch content |
| package.json | Updated patch file path for ollama-ai-provider-v2 to use the new naming convention |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
patches/ollama-ai-provider-v2@1.5.5.patch:102
- The patch moves temperature and top_p into the options object without checking if they are undefined. If these values are undefined, they will be explicitly set as undefined in the options object, which could cause issues with the Ollama API. Consider using conditional spreading like the original code did, or add a check to only include these values when they are defined.
```diff
   }
 } else {
-  providerOptions.think = !['none', undefined].includes(reasoningEffort)
+  providerOptions.think = reasoningEffort !== 'none'
```
Copilot AI · Jan 20, 2026
The condition reasoningEffort !== 'none' will evaluate to true when reasoningEffort is undefined, causing providerOptions.think to be set to true even when reasoning effort is not explicitly configured. This should explicitly check for valid values to avoid unintended behavior. Consider using a condition like reasoningEffort && reasoningEffort !== 'none' or checking against specific valid values.
Suggested change:

```diff
-providerOptions.think = reasoningEffort !== 'none'
+providerOptions.think = !!reasoningEffort && reasoningEffort !== 'none'
```
```ts
if (isReasoningPart(chunk)) {
  hasReasoning = true
  controller.enqueue(chunk)
  flushBufferedText(controller)
  return
```
Copilot AI · Jan 20, 2026
The buffering logic has an issue: when text chunks are buffered and then reasoning chunks arrive, the buffered text is flushed immediately after the first reasoning chunk is enqueued. This causes the text to appear between reasoning chunks rather than after all reasoning chunks. For example, if the order is text-start, text-delta, reasoning-start, reasoning-delta, reasoning-end, the output will be reasoning-start, text-start, text-delta, reasoning-delta, reasoning-end. The flush should happen after all reasoning chunks, not after the first one. Consider moving the flushBufferedText call to only execute after reasoning-end or tracking when reasoning has fully completed.
```ts
new TransformStream<LanguageModelV2StreamPart, LanguageModelV2StreamPart>({
  transform(chunk, controller) {
    if (isReasoningPart(chunk)) {
      hasReasoning = true
      controller.enqueue(chunk)
      flushBufferedText(controller)
      return
    }

    if (!hasReasoning && isTextPart(chunk)) {
      bufferedText.push(chunk)
      return
    }

    if (!hasReasoning && (chunk.type === 'finish' || chunk.type === 'error')) {
      flushBufferedText(controller)
    }

    controller.enqueue(chunk)
  },
  flush(controller) {
    flushBufferedText(controller)
  }
})
```
Copilot AI · Jan 20, 2026
Superfluous argument passed to function TransformStream.
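One way the flush timing could be adjusted, as the review above suggests, is to release buffered text only once reasoning has fully ended. This is a minimal sketch, not the actual `ollamaReasoningOrderMiddleware` code; it keys off the `reasoning-end` part directly instead of helper predicates like `isReasoningPart`.

```ts
import type { LanguageModelV2StreamPart } from '@ai-sdk/provider'

// Sketch: hold text parts back until reasoning has completely finished, so the
// thinking block is always emitted before the response text.
function createReasoningFirstStream() {
  let reasoningDone = false
  const bufferedText: LanguageModelV2StreamPart[] = []

  const flushBufferedText = (controller: TransformStreamDefaultController<LanguageModelV2StreamPart>) => {
    for (const part of bufferedText) controller.enqueue(part)
    bufferedText.length = 0
  }

  return new TransformStream<LanguageModelV2StreamPart, LanguageModelV2StreamPart>({
    transform(chunk, controller) {
      if (chunk.type === 'reasoning-start' || chunk.type === 'reasoning-delta') {
        controller.enqueue(chunk)
        return
      }
      if (chunk.type === 'reasoning-end') {
        reasoningDone = true
        controller.enqueue(chunk)
        flushBufferedText(controller) // release text only after reasoning has ended
        return
      }
      // While reasoning may still arrive, hold back any text parts.
      if (!reasoningDone && (chunk.type === 'text-start' || chunk.type === 'text-delta' || chunk.type === 'text-end')) {
        bufferedText.push(chunk)
        return
      }
      // If the stream finishes without reasoning, make sure buffered text is not lost.
      if (!reasoningDone && (chunk.type === 'finish' || chunk.type === 'error')) {
        flushBufferedText(controller)
      }
      controller.enqueue(chunk)
    },
    flush(controller) {
      flushBufferedText(controller)
    }
  })
}
```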
Maybe there are still issues that need to be modified further.
I feel the middleware wasn't modified correctly here. The second issue should be fixed by changing the timing of reasoning-end; see vercel/ai#11751 for reference.
I tried making some adjustments; I still need help checking whether there are any omissions in the patch modifications. These two issues do indeed exist.
Can I modify the patch directly?
I'm not sure. First, the patch changes too many places. Second, if I only modify the place below, the thinking block displays only one character; it seems the output is being processed repeatedly.
CI Failed
@kangfenmao The occasional bug caused by race conditions has now passed.
ollama-ai-provider-v2 itself only accepts boolean values for `think`, so the "thinking depth/length" setting passed in is filtered out during validation: parseProviderOptions' zod check discards 'low' | 'medium' | 'high', and the thinking depth never takes effect. The latest version of the provider accepts boolean | 'low' | 'medium' | 'high' and exports OllamaProviderOptions, but this project's dependency tree is too deeply nested to upgrade (3.0.1), so a patch is used to backport the change.
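To make that concrete, here is a rough sketch of the kind of schema widening the patch backports; the schema and type names are illustrative, and the actual hunks in patches/ollama-ai-provider-v2@1.5.5.patch differ.

```ts
import { z } from 'zod'

// Illustrative only: `think` widened from a plain boolean so that
// parseProviderOptions' zod validation no longer discards 'low' | 'medium' | 'high'.
const ollamaProviderOptionsSchema = z.object({
  think: z.union([z.boolean(), z.enum(['low', 'medium', 'high'])]).optional()
  // ...other Ollama-specific options omitted
})

export type OllamaProviderOptions = z.infer<typeof ollamaProviderOptionsSchema>
```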
When ollama-ai-provider-v2 streams a response, some models emit text chunks before reasoning chunks even when reasoning is enabled. ollamaReasoningOrderMiddleware is therefore added and injected whenever Ollama reasoning is enabled; during streaming it buffers text first and, once reasoning chunks appear, places the reasoning in front so the thinking block is displayed before the main text.
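For reference, this is a hedged sketch of how such a middleware is commonly attached via the AI SDK's `wrapLanguageModel`; the project wires it through `AiSdkMiddlewareBuilder` instead, and the import names below (`createOllama`, the middleware export, the base URL) are assumptions, not taken from this PR.

```ts
import { wrapLanguageModel } from 'ai'
import { createOllama } from 'ollama-ai-provider-v2' // assumed export name
// Assumed export from src/renderer/src/aiCore/middleware/ollamaReasoningOrderMiddleware.ts
import { ollamaReasoningOrderMiddleware } from './middleware/ollamaReasoningOrderMiddleware'

const ollama = createOllama({ baseURL: 'http://localhost:11434/api' })

// Only wrap the model with the reordering middleware when reasoning is enabled,
// mirroring the conditional injection described above.
function getOllamaModel(modelId: string, reasoningEnabled: boolean) {
  const model = ollama(modelId)
  return reasoningEnabled ? wrapLanguageModel({ model, middleware: ollamaReasoningOrderMiddleware }) : model
}
```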