Remove the forced override of the context limit for Ollama API #2060 #2170
Conversation
… can force the Ollama server to perform an unexpected reload. This ensures the new native-ollama behavior is the same as the previous behavior. A follow-up PR will allow this to be overridden in the UI.
🦋 Changeset detected. Latest commit: d0efa75. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
This change causes Ollama to truncate prompts at 4096 tokens, which breaks Kilo Code completely.
I don't think it does, really. Many Ollama models default to a 4k context (qwen3-0.6b, my test example) even if they support more. So if I run with this patch against a default-config model like qwen3-0.6b, I get the same output, because the model is configured for 4k. But if I create a Modelfile that pushes that up (or the model natively supports a longer context), it handles the request just fine. So I don't think this change breaks Kilo with Ollama; it just prevents Kilo from operating with models that have unacceptably small defaults. FWIW, we'll also inherit this from Roo anyway: 7454
The issue is the defaults, and Ollama's behavior when we specify a different value.
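To illustrate the reload behavior under discussion, here is a minimal sketch assuming the `ollama` npm client (the model name and values are examples, not Kilo Code's actual call site): passing a `num_ctx` that differs from the context length the server loaded the model with makes Ollama unload and reload it.

```typescript
// Illustrative sketch only: a per-request num_ctx override can trigger a
// model reload on the Ollama server.
import { Ollama } from "ollama"

const client = new Ollama({ host: "http://localhost:11434" })

// If qwen3:0.6b is currently loaded with its default 4096-token context,
// asking for a different num_ctx forces Ollama to unload and reload it.
const response = await client.chat({
  model: "qwen3:0.6b",
  messages: [{ role: "user", content: "Summarize this repository." }],
  options: { num_ctx: 16384 },
})
console.log(response.message.content)
```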
I've just pushed a new commit that solves this a little better. When the handler is initialized, we interrogate the model info more thoroughly and use that to estimate whether the completion request is going to fit. If not, we throw an error rather than setting num_ctx and forcing a model reload. Take a look and let me know your thoughts.
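For reference, a minimal sketch of that approach, assuming the `ollama` npm client; `assertPromptFits` and the 4-characters-per-token heuristic are illustrative, not the commit's actual code:

```typescript
import { Ollama } from "ollama"

const client = new Ollama({ host: "http://localhost:11434" })

// Rough heuristic: ~4 characters per token (illustrative only).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

async function assertPromptFits(model: string, prompt: string): Promise<void> {
  const info = await client.show({ model })
  // At runtime model_info is a plain JSON object; the trained context length
  // is keyed by architecture, e.g. "qwen3.context_length".
  const modelInfo = info.model_info as unknown as Record<string, unknown>
  const arch = String(modelInfo["general.architecture"])
  const contextLength = Number(modelInfo[`${arch}.context_length`])
  const needed = estimateTokens(prompt)
  if (Number.isFinite(contextLength) && needed > contextLength) {
    throw new Error(
      `Prompt (~${needed} tokens) exceeds ${model}'s context length of ${contextLength} tokens`,
    )
  }
}
```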
Your latest commit works if you manually set …
OK. I think it's a tough spot.
I'll defer to the Kilo team on your preference.
We would love to work on a solution together; it is clear the community is not happy with the current Ollama performance. But there does need to be a solution to the prompt-truncation problem, because that is the first thing a new user will see, and there is no clear error message in that case.
How about implementing this proposal? I tried to implement it before (#1975), but had issues getting the value to sync properly on change (probably nothing insurmountable).
That is a bot-generated PR, with no guarantee on quality.
There is. With my updated commit, anything that exceeds the model's reported or expected limit will throw an explicit error that the context is too long. But I'll leave this to the Kilo team to solve in a way that works for you.
…his is what it uses in practice.
I tested your latest commit, but couldn't get the error to show up for a vanilla model. I added a commit that I think fixes it.
We don't use Ollama regularly, so your input is very valuable. Please let me know what you think!
## Preventing prompt truncation

By default Ollama truncates prompts to a very short length.
If you run into this problem, please see this FAQ item to resolve it:
[How can I specify the context window size?](https://github.com/ollama/ollama/blob/4383a3ab7a075eff78b31f7dc84c747e2fcd22b8/docs/faq.md#how-can-i-specify-the-context-window-size)

If you decide to use the `OLLAMA_CONTEXT_LENGTH` environment variable, it needs to be visible to both the IDE and the Ollama server.
This is the real change in this file; the rest is forced autoformat.
Nice, I like the use of the ENV var.
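As a hypothetical sketch of the IDE side (the function name is illustrative, not the PR's code): the extension can prefer `OLLAMA_CONTEXT_LENGTH` when estimating the usable window, but the same value must also be exported in the environment where `ollama serve` runs for the limit to actually change.

```typescript
// Illustrative only: prefer the user's OLLAMA_CONTEXT_LENGTH over the model's
// reported default. This only informs the IDE; the Ollama server must see the
// same environment variable for its context limit to actually change.
function effectiveContextWindow(reportedContextWindow: number): number {
  const raw = process.env.OLLAMA_CONTEXT_LENGTH
  const parsed = raw ? Number.parseInt(raw, 10) : Number.NaN
  return Number.isFinite(parsed) && parsed > 0 ? parsed : reportedContextWindow
}
```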
Thanks for your contribution, @mcowger.
Remove the forced override of the context limit for Ollama API Kilo-Org#2060

Context
Setting this forcibly can force the Ollama server to perform an unexpected reload if its configured context differs from our built-in defaults. Removing the override ensures the new native-ollama behavior is the same as the previous behavior.
A follow-up PR will allow this to be overridden in the UI.
Fixes: #2060
Thanks to jebba7151 for finding the root cause.
Implementation
Remove `num_ctx: modelInfo.contextWindow` from the `client.chat()` call (see the sketch below).
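A reconstructed sketch of the call after the change, assuming the `ollama` npm client (model and message are placeholders; the real call site in Kilo Code may differ):

```typescript
import { Ollama } from "ollama"

const client = new Ollama({ host: "http://localhost:11434" })

// The server keeps whatever context length it was configured with and no
// longer reloads the model to match a value we impose.
const response = await client.chat({
  model: "qwen3:0.6b",
  messages: [{ role: "user", content: "Hello" }],
  // options: { num_ctx: modelInfo.contextWindow },  // <- removed by this PR
})
```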
Screenshots
NA
How to Test
Get in Touch
mcowger on Discord.