Providing a way to fix the context size with a self-hosted Ollama server #44980
Replies: 4 comments 5 replies
Has anyone found a way to mitigate this? It is frustrating because Zed + Devstral Small 2 works incredibly well for me up to around a 50k context size, but beyond that it will randomly fail processing and "just stop" in the middle of an activity. If I could constrain its limits, it would be perfect.
This just happened to me. I defined a context size of 128k for my model, and Zed just overrode it to 200k, blowing past my RAM limit. There should be a configuration option to set a maximum limit, or, if it is Ollama, it should simply respect the specified context length. I'll use a smaller model in the meantime.
Later today I found a way to provide the specifications I needed to the Ollama model, described in this section of the documentation: open settings.json and put the language_models config there. I can verify the context is respected this way.
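For anyone else landing here, a minimal sketch of what that settings.json entry can look like, assuming the currently documented schema for Zed's Ollama provider (the model tag, display name, and token count below are examples, not recommendations; Zed's settings file accepts comments):

```json
{
  "language_models": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "available_models": [
        {
          // Must match the tag reported by `ollama list`
          "name": "devstral:latest",
          "display_name": "Devstral Small 2 (64k ctx)",
          // Context window Zed should use instead of the model's advertised maximum
          "max_tokens": 65536
        }
      ]
    }
  }
}
```

With an entry like this, Zed should use the configured max_tokens as the context size for that model rather than whatever the server advertises.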
Context
When using Zed Agent with Ollama, the effective context size appears to be negotiated automatically between Zed and the Ollama server.
Some Ollama models (e.g. Devstral Small 2 – 24B) advertise an extremely large maximum context window. In those cases, Zed seems to select the maximum supported context rather than a reasonable or user-defined value.
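As a quick way to check what a given model advertises, the following prints the model metadata reported by the Ollama server, including its maximum context length (the model tag here is just an example):

```sh
# Show model details as reported by the local Ollama server;
# the reported context length is the advertised maximum window.
ollama show devstral:latest
```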
Problem
When using Devstral Small 2 (24B) with Ollama, Zed requests a context size of approximately 393k tokens, which results in excessive memory usage and severe performance degradation. This happens even when a smaller context size is explicitly configured.
Expected Behavior
Zed should respect an explicitly configured context size, or fall back to a reasonable default, instead of automatically selecting the model's advertised maximum.
Actual Behavior
Zed requests the model’s maximum context size, ignoring user attempts to limit it.
Attempts to Mitigate
1. Zed settings.json
➡️ max_context_tokens appears to be ignored.
2. Ollama Modelfile (see the sketch after this list)
➡️ Works inconsistently; Zed may still negotiate a much larger context.
3. HTTP Proxy (Content-Length rewriting)
A proxy was used to artificially reduce the reported maximum context.
➡️ Works intermittently; not reliable or sustainable.
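For reference, a minimal sketch of the Modelfile approach from item 2, assuming the base model is already pulled (the model tag, derived model name, and 65536-token window are examples):

```
# Modelfile: derive a variant that bakes in a smaller context window.
FROM devstral:latest

# num_ctx sets the context window (in tokens) Ollama uses for this model.
PARAMETER num_ctx 65536
```

Built with something like `ollama create devstral-64k -f Modelfile`; as noted above, though, Zed may still negotiate a larger window unless the model is also capped on the Zed side.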
Question / Proposal
Would it be possible to expose a setting that caps the context size Zed requests from Ollama, or to have Zed honor the context length configured on the server? This would allow predictable VRAM usage and prevent severe performance regressions with large-context models.