Providing a way to fix the context size with a self-hosted Ollama server #44980
Replies: 4 comments 5 replies
Has anyone found a way to mitigate this? It is frustrating because Zed + Devstral Small 2 works incredibly well for me up to around a 50k context size, but beyond that it will randomly fail processing and "just stop" in the middle of an activity. If I could constrain its limits, it would be perfect.
This just happened to me. I defined a context size of 128k for my model, and Zed just overrode it to 200k, blowing past my RAM limit. There should be a configuration option to set a maximum limit, or, if it is Ollama, it should simply respect the specified context length. I'll use a smaller model in the meantime.
Later today I found a way to provide the specifications I needed to the Ollama model, described in this section of the documentation: open settings.json and put the language_models config there. I can verify the context is respected this way.
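For anyone else landing here, a minimal sketch of what that settings.json entry can look like, assuming the currently documented schema for Zed's Ollama provider (the model tag, display name, and token count below are examples, not recommendations; Zed's settings file accepts comments):

```json
{
  "language_models": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "available_models": [
        {
          // Must match the tag reported by `ollama list`
          "name": "devstral:latest",
          "display_name": "Devstral Small 2 (64k ctx)",
          // Context window Zed should use instead of the model's advertised maximum
          "max_tokens": 65536
        }
      ]
    }
  }
}
```

With an entry like this, Zed should use the configured max_tokens as the context size for that model rather than whatever the server advertises.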
Context
When using Zed Agent with Ollama, the effective context size appears to be negotiated automatically between Zed and the Ollama server.
Some Ollama models (e.g. Devstral Small 2 – 24B) advertise an extremely large maximum context window. In those cases, Zed seems to select the maximum supported context rather than a reasonable or user-defined value.
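As a quick way to check what a given model advertises, the following prints the model metadata reported by the Ollama server, including its maximum context length (the model tag here is just an example):

```sh
# Show model details as reported by the local Ollama server;
# the reported context length is the advertised maximum window.
ollama show devstral:latest
```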
Problem
When using Devstral Small 2 (24B) with Ollama, Zed requests a context size of approximately 393k tokens, which results in excessive memory usage and severe performance degradation. This happens even when a smaller context size is explicitly configured.
Expected Behavior
Zed should respect an explicitly configured context size, or fall back to a reasonable default, instead of automatically selecting the model's advertised maximum.
Actual Behavior
Zed requests the model’s maximum context size, ignoring user attempts to limit it.
Attempts to Mitigate
1. Zed settings.json
➡️ max_context_tokens appears to be ignored.
2. Ollama Modelfile (see the sketch after this list)
➡️ Works inconsistently; Zed may still negotiate a much larger context.
3. HTTP Proxy (Content-Length rewriting)
A proxy was used to artificially reduce the reported maximum context.
➡️ Works intermittently; not reliable or sustainable.
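For reference, a minimal sketch of the Modelfile approach from item 2, assuming the base model is already pulled (the model tag, derived model name, and 65536-token window are examples):

```
# Modelfile: derive a variant that bakes in a smaller context window.
FROM devstral:latest

# num_ctx sets the context window (in tokens) Ollama uses for this model.
PARAMETER num_ctx 65536
```

Built with something like `ollama create devstral-64k -f Modelfile`; as noted above, though, Zed may still negotiate a larger window unless the model is also capped on the Zed side.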
Question / Proposal
Would it be possible to expose a setting that caps the context size Zed requests from Ollama, or to have Zed honor the context length configured on the server? This would allow predictable VRAM usage and prevent severe performance regressions with large-context models.