🤦 Gotcha: Spent Hours Debugging… Turns Out It Was the Context Limit #2203
Closed
nehadhirmiz started this conversation in General
Replies: 1 comment 3 replies
Ah yeah, we don't ship model profiles for Ollama yet, which would have set the limit to match the model card (128k tokens -- why are you limiting it to 32k?). Are you setting this up in the config file?
I ran into a subtle but critical issue with local agents using Ollama-based LLMs that might help others.
A simple prompt, "Write a Python script that outputs 'Hello World'", could not be completed by multiple local models. I tried quite a few LLMs ranging from 1B to 9B parameters (with and without thinking enabled). Not a single model could finish this trivial task.
When I asked the agent to list the tools it had access to, it hallucinated tools and schemas that didn't exist. The issue was not with the LLMs; it was Ollama's default context window of 4K, which silently truncated the agent's system prompt and tool definitions.
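One way to avoid the default is to pass `num_ctx` explicitly in the `options` field of Ollama's `/api/generate` request. A minimal sketch, assuming Ollama's standard HTTP API on port 11434; the model name and helper function below are illustrative, not part of the original post:

```python
import json

# Hypothetical helper: builds a request body for Ollama's /api/generate
# endpoint with an explicit context window, instead of relying on the
# small default.
def build_generate_request(model, prompt, num_ctx=32768):
    return {
        "model": model,                    # e.g. "llama3.2:3b" -- whatever you pulled
        "prompt": prompt,
        "options": {"num_ctx": num_ctx},   # override the default context window
        "stream": False,
    }

payload = build_generate_request(
    "llama3.2:3b",  # placeholder model name
    'Write a Python script that outputs "Hello World"',
)
body = json.dumps(payload)
# POST `body` to http://localhost:11434/api/generate
```

For a persistent fix, a Modelfile can bake the setting into a derived model (`PARAMETER num_ctx 32768`, then `ollama create my-model-32k -f Modelfile`), so every client gets the larger window without per-request options.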
As a seasoned ML developer, I feel embarrassed sharing that this small detail slipped my mind. I just wanted to share this so someone else doesn't spend hours trying to figure out why their local agents can't even perform the simplest task.
The lesson: it’s always good to have a simple check. I don’t think this will be the last time someone runs into this issue.
It may be worth having a checking mechanism that runs a few simple tests on local models and generates a performance report—some sort of test run with metrics captured.
If the agent can’t even complete simple tasks, that’s an immediate signal that something fundamental is wrong.
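A minimal sketch of such a check, assuming you already have the model outputs in hand; the task names, outputs, and expected substrings below are illustrative:

```python
def sanity_report(results):
    """Score simple smoke tests: `results` maps a task name to a
    (model_output, expected_substring) pair. A task passes when the
    expected substring appears in the output (case-insensitive)."""
    passed = {
        task: expected.lower() in output.lower()
        for task, (output, expected) in results.items()
    }
    passed["all_passed"] = all(passed.values())
    return passed

report = sanity_report({
    "hello_world": ('print("Hello World")', "hello world"),
    "arithmetic": ("2 + 2 = 4", "4"),
})
# If report["all_passed"] is False, check num_ctx before blaming the model.
```

Substring matching is crude, but for tasks this simple a failure is a strong signal that the configuration, not the model, is broken.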
[Screenshot: agent run with context window 4K (default Ollama setting)]
[Screenshot: agent run with context window 32K]