Description
Which version of the app are you using?
latest head
Which API Provider are you using?
Ollama
Which Model are you using?
All of them
What happened?
Thanks for an excellent product. I love that it integrates with so many providers, but the Ollama integration is bugged due to an incredibly short-sighted default chosen by the Ollama team that I don't think you're aware of.
time=2025-03-04T02:30:04.818-07:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=16101 keep=4 new=2048
By default, Ollama truncates input prompts to 2048 tokens, which is really quite tiny.
To override this, all you need to do is set the limit via `num_ctx` in the request options to something reasonable. There is no reason not to use the model's context length for this:
```json
"options": {
  "num_ctx": 128000
}
```
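As a concrete illustration, a request body for Ollama's `/api/generate` endpoint with the context window raised might look like this (a minimal sketch; the model name and prompt are placeholders, not taken from this report):

```python
import json

# Hypothetical request payload for Ollama's /api/generate endpoint.
# The model name and prompt below are placeholders.
payload = {
    "model": "llama3.1",
    "prompt": "Summarize the following document: ...",
    "stream": False,
    "options": {
        # Raise the context window from Ollama's 2048-token default.
        "num_ctx": 128000,
    },
}

# This body would be POSTed to http://localhost:11434/api/generate;
# here we only serialize it to show the shape of the request.
body = json.dumps(payload)
```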
This can of course cause resource issues, so best practice is to measure what you currently have, add the size of the expected generation, and then cap the total at the model's max context size. That way you don't waste a bunch of resources when you only need a much smaller context at the moment.
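That sizing rule can be sketched as a small helper (the function name and parameters are illustrative, not part of any API):

```python
def choose_num_ctx(prompt_tokens: int, expected_output_tokens: int,
                   model_max_ctx: int) -> int:
    """Pick a context window big enough for the prompt plus the
    expected generation, capped at the model's maximum context size."""
    needed = prompt_tokens + expected_output_tokens
    return min(needed, model_max_ctx)

# With the 16101-token prompt from the log above and, say, 1000 tokens
# of expected output on a 128k-context model:
print(choose_num_ctx(16101, 1000, 128000))   # 17101
# A prompt that exceeds the model's maximum gets capped instead:
print(choose_num_ctx(200000, 1000, 128000))  # 128000
```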
Source: https://github.com/ollama/ollama/blob/main/docs/faq.md
Steps to reproduce
Relevant API REQUEST output
Additional context
No response