Commit f18841a

Use 4096 when no num_ctx is set in Ollama model parameters, because this is what it uses in practice.
1 parent 8ab7772 commit f18841a

3 files changed (+78, -66 lines)

apps/kilocode-docs/docs/providers/ollama.md

Lines changed: 57 additions & 54 deletions
````diff
@@ -19,28 +19,25 @@ More trial and error will be required to find the right prompt.
 Local LLMs are usually also not very fast.
 Using simple prompts, keeping conversations short and disabling MCP tools can result in a speed-up.
 
-
 ## Hardware Requirements
 
 You will need a large amount of RAM (32GB or more) and a powerful CPU (e.g. Ryzen 9000 series) to run the models listed below.
 GPUs can run LLMs much faster, but a large amount of VRAM is required (24GB, if not more), which is not very common on consumer GPUs.
 Smaller models will run on more modest GPUs, but do not provide good results.
 MacBooks with a sufficient amount of unified memory can use GPU-acceleration, but do not outperform high-end desktop CPUs in our testing.
 
-
 ## Selecting a Model
 
 Ollama supports many different models.
 You can find a list of available models on the [Ollama website](https://ollama.com/library).
 Selecting a model that suits your use case, runs on your hardware configuration and achieves the desired speed requires some trial and error.
 The following rules and heuristics can be used to find a model:
 
-* Must have at least a 32k context window (this is a requirement for Kilo Code).
-* Listed as supporting tools.
-* Number of parameters in the 7b to 24b range.
-* Prefer popular models.
-* Prefer newer models.
-
+- Must have at least a 32k context window (this is a requirement for Kilo Code).
+- Listed as supporting tools.
+- Number of parameters in the 7b to 24b range.
+- Prefer popular models.
+- Prefer newer models.
 
 ### Recommendations for Kilo Code
 
@@ -52,12 +49,12 @@ Create a simple web page with a button that greets the user when clicked.
 
 A model is considered to pass if it produces a working result within a few tries. The models we found to work correctly are:
 
-| Model name | Completion time |
-| --- | --- |
-| qwen2.5-coder:7b | 1x (baseline) |
-| devstral:24b | 2x |
-| gemma3:12b | 4x |
-| qwen3-8b | 12x |
+| Model name       | Completion time |
+| ---------------- | --------------- |
+| qwen2.5-coder:7b | 1x (baseline)   |
+| devstral:24b     | 2x              |
+| gemma3:12b       | 4x              |
+| qwen3-8b         | 12x             |
 
 Our recommendation is to use **devstral:24b** if your hardware can handle it, because it makes fewer mistakes than qwen2.5-coder:7b.
 qwen2.5-coder:7b is worth considering because of its speed, if you can put up with its mistakes.
@@ -69,49 +66,56 @@ The result produced by devstral:24b is included below:
 ```html
 <!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>Greet User Button</title>
-    <style>
-        body {
-            font-family: Arial, sans-serif;
-            display: flex;
-            justify-content: center;
-            align-items: center;
-            height: 100vh;
-            margin: 0;
-        }
-        button {
-            padding: 10px 20px;
-            font-size: 16px;
-            cursor: pointer;
-        }
-    </style>
-</head>
-<body>
-    <button onclick="greetUser()">Greet Me!</button>
-
-    <script>
-        function greetUser() {
-            alert('Hello! Welcome to our website.');
-        }
-    </script>
-</body>
+    <head>
+        <meta charset="UTF-8" />
+        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+        <title>Greet User Button</title>
+        <style>
+            body {
+                font-family: Arial, sans-serif;
+                display: flex;
+                justify-content: center;
+                align-items: center;
+                height: 100vh;
+                margin: 0;
+            }
+            button {
+                padding: 10px 20px;
+                font-size: 16px;
+                cursor: pointer;
+            }
+        </style>
+    </head>
+    <body>
+        <button onclick="greetUser()">Greet Me!</button>
+
+        <script>
+            function greetUser() {
+                alert("Hello! Welcome to our website.")
+            }
+        </script>
+    </body>
 </html>
 ```
 
 The following models look like reasonable choices, but were found to **not** work properly with Kilo Code in its default configuration:
 
-| Model name | Fail reason |
-| --- | --- |
-| deepseek-r1:7b | fails to use tools properly |
+| Model name     | Fail reason                    |
+| -------------- | ------------------------------ |
+| deepseek-r1:7b | fails to use tools properly    |
 | deepseek-r1:8b | gets stuck in a reasoning loop |
 
+## Preventing prompt truncation
+
+By default Ollama truncates prompts to a very short length.
+If you run into this problem, please see this FAQ item to resolve it:
+[How can I specify the context window size?](https://github.com/ollama/ollama/blob/4383a3ab7a075eff78b31f7dc84c747e2fcd22b8/docs/faq.md#how-can-i-specify-the-context-window-size)
+
+If you decide to use the `OLLAMA_CONTEXT_LENGTH` environment variable, it needs to be visible to both the IDE and the Ollama server.
 
 ## Setting up Ollama
 
-1. **Download and Install Ollama:** Download the Ollama installer for your operating system from the [Ollama website](https://ollama.com/). Follow the installation instructions and make sure Ollama is running:
+1. **Download and Install Ollama:** Download the Ollama installer for your operating system from the [Ollama website](https://ollama.com/). Follow the installation instructions and make sure Ollama is running:
 
    ```bash
   ollama serve
@@ -129,13 +133,12 @@ The following models look like reasonable choices, but were found to **not** wor
   ollama pull devstral:24b
   ```
 
-4. **Configure Kilo Code:**
-    * Open the Kilo Code sidebar (<img src="/docs/img/kilo-v1.svg" width="12" /> icon).
-    * Click the settings gear icon (<Codicon name="gear" />).
-    * Select "ollama" as the API Provider.
-    * Enter the Model name.
-    * (Optional) You can configure the base URL if you're running Ollama on a different machine. The default is `http://localhost:11434`.
-
+3. **Configure Kilo Code:**
+    - Open the Kilo Code sidebar (<img src="/docs/img/kilo-v1.svg" width="12" /> icon).
+    - Click the settings gear icon (<Codicon name="gear" />).
+    - Select "ollama" as the API Provider.
+    - Enter the Model name.
+    - (Optional) You can configure the base URL if you're running Ollama on a different machine. The default is `http://localhost:11434`.
 
 ## Further Reading
 
````
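
The new "Preventing prompt truncation" section turns on `OLLAMA_CONTEXT_LENGTH` being visible to two separate processes. As a minimal sketch (illustrative only, not code from this commit): the IDE side merely reads the variable to estimate the window, exactly as the fetcher change below does, while `ollama serve` reads it independently to size the real window, so exporting it for only one of the two leaves the estimate and the enforced limit out of sync.

```typescript
// Illustrative sketch: the IDE-side estimate of the context window.
// `ollama serve` applies the same variable on its own, so both processes
// must see the same value for the estimate to match reality.
const assumedWindow = parseInt(process.env.OLLAMA_CONTEXT_LENGTH || "4096", 10)
console.log(`Assuming an Ollama context window of ${assumedWindow} tokens`)
```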

src/api/providers/fetchers/ollama.ts

Lines changed: 2 additions & 4 deletions
````diff
@@ -44,11 +44,9 @@ export const parseOllamaModel = (rawModel: OllamaModelInfoResponse): ModelInfo =
         ? parseInt(rawModel.parameters.match(/^num_ctx\s+(\d+)/m)?.[1] ?? "", 10) || undefined
         : undefined
 
-    const contextKey = Object.keys(rawModel.model_info).find((k) => k.includes("context_length"))
-    const contextLengthFromModelInfo =
-        contextKey && typeof rawModel.model_info[contextKey] === "number" ? rawModel.model_info[contextKey] : undefined
+    const contextLengthFromEnvironment = parseInt(process.env.OLLAMA_CONTEXT_LENGTH || "4096", 10)
 
-    let contextWindow = contextLengthFromModelParameters ?? contextLengthFromModelInfo
+    let contextWindow = contextLengthFromModelParameters ?? contextLengthFromEnvironment
 
     if (contextWindow == 40960 && !contextLengthFromModelParameters) {
         contextWindow = 4096 // For some unknown reason, Ollama returns an undefined context as "40960" rather than 4096, which is what it actually enforces.
````
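
For illustration, the fallback chain this hunk introduces can be read as a standalone function. `resolveContextWindow` is a hypothetical name used only in this sketch, not an export of the module:

```typescript
// Hypothetical helper mirroring the precedence in parseOllamaModel:
// an explicit num_ctx from the model parameters wins, then
// OLLAMA_CONTEXT_LENGTH, then Ollama's practical default of 4096.
function resolveContextWindow(numCtx?: number): number {
    const fromEnvironment = parseInt(process.env.OLLAMA_CONTEXT_LENGTH || "4096", 10)
    let contextWindow = numCtx ?? fromEnvironment
    // Ollama reports an unset context as 40960 while enforcing 4096 in practice.
    if (contextWindow === 40960 && numCtx === undefined) {
        contextWindow = 4096
    }
    return contextWindow
}

console.log(resolveContextWindow(32768)) // 32768: an explicit num_ctx always wins
console.log(resolveContextWindow()) // 4096 when neither num_ctx nor the variable is set
```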

src/api/providers/native-ollama.ts

Lines changed: 19 additions & 8 deletions
````diff
@@ -143,21 +143,23 @@ export class NativeOllamaHandler extends BaseProvider implements SingleCompletio
     protected options: ApiHandlerOptions
     private client: Ollama | undefined
     protected models: Record<string, ModelInfo> = {}
-    private isInitialized = false
+    private isInitialized = false // kilocode_change
 
     constructor(options: ApiHandlerOptions) {
         super()
         this.options = options
-        this.initialize()
+        this.initialize() // kilocode_change
     }
 
+    // kilocode_change start
     private async initialize(): Promise<void> {
         if (this.isInitialized) {
             return
         }
         await this.fetchModel()
         this.isInitialized = true
     }
+    // kilocode_change end
 
     private ensureClient(): Ollama {
         if (!this.client) {
@@ -187,25 +189,29 @@ export class NativeOllamaHandler extends BaseProvider implements SingleCompletio
         messages: Anthropic.Messages.MessageParam[],
         metadata?: ApiHandlerCreateMessageMetadata,
     ): ApiStream {
+        // kilocode_change start
         if (!this.isInitialized) {
            await this.initialize()
         }
+        // kilocode_change end
 
         const client = this.ensureClient()
-        const { id: modelId, info: modelInfo } = this.getModel()
+        const { id: modelId, info: modelInfo } = this.getModel() // kilocode_change: fetchModel => getModel
         const useR1Format = modelId.toLowerCase().includes("deepseek-r1")
 
         const ollamaMessages: Message[] = [
             { role: "system", content: systemPrompt },
             ...convertToOllamaMessages(messages),
         ]
 
+        // kilocode_change start
         const estimatedTokenCount = estimateOllamaTokenCount(ollamaMessages)
         if (modelInfo.maxTokens && estimatedTokenCount > modelInfo.maxTokens) {
             throw new Error(
-                `Input message is too long for the selected model. Estimated tokens: ${estimatedTokenCount}, Max tokens: ${modelInfo.maxTokens}`,
+                `Input message is too long for the selected model. Estimated tokens: ${estimatedTokenCount}, Max tokens: ${modelInfo.maxTokens}. To increase the context window size, see: http://localhost:3000/docs/providers/ollama#preventing-prompt-truncation`,
             )
         }
+        // kilocode_change end
 
         const matcher = new XmlMatcher(
             "think",
@@ -289,13 +295,14 @@ export class NativeOllamaHandler extends BaseProvider implements SingleCompletio
 
     async fetchModel() {
         this.models = await getOllamaModels(this.options.ollamaBaseUrl)
-        return this.models
+        return this.models // kilocode_change
     }
 
     override getModel(): { id: string; info: ModelInfo } {
         const modelId = this.options.ollamaModelId || ""
-        const modelInfo = this.models[modelId]
 
+        // kilocode_change start
+        const modelInfo = this.models[modelId]
         if (!modelInfo) {
             const availableModels = Object.keys(this.models)
             const errorMessage =
@@ -304,20 +311,24 @@ export class NativeOllamaHandler extends BaseProvider implements SingleCompletio
                 : `Model ${modelId} not found. No models available.`
             throw new Error(errorMessage)
         }
+        // kilocode_change end
 
         return {
             id: modelId,
-            info: modelInfo,
+            info: modelInfo, // kilocode_change
         }
     }
 
     async completePrompt(prompt: string): Promise<string> {
         try {
+            // kilocode_change start
             if (!this.isInitialized) {
                 await this.initialize()
             }
+            // kilocode_change end
+
             const client = this.ensureClient()
-            const { id: modelId } = this.getModel()
+            const { id: modelId } = this.getModel() // kilocode_change: fetchModel => getModel
             const useR1Format = modelId.toLowerCase().includes("deepseek-r1")
 
             const response = await client.chat({
````
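
The `kilocode_change` blocks above all serve a single pattern: lazy initialization kicked off in the constructor and re-checked in every async entry point, since a constructor cannot await. A reduced sketch of that pattern with simplified names (the real handler loads models via `getOllamaModels`):

```typescript
// Reduced sketch of the initialization guard used by NativeOllamaHandler.
class LazyInitSketch {
    private isInitialized = false

    constructor() {
        // Fire-and-forget: start initializing without blocking construction.
        void this.initialize()
    }

    private async initialize(): Promise<void> {
        if (this.isInitialized) {
            return // the guard makes repeated calls idempotent
        }
        // ... fetch model metadata here, as fetchModel() does above ...
        this.isInitialized = true
    }

    async completePrompt(prompt: string): Promise<string> {
        // Re-check on every call: the constructor's kick-off may still be
        // in flight when the first request arrives.
        if (!this.isInitialized) {
            await this.initialize()
        }
        return `ready to handle: ${prompt}`
    }
}
```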
