Commit f18841a

Use 4096 when no num_ctx is set in Ollama model parameters, because this is what it uses in practice.
1 parent 8ab7772 commit f18841a

3 files changed (+78, -66 lines)

apps/kilocode-docs/docs/providers/ollama.md

Lines changed: 57 additions & 54 deletions
````diff
@@ -19,28 +19,25 @@ More trial and error will be required to find the right prompt.
 Local LLMs are usually also not very fast.
 Using simple prompts, keeping conversations short and disabling MCP tools can result in a speed-up.
 
-
 ## Hardware Requirements
 
 You will need a large amount of RAM (32GB or more) and a powerful CPU (e.g. Ryzen 9000 series) to run the models listed below.
 GPUs can run LLMs much faster, but a large amount of VRAM is required (24GB, if not more), which is not very common on consumer GPUs.
 Smaller models will run on more modest GPUs, but do not provide good results.
 MacBooks with a sufficient amount of unified memory can use GPU-acceleration, but do not outperform high-end desktop CPUs in our testing.
 
-
 ## Selecting a Model
 
 Ollama supports many different models.
 You can find a list of available models on the [Ollama website](https://ollama.com/library).
 Selecting a model that suits your use case, runs on your hardware configuration and achieves the desired speed requires some trial and error.
 The following rules and heuristics can be used to find a model:
 
-* Must have at least a 32k context window (this is a requirement for Kilo Code).
-* Listed as supporting tools.
-* Number of parameters in the 7b to 24b range.
-* Prefer popular models.
-* Prefer newer models.
-
+- Must have at least a 32k context window (this is a requirement for Kilo Code).
+- Listed as supporting tools.
+- Number of parameters in the 7b to 24b range.
+- Prefer popular models.
+- Prefer newer models.
 
 ### Recommendations for Kilo Code
 
@@ -52,12 +49,12 @@ Create a simple web page with a button that greets the user when clicked.
 
 A model is considered to pass if it produces a working result within a few tries. The models we found to work correctly are:
 
-| Model name | Completion time |
-| --- | --- |
-| qwen2.5-coder:7b | 1x (baseline) |
-| devstral:24b | 2x |
-| gemma3:12b | 4x |
-| qwen3-8b | 12x |
+| Model name       | Completion time |
+| ---------------- | --------------- |
+| qwen2.5-coder:7b | 1x (baseline)   |
+| devstral:24b     | 2x              |
+| gemma3:12b       | 4x              |
+| qwen3-8b         | 12x             |
 
 Our recommendation is to use **devstral:24b** if your hardware can handle it, because it makes fewer mistakes than qwen2.5-coder:7b.
 qwen2.5-coder:7b is worth considering because of its speed, if you can put up with its mistakes.
@@ -69,49 +66,56 @@ The result produced by devstral:24b is included below:
 ```html
 <!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>Greet User Button</title>
-    <style>
-        body {
-            font-family: Arial, sans-serif;
-            display: flex;
-            justify-content: center;
-            align-items: center;
-            height: 100vh;
-            margin: 0;
-        }
-        button {
-            padding: 10px 20px;
-            font-size: 16px;
-            cursor: pointer;
-        }
-    </style>
-</head>
-<body>
-    <button onclick="greetUser()">Greet Me!</button>
-
-    <script>
-        function greetUser() {
-            alert('Hello! Welcome to our website.');
-        }
-    </script>
-</body>
+    <head>
+        <meta charset="UTF-8" />
+        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+        <title>Greet User Button</title>
+        <style>
+            body {
+                font-family: Arial, sans-serif;
+                display: flex;
+                justify-content: center;
+                align-items: center;
+                height: 100vh;
+                margin: 0;
+            }
+            button {
+                padding: 10px 20px;
+                font-size: 16px;
+                cursor: pointer;
+            }
+        </style>
+    </head>
+    <body>
+        <button onclick="greetUser()">Greet Me!</button>
+
+        <script>
+            function greetUser() {
+                alert("Hello! Welcome to our website.")
+            }
+        </script>
+    </body>
 </html>
 ```
 
 The following models look like reasonable choices, but were found to **not** work properly with Kilo Code in its default configuration:
 
-| Model name | Fail reason |
-| --- | --- |
-| deepseek-r1:7b | fails to use tools properly |
+| Model name     | Fail reason                    |
+| -------------- | ------------------------------ |
+| deepseek-r1:7b | fails to use tools properly    |
 | deepseek-r1:8b | gets stuck in a reasoning loop |
 
+## Preventing prompt truncation
+
+By default Ollama truncates prompts to a very short length.
+If you run into this problem, please see this FAQ item to resolve it:
+[How can I specify the context window size?](https://github.com/ollama/ollama/blob/4383a3ab7a075eff78b31f7dc84c747e2fcd22b8/docs/faq.md#how-can-i-specify-the-context-window-size)
+
+If you decide to use the `OLLAMA_CONTEXT_LENGTH` environment variable, it needs to be visible to both the IDE and the Ollama server.
 
 ## Setting up Ollama
 
-1. **Download and Install Ollama:** Download the Ollama installer for your operating system from the [Ollama website](https://ollama.com/). Follow the installation instructions and make sure Ollama is running:
+1. **Download and Install Ollama:** Download the Ollama installer for your operating system from the [Ollama website](https://ollama.com/). Follow the installation instructions and make sure Ollama is running:
 
    ```bash
   ollama serve
@@ -129,13 +133,12 @@ The following models look like reasonable choices, but were found to **not** wor
   ollama pull devstral:24b
   ```
 
-4. **Configure Kilo Code:**
-    * Open the Kilo Code sidebar (<img src="/docs/img/kilo-v1.svg" width="12" /> icon).
-    * Click the settings gear icon (<Codicon name="gear" />).
-    * Select "ollama" as the API Provider.
-    * Enter the Model name.
-    * (Optional) You can configure the base URL if you're running Ollama on a different machine. The default is `http://localhost:11434`.
-
+3. **Configure Kilo Code:**
+    - Open the Kilo Code sidebar (<img src="/docs/img/kilo-v1.svg" width="12" /> icon).
+    - Click the settings gear icon (<Codicon name="gear" />).
+    - Select "ollama" as the API Provider.
+    - Enter the Model name.
+    - (Optional) You can configure the base URL if you're running Ollama on a different machine. The default is `http://localhost:11434`.
 
 ## Further Reading
 
````
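
The new "Preventing prompt truncation" section turns on `OLLAMA_CONTEXT_LENGTH` being visible to two separate processes. As a minimal sketch (illustrative only, not code from this commit): the IDE side merely reads the variable to estimate the window, exactly as the fetcher change below does, while `ollama serve` reads it independently to size the real window, so exporting it for only one of the two leaves the estimate and the enforced limit out of sync.

```typescript
// Illustrative sketch: the IDE-side estimate of the context window.
// `ollama serve` applies the same variable on its own, so both processes
// must see the same value for the estimate to match reality.
const assumedWindow = parseInt(process.env.OLLAMA_CONTEXT_LENGTH || "4096", 10)
console.log(`Assuming an Ollama context window of ${assumedWindow} tokens`)
```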

src/api/providers/fetchers/ollama.ts

Lines changed: 2 additions & 4 deletions
````diff
@@ -44,11 +44,9 @@ export const parseOllamaModel = (rawModel: OllamaModelInfoResponse): ModelInfo =
         ? parseInt(rawModel.parameters.match(/^num_ctx\s+(\d+)/m)?.[1] ?? "", 10) || undefined
         : undefined
 
-    const contextKey = Object.keys(rawModel.model_info).find((k) => k.includes("context_length"))
-    const contextLengthFromModelInfo =
-        contextKey && typeof rawModel.model_info[contextKey] === "number" ? rawModel.model_info[contextKey] : undefined
+    const contextLengthFromEnvironment = parseInt(process.env.OLLAMA_CONTEXT_LENGTH || "4096", 10)
 
-    let contextWindow = contextLengthFromModelParameters ?? contextLengthFromModelInfo
+    let contextWindow = contextLengthFromModelParameters ?? contextLengthFromEnvironment
 
     if (contextWindow == 40960 && !contextLengthFromModelParameters) {
         contextWindow = 4096 // For some unknown reason, Ollama returns an undefined context as "40960" rather than 4096, which is what it actually enforces.
````
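
For illustration, the fallback chain this hunk introduces can be read as a standalone function. `resolveContextWindow` is a hypothetical name used only in this sketch, not an export of the module:

```typescript
// Hypothetical helper mirroring the precedence in parseOllamaModel:
// an explicit num_ctx from the model parameters wins, then
// OLLAMA_CONTEXT_LENGTH, then Ollama's practical default of 4096.
function resolveContextWindow(numCtx?: number): number {
    const fromEnvironment = parseInt(process.env.OLLAMA_CONTEXT_LENGTH || "4096", 10)
    let contextWindow = numCtx ?? fromEnvironment
    // Ollama reports an unset context as 40960 while enforcing 4096 in practice.
    if (contextWindow === 40960 && numCtx === undefined) {
        contextWindow = 4096
    }
    return contextWindow
}

console.log(resolveContextWindow(32768)) // 32768: an explicit num_ctx always wins
console.log(resolveContextWindow()) // 4096 when neither num_ctx nor the variable is set
```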

src/api/providers/native-ollama.ts

Lines changed: 19 additions & 8 deletions
````diff
@@ -143,21 +143,23 @@ export class NativeOllamaHandler extends BaseProvider implements SingleCompletio
     protected options: ApiHandlerOptions
     private client: Ollama | undefined
     protected models: Record<string, ModelInfo> = {}
-    private isInitialized = false
+    private isInitialized = false // kilocode_change
 
     constructor(options: ApiHandlerOptions) {
         super()
         this.options = options
-        this.initialize()
+        this.initialize() // kilocode_change
     }
 
+    // kilocode_change start
     private async initialize(): Promise<void> {
         if (this.isInitialized) {
             return
         }
         await this.fetchModel()
         this.isInitialized = true
     }
+    // kilocode_change end
 
     private ensureClient(): Ollama {
         if (!this.client) {
@@ -187,25 +189,29 @@ export class NativeOllamaHandler extends BaseProvider implements SingleCompletio
         messages: Anthropic.Messages.MessageParam[],
         metadata?: ApiHandlerCreateMessageMetadata,
     ): ApiStream {
+        // kilocode_change start
         if (!this.isInitialized) {
            await this.initialize()
         }
+        // kilocode_change end
 
         const client = this.ensureClient()
-        const { id: modelId, info: modelInfo } = this.getModel()
+        const { id: modelId, info: modelInfo } = this.getModel() // kilocode_change: fetchModel => getModel
         const useR1Format = modelId.toLowerCase().includes("deepseek-r1")
 
         const ollamaMessages: Message[] = [
             { role: "system", content: systemPrompt },
             ...convertToOllamaMessages(messages),
         ]
 
+        // kilocode_change start
         const estimatedTokenCount = estimateOllamaTokenCount(ollamaMessages)
         if (modelInfo.maxTokens && estimatedTokenCount > modelInfo.maxTokens) {
             throw new Error(
-                `Input message is too long for the selected model. Estimated tokens: ${estimatedTokenCount}, Max tokens: ${modelInfo.maxTokens}`,
+                `Input message is too long for the selected model. Estimated tokens: ${estimatedTokenCount}, Max tokens: ${modelInfo.maxTokens}. To increase the context window size, see: http://localhost:3000/docs/providers/ollama#preventing-prompt-truncation`,
             )
         }
+        // kilocode_change end
 
         const matcher = new XmlMatcher(
             "think",
@@ -289,13 +295,14 @@ export class NativeOllamaHandler extends BaseProvider implements SingleCompletio
 
     async fetchModel() {
         this.models = await getOllamaModels(this.options.ollamaBaseUrl)
-        return this.models
+        return this.models // kilocode_change
     }
 
     override getModel(): { id: string; info: ModelInfo } {
         const modelId = this.options.ollamaModelId || ""
-        const modelInfo = this.models[modelId]
 
+        // kilocode_change start
+        const modelInfo = this.models[modelId]
         if (!modelInfo) {
             const availableModels = Object.keys(this.models)
             const errorMessage =
@@ -304,20 +311,24 @@ export class NativeOllamaHandler extends BaseProvider implements SingleCompletio
                 : `Model ${modelId} not found. No models available.`
             throw new Error(errorMessage)
         }
+        // kilocode_change end
 
         return {
             id: modelId,
-            info: modelInfo,
+            info: modelInfo, // kilocode_change
         }
     }
 
     async completePrompt(prompt: string): Promise<string> {
         try {
+            // kilocode_change start
             if (!this.isInitialized) {
                 await this.initialize()
             }
+            // kilocode_change end
+
             const client = this.ensureClient()
-            const { id: modelId } = this.getModel()
+            const { id: modelId } = this.getModel() // kilocode_change: fetchModel => getModel
             const useR1Format = modelId.toLowerCase().includes("deepseek-r1")
 
             const response = await client.chat({
````
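
The `kilocode_change` blocks above all serve a single pattern: lazy initialization kicked off in the constructor and re-checked in every async entry point, since a constructor cannot await. A reduced sketch of that pattern with simplified names (the real handler loads models via `getOllamaModels`):

```typescript
// Reduced sketch of the initialization guard used by NativeOllamaHandler.
class LazyInitSketch {
    private isInitialized = false

    constructor() {
        // Fire-and-forget: start initializing without blocking construction.
        void this.initialize()
    }

    private async initialize(): Promise<void> {
        if (this.isInitialized) {
            return // the guard makes repeated calls idempotent
        }
        // ... fetch model metadata here, as fetchModel() does above ...
        this.isInitialized = true
    }

    async completePrompt(prompt: string): Promise<string> {
        // Re-check on every call: the constructor's kick-off may still be
        // in flight when the first request arrives.
        if (!this.isInitialized) {
            await this.initialize()
        }
        return `ready to handle: ${prompt}`
    }
}
```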
