docs/providers/ollama.md (61 additions, 4 deletions)
@@ -53,7 +53,7 @@ Roo Code supports running models locally using Ollama. This provides privacy, of
ollama pull qwen2.5-coder:32b
```
- 3. **Configure the Model:** By default, Ollama uses a context window size of 2048 tokens, which is too small for Roo Code requests. You need at least 12k to get decent results, ideally 32k. To configure a model, you need to set its parameters and save a copy of it.
+ 3. **Configure the Model:** Set your model's context window in Ollama and save a copy. Roo automatically reads the model's reported context window from Ollama and passes it as `num_ctx`; no Roo-side context size setting is required for the Ollama provider.
Load the model (we will use `qwen2.5-coder:32b` as an example):
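A minimal sketch of that load, set-parameters, and save flow in Ollama's interactive session (the 32768 value and the `qwen2.5-coder-32k` save name are illustrative choices, not fixed by the docs):

```
ollama run qwen2.5-coder:32b
>>> /set parameter num_ctx 32768
>>> /save qwen2.5-coder-32k
>>> /bye
```

Pointing Roo at the saved name (here `qwen2.5-coder-32k`) ensures the pinned context window is what actually gets loaded.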
@@ -77,9 +77,10 @@ Roo Code supports running models locally using Ollama. This provides privacy, of
* Open the Roo Code sidebar (<KangarooIcon /> icon).
* Click the settings gear icon (<Codicon name="gear" />).
* Select "ollama" as the API Provider.
- * Enter the Model name from the previous step (e.g., `your_model_name`).
- * (Optional) You can configure the base URL if you're running Ollama on a different machine. The default is `http://localhost:11434`.
- * (Optional) Configure Model context size in Advanced settings, so Roo Code knows how to manage its sliding window.
+ * Enter the model tag or saved name from the previous step (e.g., `your_model_name`).
+ * (Optional) Configure the base URL if you're running Ollama on a different machine. The default is `http://localhost:11434`.
+ * (Optional) Enter an API Key if your Ollama server requires authentication.
+ * (Advanced) Roo uses Ollama's native API by default for the "ollama" provider. An OpenAI-compatible `/v1` endpoint also exists but isn't required for typical setups.
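If you're unsure the base URL is reachable, you can probe the server directly. A quick check, assuming the default port, using Ollama's `/api/tags` endpoint (native, lists local models) and `/v1/models` (OpenAI-compatible):

```bash
# native API: list models available on this Ollama server
curl http://localhost:11434/api/tags

# OpenAI-compatible listing (only relevant if a tool expects /v1)
curl http://localhost:11434/v1/models
```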
---
@@ -90,3 +91,59 @@ Roo Code supports running models locally using Ollama. This provides privacy, of
* **Offline Use:** Once you've downloaded a model, you can use Roo Code offline with that model.
* **Token Tracking:** Roo Code tracks token usage for models run via Ollama, helping you monitor consumption.
* **Ollama Documentation:** Refer to the [Ollama documentation](https://ollama.com/docs) for more information on installing, configuring, and using Ollama.
+ ---
+
+ ## Troubleshooting
+
+ ### Out of Memory (OOM) on First Request
+
+ **Symptoms**
+
+ - First request from Roo fails with an out-of-memory error
+ - GPU/CPU memory usage spikes when the model first loads
+ - Works after you manually start the model in Ollama
+
+ **Cause**
+
+ If no model instance is running, Ollama spins one up on demand. During that cold start it may allocate a larger context window than expected, which increases memory usage and can exceed available VRAM or RAM. This is Ollama startup behavior, not a Roo Code bug.
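One practical mitigation, consistent with the symptom above (things work after a manual start), is to preload the model before Roo's first request. A minimal sketch using Ollama's documented empty-request preload (the model name is illustrative):

```bash
# an empty /api/generate request loads the model without generating anything
curl http://localhost:11434/api/generate -d '{"model": "qwen2.5-coder:32b"}'
```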
+ 3. **Ensure the model's context window is pinned**
+    Save your Ollama model with an appropriate `num_ctx` (e.g., via `/set` + `/save`, or a Modelfile; see the sketch after this list). Roo reads this automatically and passes it as `num_ctx`; there is no Roo-side context size setting for the Ollama provider.
+ 4. **Use smaller variants**
+    If GPU memory is limited, use a smaller quant (e.g., q4 instead of q5) or a smaller parameter size (e.g., 7B/13B instead of 32B).
+ 5. **Restart after an OOM**
+    ```bash
+    # see which models are currently loaded
+    ollama ps
+    # unload the model to free VRAM/RAM before retrying
+    ollama stop <model-name>
+    ```
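As referenced in step 3, one way to pin `num_ctx` is a Modelfile. A minimal sketch, where the 32768 value and the `qwen2.5-coder-32k` name are illustrative:

```bash
# write a Modelfile that bakes the context window into a saved model
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:32b
PARAMETER num_ctx 32768
EOF

# create the pinned copy; point Roo at this model name
ollama create qwen2.5-coder-32k -f Modelfile
```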
+ **Quick checklist**
+ - Model is running before the first Roo request
+ - `num_ctx` pinned (Modelfile or `/set` + `/save`)
+ - Model saved with appropriate `num_ctx` (Roo uses this automatically)
0 commit comments