
Commit 05b986d

docs: update Ollama setup instructions and add troubleshooting section (#337)
1 parent de5a047 commit 05b986d


docs/providers/ollama.md

Lines changed: 61 additions & 4 deletions
@@ -53,7 +53,7 @@ Roo Code supports running models locally using Ollama. This provides privacy, of
ollama pull qwen2.5-coder:32b
```

- 3. **Configure the Model:** by default, Ollama uses a context window size of 2048 tokens, which is too small for Roo Code requests. You need at least 12k tokens to get decent results, ideally 32k. To configure a model, you need to set its parameters and save a copy of it.
+ 3. **Configure the Model:** Set the model's context window in Ollama and save a copy. Roo Code automatically reads the model's reported context window from Ollama and passes it as `num_ctx`; no Roo-side context size setting is required for the Ollama provider.

Load the model (we will use `qwen2.5-coder:32b` as an example):

@@ -77,9 +77,10 @@ Roo Code supports running models locally using Ollama. This provides privacy, of
* Open the Roo Code sidebar (<KangarooIcon /> icon).
* Click the settings gear icon (<Codicon name="gear" />).
* Select "ollama" as the API Provider.
- * Enter the Model name from the previous step (e.g., `your_model_name`).
- * (Optional) You can configure the base URL if you're running Ollama on a different machine. The default is `http://localhost:11434`.
- * (Optional) Configure Model context size in Advanced settings, so Roo Code knows how to manage its sliding window.
+ * Enter the model tag or saved name from the previous step (e.g., `your_model_name`).
+ * (Optional) Configure the base URL if you're running Ollama on a different machine; the default is `http://localhost:11434` (see the connectivity sketch below).
+ * (Optional) Enter an API key if your Ollama server requires authentication.
+ * (Advanced) Roo uses Ollama's native API by default for the "ollama" provider. An OpenAI-compatible `/v1` handler also exists but isn't required for typical setups.
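If Ollama runs on a different machine, the server must listen on a non-loopback address and Roo's base URL must point at it. A minimal connectivity sketch, assuming a default install on port 11434 and a hypothetical server address `192.168.1.50`:

```bash
# On the Ollama machine: listen on all interfaces instead of localhost only.
OLLAMA_HOST=0.0.0.0 ollama serve

# From the machine running Roo Code: confirm the server answers on its
# native API (this address is what goes into Roo's base URL field).
curl http://192.168.1.50:11434/api/tags

# The OpenAI-compatible /v1 handler mentioned above responds on the same port.
curl http://192.168.1.50:11434/v1/models
```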

---
@@ -90,3 +91,59 @@ Roo Code supports running models locally using Ollama. This provides privacy, of
* **Offline Use:** Once you've downloaded a model, you can use Roo Code offline with that model.
* **Token Tracking:** Roo Code tracks token usage for models run via Ollama, helping you monitor consumption.
* **Ollama Documentation:** Refer to the [Ollama documentation](https://ollama.com/docs) for more information on installing, configuring, and using Ollama.

---

## Troubleshooting

### Out of Memory (OOM) on First Request

**Symptoms**
- First request from Roo fails with an out-of-memory error
- GPU/CPU memory usage spikes when the model first loads
- Works after you manually start the model in Ollama

**Cause**
If no model instance is running, Ollama spins one up on demand. During that cold start it may allocate a larger context window than expected. The larger context window increases memory usage and can exceed available VRAM or RAM. This is an Ollama startup behavior, not a Roo Code bug.
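A quick way to tell whether a cold start is about to happen is to check what Ollama already has loaded before sending the first Roo request. A minimal check, assuming a default local install:

```bash
# List models currently loaded in memory; an empty list means the next
# request will trigger the cold start (and the allocation) described above.
ollama ps
```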

**Fixes**

1. **Preload the model**
```bash
ollama run <model-name>
```
Keep it running, then issue the request from Roo.

2. **Pin the context window (`num_ctx`)**
- Option A (interactive session, then save):
```bash
# inside `ollama run <base-model>`
/set parameter num_ctx 32768
/save <your_model_name>
```
- Option B (Modelfile):
```text
PARAMETER num_ctx 32768
```
Then re-create the model:
```bash
ollama create <your_model_name> -f Modelfile
```

3. **Confirm the pinned context window**
Roo reads the saved model's reported `num_ctx` from Ollama automatically and passes it with each request; there is no Roo-side context size setting for the Ollama provider, so the value you saved in step 2 is the one that takes effect.
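One way to confirm the saved value is to ask Ollama to print the model's parameters. A minimal sketch, assuming the model was saved as `your_model_name` in the previous step:

```bash
# Print the parameters baked into the saved copy; the pinned context
# window should show up as a num_ctx entry.
ollama show your_model_name --parameters
```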

4. **Use smaller variants**
If GPU memory is limited, use a smaller quant (e.g., q4 instead of q5) or a smaller parameter size (e.g., 7B/13B instead of 32B).
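For example, with the `qwen2.5-coder` family used earlier in this guide, you could switch to a smaller variant and pin the same context window; the tags below are illustrative, so check the Ollama library for what is actually published:

```bash
# Pull a smaller variant of the same model family (tag is illustrative).
ollama pull qwen2.5-coder:7b

# Pin num_ctx on the smaller model the same way as above, inside `ollama run`:
#   /set parameter num_ctx 32768
#   /save your_model_name-7b
```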

5. **Restart after an OOM**
```bash
ollama ps
ollama stop <model-name>
```

**Quick checklist**
- Model is running before the Roo request
- `num_ctx` pinned in the saved model (Modelfile or `/set` + `/save`); Roo reads this value automatically
- Model fits available VRAM/RAM
- No leftover Ollama processes
