Commit b8cca14 ("more known issues and Dorin edits"), 1 parent: ed4b936

File tree: 1 file changed (+50 −42 lines)

content/manuals/desktop/features/model-runner.md

Lines changed: 50 additions & 42 deletions
@@ -7,14 +7,14 @@ params:
     text: Beta
   weight: 20
 description: Learn how to use Docker Model Runner to manage and run AI models.
-keywords: Docker, ai, model runner, docker deskotp,
+keywords: Docker, ai, model runner, docker desktop, llm
 ---
 
 {{< summary-bar feature_name="Docker Model Runner" >}}
 
 The Docker Model Runner plugin lets you:
 
-- Pull models from Docker Hub
+- [Pull models from Docker Hub](https://hub.docker.com/u/ai)
 - Run AI models directly from the command line
 - Manage local models (add, list, remove)
 - Interact with models using a submitted prompt or in chat mode
@@ -73,14 +73,14 @@ $ docker model pull <model>
 Example:
 
 ```console
-$ docker model pull ai/llama3.2:1b
+$ docker model pull ai/smollm2
 ```
 
 Output:
 
 ```text
-Downloaded: 626.05 MB
-Model ai/llama3.2:1b pulled successfully
+Downloaded: 257.71 MB
+Model ai/smollm2 pulled successfully
 ```
 
 ### List available models
@@ -91,11 +91,11 @@ Lists all models currently pulled to your local environment.
 $ docker model list
 ```
 
-You will something similar to:
+You will see something similar to:
 
 ```text
-MODEL                                     PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED       SIZE
-ignaciolopezluna020/gemma-3-it:4B-Q4_K_M  3.88 B      IQ2_XXS/Q4_K_M  gemma3        adea14bef2fe  55 years ago  2.31 GiB
+MODEL       PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED     SIZE
+ai/smollm2  361.82 M    IQ2_XXS/Q4_K_M  llama         354bf30d0aa3  3 days ago  256.35 MiB
 ```
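The tabular output added above can be split into fields in a script. A minimal sketch in Python, assuming columns are padded with two or more spaces (an assumption about the CLI's formatting, not documented behavior):

```python
import re

# Sample row from the `docker model list` output shown above.
row = ("ai/smollm2  361.82 M    IQ2_XXS/Q4_K_M  llama  "
       "354bf30d0aa3  3 days ago  256.35 MiB")

# Assumption: column values never contain two consecutive spaces,
# so splitting on runs of 2+ spaces recovers the seven columns.
fields = re.split(r"\s{2,}", row.strip())
model, params, quant, arch, model_id, created, size = fields
```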
### Run a model
@@ -105,28 +105,29 @@ Run a model and interact with it using a submitted prompt or in chat mode.
 #### One-time prompt
 
 ```console
-$ docker model run ai/llama3.2:1b "Hi"
+$ docker model run ai/smollm2 "Hi"
 ```
 
 Output:
 
 ```text
-Hi! How can I assist you today
+Hello! How can I assist you today?
 ```
 
 #### Interactive chat
 
 ```console
-docker model run ai/llama3.2:1b
+docker model run ai/smollm2
 ```
 
 Output:
 
 ```text
 Interactive chat mode started. Type '/bye' to exit.
 > Hi
-Hi! How are you doing today?
+Hi there! It's SmolLM, an AI assistant. How can I help you today?
 > /bye
+Chat session ended.
 ```
### Remove a model
@@ -149,24 +150,17 @@ You can now start building your Generative AI application powered by the Docker
 
 If you want to try an existing GenAI application, follow these instructions.
 
-1. Pull the required model from Docker Hub so it's ready for use in your app.
+1. Set up the sample app. Clone the following repository:
 
    ```console
-   $ docker model pull ai/llama3.2:1b
+   $ git clone https://github.com/docker/hello-genai.git
   ```
 
-2. Set up the sample app. Download and unzip the following folder:
-
-   [myapp.zip](attachment:abc104c4-e0c9-4163-b90b-e1f06caab687:myapp.zip)
+2. In your terminal, navigate to the `hello-genai` directory.
 
-3. In your terminal, navigate to the `myapp` folder.
-4. Start the app with Docker Compose:
+3. Run `run.sh` to pull the chosen model and run the app(s).
 
-   ```console
-   $ docker compose up
-   ```
-
-5. Open you app in the browser at `http://localhost:3000`.
+4. Open your app in the browser at the addresses specified in the repository [README](https://github.com/docker/hello-genai).
 
 You'll see the GenAI app's interface where you can start typing your prompts.

@@ -193,20 +187,20 @@ http://model-runner.docker.internal/
 GET /models/{namespace}/{name}
 DELETE /models/{namespace}/{name}
 
-# OpenAI endpoints (per-backend)
-GET /engines/{backend}/v1/models
-GET /engines/{backend}/v1/models/{namespace}/{name}
-POST /engines/{backend}/v1/chat/completions
-POST /engines/{backend}/v1/completions
-POST /engines/{backend}/v1/embeddings
-Note: You can also omit {backend} and it will default to llama.cpp
+# OpenAI endpoints
+GET /engines/llama.cpp/v1/models
+GET /engines/llama.cpp/v1/models/{namespace}/{name}
+POST /engines/llama.cpp/v1/chat/completions
+POST /engines/llama.cpp/v1/completions
+POST /engines/llama.cpp/v1/embeddings
+Note: You can also omit llama.cpp.
 E.g., POST /engines/v1/chat/completions.
 
 #### Inside or outside containers (host) ####
 
 Same endpoints on /var/run/docker.sock
 
-# Until stable...
+# While still in Beta
 Prefixed with /exp/vDD4.40
 ```
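The endpoint layout in this hunk lends itself to a small URL helper. A sketch follows; the helper name is illustrative (not part of any Docker API), only the paths and the default-engine behavior come from the listing and note above:

```python
BASE = "http://model-runner.docker.internal"

def chat_completions_url(engine="llama.cpp"):
    """Build the chat/completions URL for a given engine.

    Passing engine=None omits the engine segment, which the note
    above says falls back to the llama.cpp default.
    """
    seg = f"/engines/{engine}" if engine else "/engines"
    return f"{BASE}{seg}/v1/chat/completions"
```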

@@ -222,7 +216,7 @@ Examples of calling an OpenAI endpoint (`chat/completions`) from within another
 curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-        "model": "ai/llama3.2:1b",
+        "model": "ai/smollm2",
         "messages": [
             {
                 "role": "system",
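The same request body can be assembled in Python before posting it with any HTTP client. A sketch; the message contents are placeholders (the hunk truncates them), and only the model name and field layout come from the curl example:

```python
import json

# Mirror of the curl example's JSON body; message text is a placeholder.
payload = {
    "model": "ai/smollm2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ],
}
body = json.dumps(payload)
```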
@@ -248,7 +242,7 @@ curl --unix-socket $HOME/.docker/run/docker.sock \
     localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-        "model": "ai/llama3.2:1b",
+        "model": "ai/smollm2",
         "messages": [
             {
                 "role": "system",
@@ -265,21 +259,17 @@ curl --unix-socket $HOME/.docker/run/docker.sock \
 
 #### From the host using TCP
 
-In case you want to interact with the API from the host, but use TCP instead of a Docker socket, it is recommended you use a helper container as a reverse-proxy. For example, in order to forward the API to `8080`:
-
-```bash
-docker run -d --name model-runner-proxy -p 8080:80 alpine/socat tcp-listen:80,fork,reuseaddr tcp:model-runner.docker.internal:80
-```
+If you want to interact with the API from the host over TCP instead of a Docker socket, you can enable host-side TCP support from the Docker Desktop GUI, or via the [Docker Desktop CLI](/manuals/desktop/features/desktop-cli.md). For example: `docker desktop enable model-runner --tcp <port>`.
 
-Afterwards, interact with it as previously documented using `localhost` and the forward port, in this case `8080`:
+Afterwards, interact with it as previously documented, using `localhost` and the chosen or default port:
 
 ```bash
 #!/bin/sh
 
-curl http://localhost:8080/engines/llama.cpp/v1/chat/completions \
+curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-        "model": "ai/llama3.2:1b",
+        "model": "ai/smollm2",
         "messages": [
             {
                 "role": "system",
@@ -313,6 +303,24 @@ $ ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.
 
 Once linked, re-run the command.
 
+### No safeguard for running oversized models
+
+Currently, Docker Model Runner doesn't include safeguards to prevent you from launching models that exceed your system's available resources. Attempting to run a model that is too large for the host machine may result in severe slowdowns or render the system temporarily unusable. This issue is particularly common when running LLMs without sufficient GPU memory or system RAM.
+
+### `model run` drops into chat even if pull fails
+
+If a model image fails to pull successfully, for example due to network issues or lack of disk space, the `docker model run` command will still drop you into the chat interface, even though the model isn't actually available. This can lead to confusion, as the chat will not function correctly without a running model.
+
+You can manually retry the `docker model pull` command to ensure the image is available before running it again.
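The manual-retry workaround can also be scripted. A minimal sketch; the helper below is illustrative (not a Docker feature), and the command is parameterized only so the sketch can be exercised where `docker` is not installed:

```python
import subprocess
import time

def pull_with_retry(model, attempts=3, delay=2.0,
                    runner=("docker", "model", "pull")):
    """Retry the pull until it succeeds or attempts run out.

    `runner` defaults to the real CLI invocation (`docker model pull`);
    a stand-in command can be swapped in where Docker is unavailable.
    """
    for _ in range(attempts):
        if subprocess.run([*runner, model]).returncode == 0:
            return True
        time.sleep(delay)
    return False
```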
+### No consistent digest support in Model CLI
+
+The Docker Model CLI currently lacks consistent support for specifying models by image digest. As a temporary workaround, you should refer to models by name instead of digest.
+
+### Misleading pull progress after failed initial attempt
+
+In some cases, if an initial `docker model pull` fails partway through, a subsequent successful pull may misleadingly report "0 bytes" downloaded even though data is being fetched in the background. This can give the impression that nothing is happening, when in fact the model is being retrieved. Despite the incorrect progress output, the pull typically completes as expected.
## Share feedback
Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting.
