Commit c563d87

Engdocs 2499 (#22296)

1 parent ec13567 commit c563d87

2 files changed: +330 −0 lines changed

Lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
---
title: Docker Model Runner
params:
  sidebar:
    badge:
      color: blue
      text: Beta
weight: 20
description: Learn how to use Docker Model Runner to manage and run AI models.
keywords: Docker, ai, model runner, docker desktop, llm
---

{{< summary-bar feature_name="Docker Model Runner" >}}

The Docker Model Runner plugin lets you:

- [Pull models from Docker Hub](https://hub.docker.com/u/ai)
- Run AI models directly from the command line
- Manage local models (add, list, remove)
- Interact with models using a submitted prompt or in chat mode

Models are pulled from Docker Hub the first time they're used and stored locally. They're loaded into memory only at runtime when a request is made, and unloaded when not in use to optimize resources. Since models can be large, the initial pull may take some time — but after that, they're cached locally for faster access. You can interact with the model using [OpenAI-compatible APIs](#what-api-endpoints-are-available).

## Enable the feature

To enable Docker Model Runner:

1. Open the **Settings** view in Docker Desktop.
2. Navigate to the **Beta** tab in **Features in development**.
3. Check the **Enable Docker Model Runner** checkbox.
4. Select **Apply & restart**.

## Available commands

### Model runner status

Check whether the Docker Model Runner is active:

```console
$ docker model status
```

### View all commands

Displays help information and a list of available subcommands.

```console
$ docker model help
```

Output:

```text
Usage: docker model COMMAND

Commands:
  list        List models available locally
  pull        Download a model from Docker Hub
  rm          Remove a downloaded model
  run         Run a model interactively or with a prompt
  status      Check if the model runner is running
  version     Show the current version
```

### Pull a model

Pulls a model from Docker Hub to your local environment.

```console
$ docker model pull <model>
```

Example:

```console
$ docker model pull ai/smollm2
```

Output:

```text
Downloaded: 257.71 MB
Model ai/smollm2 pulled successfully
```

### List available models

Lists all models currently pulled to your local environment.

```console
$ docker model list
```

You will see something similar to:

```text
MODEL       PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED     SIZE
ai/smollm2  361.82 M    IQ2_XXS/Q4_K_M  llama         354bf30d0aa3  3 days ago  256.35 MiB
```

### Run a model

Run a model and interact with it using a submitted prompt or in chat mode.

#### One-time prompt

```console
$ docker model run ai/smollm2 "Hi"
```

Output:

```text
Hello! How can I assist you today?
```

#### Interactive chat

```console
$ docker model run ai/smollm2
```

Output:

```text
Interactive chat mode started. Type '/bye' to exit.
> Hi
Hi there! It's SmolLM, AI assistant. How can I help you today?
> /bye
Chat session ended.
```
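
Because the one-time prompt form prints the model's response to standard output, it also composes with ordinary shell scripting. A minimal sketch (the `notes.txt` file name is hypothetical, and this assumes the prompt fits comfortably in a single command-line argument):

```bash
#!/bin/sh

# Ask the local model to summarize a text file in one shot
docker model run ai/smollm2 "Summarize the following notes: $(cat notes.txt)"
```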

### Remove a model

Removes a downloaded model from your system.

```console
$ docker model rm <model>
```

Output:

```text
Model <model> removed successfully
```

## Integrate the Docker Model Runner into your software development lifecycle

You can now start building your Generative AI application powered by the Docker Model Runner.

If you want to try an existing GenAI application, follow these instructions:

1. Set up the sample app by cloning the following repository:

   ```console
   $ git clone https://github.com/docker/hello-genai.git
   ```

2. In your terminal, navigate to the `hello-genai` directory.

3. Run `run.sh` to pull the chosen model and run the app(s), as shown in the sketch after this list.

4. Open your app in the browser at the addresses specified in the repository [README](https://github.com/docker/hello-genai).
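
Put together, the whole sequence looks like this (a sketch; it assumes `run.sh` is executable at the repository root):

```console
$ git clone https://github.com/docker/hello-genai.git
$ cd hello-genai
$ ./run.sh
```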

You'll see the GenAI app's interface where you can start typing your prompts.

You can now interact with your own GenAI app, powered by a local model. Try a few prompts and notice how fast the responses are — all running on your machine with Docker.

## FAQs

### What models are available?

All the available models are hosted in the [public Docker Hub namespace of `ai`](https://hub.docker.com/u/ai).

### What API endpoints are available?

Once the feature is enabled, the following new APIs are available:

```text
#### Inside containers ####

http://model-runner.docker.internal/

# Docker Model management
POST /models/create
GET /models
GET /models/{namespace}/{name}
DELETE /models/{namespace}/{name}

# OpenAI endpoints
GET /engines/llama.cpp/v1/models
GET /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings
Note: You can also omit llama.cpp.
E.g., POST /engines/v1/chat/completions.

#### Inside or outside containers (host) ####

Same endpoints on /var/run/docker.sock

# While still in Beta
Prefixed with /exp/vDD4.40
```
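
For example, from within a container you can hit the model-management API to list local models (a quick sanity check; this assumes `curl` is available in the calling container):

```bash
#!/bin/sh

# List the models known to the runner via the management API
curl http://model-runner.docker.internal/models
```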

### How do I interact through the OpenAI API?

#### From within a container

An example of calling an OpenAI endpoint (`chat/completions`) from within another container using `curl`:

```bash
#!/bin/sh

curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```
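
If you only need the generated text, you can extract it from the response with `jq` (a sketch, assuming `jq` is installed in the calling container; the response follows the standard OpenAI chat-completions shape):

```bash
#!/bin/sh

# Print only the assistant's reply from the chat/completions response
curl -s http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
    }' | jq -r '.choices[0].message.content'
```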

#### From the host using a Unix socket

An example of calling an OpenAI endpoint (`chat/completions`) through the Docker socket from the host using `curl`:

```bash
#!/bin/sh

curl --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```
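
The model-management endpoints work the same way over the socket. For instance, a sketch that lists local models (note the beta path prefix documented above):

```bash
#!/bin/sh

# List local models through the Docker socket (beta-prefixed path)
curl --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/models
```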

#### From the host using TCP

If you want to interact with the API from the host over TCP instead of a Docker socket, enable the host-side TCP support from the Docker Desktop GUI, or via the [Docker Desktop CLI](/manuals/desktop/features/desktop-cli.md): for example, `docker desktop enable model-runner --tcp <port>`.

Afterwards, interact with it as previously documented, using `localhost` and the chosen (or default) port.

```bash
#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```
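
To verify the TCP endpoint quickly, you can also list the available models (this assumes the default port `12434`, as in the example above):

```bash
#!/bin/sh

# Quick check that the TCP endpoint is up
curl http://localhost:12434/engines/llama.cpp/v1/models
```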

## Known issues

### `docker model` is not recognised

If you run a Docker Model Runner command and see:

```text
docker: 'model' is not a docker command
```

It means Docker can't find the plugin because it's not in the expected CLI plugins directory.

To fix this, create a symlink so Docker can detect it:

```console
$ ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model
```

Once linked, re-run the command.
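
For example, any `docker model` subcommand confirms the plugin is now detected:

```console
$ docker model version
```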

### No safeguard for running oversized models

Currently, Docker Model Runner doesn't include safeguards to prevent you from launching models that exceed your system's available resources. Attempting to run a model that is too large for the host machine may result in severe slowdowns or render the system temporarily unusable. This issue is particularly common when running LLMs without sufficient GPU memory or system RAM.

### `model run` drops into chat even if pull fails

If a model image fails to pull successfully, for example due to network issues or lack of disk space, the `docker model run` command will still drop you into the chat interface, even though the model isn't actually available. This can lead to confusion, as the chat will not function correctly without a running model.

You can manually retry the `docker model pull` command to ensure the image is available before running it again.
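
In scripts, one way to avoid dropping into a broken chat session is to chain the two steps so the run only happens after a successful pull:

```console
$ docker model pull ai/smollm2 && docker model run ai/smollm2
```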

### No consistent digest support in Model CLI

The Docker Model CLI currently lacks consistent support for specifying models by image digest. As a temporary workaround, you should refer to models by name instead of digest.

### Misleading pull progress after failed initial attempt

In some cases, if an initial `docker model pull` fails partway through, a subsequent successful pull may misleadingly report “0 bytes” downloaded even though data is being fetched in the background. This can give the impression that nothing is happening, when in fact the model is being retrieved. Despite the incorrect progress output, the pull typically completes as expected.

## Share feedback

Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting.

data/summary.yaml

Lines changed: 4 additions & 0 deletions
@@ -137,6 +137,10 @@ Docker Desktop CLI logs:
   requires: Docker Desktop 4.39 and later
 Docker GitHub Copilot:
   availability: Early Access
+Docker Model Runner:
+  availability: Beta
+  requires: Docker Desktop 4.40 and later
+  for: Docker Desktop for Mac with Apple Silicon
 Docker Projects:
   availability: Beta
 Docker Init:
