Skip to content

Commit 0c69fcb

Browse files
DMR: clarify base urls (#22623)
<!--Delete sections as needed --> ## Description Clarify base urls, reorder examples by order of importance. ## Related issues or tickets <!-- Related issues, pull requests, or Jira tickets --> ## Reviews <!-- Notes for reviewers here --> <!-- List applicable reviews (optionally @tag reviewers) --> - [x] Technical review - [x] Editorial review - [ ] Product review --------- Co-authored-by: Allie Sadler <[email protected]>
1 parent 446850c commit 0c69fcb

File tree

1 file changed

+59
-51
lines changed

1 file changed

+59
-51
lines changed

content/manuals/ai/model-runner.md

Lines changed: 59 additions & 51 deletions
Original file line numberDiff line numberDiff line change
@@ -7,9 +7,9 @@ params:
77
text: Beta
88
group: AI
99
weight: 20
10-
description: Learn how to use Docker Model Runner to manage and run AI models.
10+
description: Learn how to use Docker Model Runner to manage and run AI models.
1111
keywords: Docker, ai, model runner, docker deskotp, llm
12-
aliases:
12+
aliases:
1313
- /desktop/features/model-runner/
1414
- /ai/model-runner/
1515
---
@@ -34,8 +34,8 @@ Models are pulled from Docker Hub the first time they're used and stored locally
3434

3535
1. Navigate to the **Features in development** tab in settings.
3636
2. Under the **Experimental features** tab, select **Access experimental features**.
37-
3. Select **Apply and restart**.
38-
4. Quit and reopen Docker Desktop to ensure the changes take effect.
37+
3. Select **Apply and restart**.
38+
4. Quit and reopen Docker Desktop to ensure the changes take effect.
3939
5. Open the **Settings** view in Docker Desktop.
4040
6. Navigate to **Features in development**.
4141
7. From the **Beta** tab, check the **Enable Docker Model Runner** setting.
@@ -46,7 +46,7 @@ You can now use the `docker model` command in the CLI and view and interact with
4646

4747
### Model runner status
4848

49-
Check whether the Docker Model Runner is active:
49+
Check whether the Docker Model Runner is active and displays the current inference engine:
5050

5151
```console
5252
$ docker model status
@@ -55,7 +55,7 @@ $ docker model status
5555
### View all commands
5656

5757
Displays help information and a list of available subcommands.
58-
58+
5959
```console
6060
$ docker model help
6161
```
@@ -74,15 +74,15 @@ Commands:
7474
version Show the current version
7575
```
7676

77-
### Pull a model
77+
### Pull a model
7878

7979
Pulls a model from Docker Hub to your local environment.
8080

8181
```console
8282
$ docker model pull <model>
8383
```
8484

85-
Example:
85+
Example:
8686

8787
```console
8888
$ docker model pull ai/smollm2
@@ -114,7 +114,13 @@ You will see something similar to:
114114

115115
### Run a model
116116

117-
Run a model and interact with it using a submitted prompt or in chat mode.
117+
Run a model and interact with it using a submitted prompt or in chat mode. When you run a model, Docker
118+
calls an Inference Server API endpoint hosted by the Model Runner through Docker Desktop. The model
119+
stays in memory until another model is requested, or until a pre-defined inactivity timeout is reached (currently 5 minutes).
120+
121+
You do not have to use `Docker model run` before interacting with a specific model from a
122+
host process or from within a container. Model Runner transparently loads the requested model on-demand, assuming it has been
123+
pulled beforehand and is locally available.
118124

119125
#### One-time prompt
120126

@@ -150,18 +156,18 @@ Chat session ended.
150156
151157
### Push a model to Docker Hub
152158

153-
Use the following command to push your model to Docker Hub:
159+
To push your model to Docker Hub:
154160

155161
```console
156162
$ docker model push <namespace>/<model>
157163
```
158164

159165
### Tag a model
160166

161-
You can specify a particular version or variant of the model:
167+
To specify a particular version or variant of the model:
162168

163169
```console
164-
$ docker model tag
170+
$ docker model tag
165171
```
166172

167173
If no tag is provided, Docker defaults to `latest`.
@@ -171,7 +177,7 @@ If no tag is provided, Docker defaults to `latest`.
171177
Fetch logs from Docker Model Runner to monitor activity or debug issues.
172178

173179
```console
174-
$ docker model logs
180+
$ docker model logs
175181
```
176182

177183
The following flags are accepted:
@@ -211,53 +217,54 @@ If you want to try an existing GenAI application, follow these instructions.
211217

212218
4. Open you app in the browser at the addresses specified in the repository [README](https://github.com/docker/hello-genai).
213219

214-
You'll see the GenAI app's interface where you can start typing your prompts.
220+
You'll see the GenAI app's interface where you can start typing your prompts.
215221

216222
You can now interact with your own GenAI app, powered by a local model. Try a few prompts and notice how fast the responses are — all running on your machine with Docker.
217223

218224
## FAQs
219225

220226
### What models are available?
221227

222-
All the available models are hosted in the [public Docker Hub namespace of `ai`](https://hub.docker.com/u/ai).
228+
All the available models are hosted in the [public Docker Hub namespace of `ai`](https://hub.docker.com/u/ai).
223229

224230
### What API endpoints are available?
225231

226-
Once the feature is enabled, the following new APIs are available:
232+
Once the feature is enabled, new API endpoints are available under the following base URLs:
227233

228-
```text
229-
#### Inside containers ####
234+
- From containers: `http://model-runner.docker.internal/`
235+
- From host processes: `http://localhost:12434/`, assuming you have enabled TCP host access on default port 12434.
230236

231-
http://model-runner.docker.internal/
237+
Docker Model management endpoints:
232238

233-
# Docker Model management
234-
POST /models/create
235-
GET /models
236-
GET /models/{namespace}/{name}
237-
DELETE /models/{namespace}/{name}
239+
```text
240+
POST /models/create
241+
GET /models
242+
GET /models/{namespace}/{name}
243+
DELETE /models/{namespace}/{name}
244+
```
238245

239-
# OpenAI endpoints
240-
GET /engines/llama.cpp/v1/models
241-
GET /engines/llama.cpp/v1/models/{namespace}/{name}
242-
POST /engines/llama.cpp/v1/chat/completions
243-
POST /engines/llama.cpp/v1/completions
244-
POST /engines/llama.cpp/v1/embeddings
245-
Note: You can also omit llama.cpp.
246-
E.g., POST /engines/v1/chat/completions.
246+
OpenAI endpoints:
247247

248-
#### Inside or outside containers (host) ####
248+
```text
249+
GET /engines/llama.cpp/v1/models
250+
GET /engines/llama.cpp/v1/models/{namespace}/{name}
251+
POST /engines/llama.cpp/v1/chat/completions
252+
POST /engines/llama.cpp/v1/completions
253+
POST /engines/llama.cpp/v1/embeddings
254+
```
249255

250-
Same endpoints on /var/run/docker.sock
256+
To call these endpoints via a Unix socket (`/var/run/docker.sock`), prefix their path with
257+
with `/exp/vDD4.40`.
258+
259+
> [!NOTE]
260+
> You can omit `llama.cpp` from the path. For example: `POST /engines/v1/chat/completions`.
251261
252-
# While still in Beta
253-
Prefixed with /exp/vDD4.40
254-
```
255262

256263
### How do I interact through the OpenAI API?
257264

258265
#### From within a container
259266

260-
Examples of calling an OpenAI endpoint (`chat/completions`) from within another container using `curl`:
267+
To call the `chat/completions` OpenAI endpoint from within another container using `curl`:
261268

262269
```bash
263270
#!/bin/sh
@@ -280,15 +287,18 @@ curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
280287

281288
```
282289

283-
#### From the host using a Unix socket
290+
#### From the host using TCP
284291

285-
Examples of calling an OpenAI endpoint (`chat/completions`) through the Docker socket from the host using `curl`:
292+
To call the `chat/completions` OpenAI endpoint from the host via TCP:
293+
294+
1. Enable the host-side TCP support from the Docker Desktop GUI, or via the [Docker Desktop CLI](/manuals/desktop/features/desktop-cli.md).
295+
For example: `docker desktop enable model-runner --tcp <port>`.
296+
2. Interact with it as documented in the previous section using `localhost` and the correct port.
286297

287298
```bash
288299
#!/bin/sh
289300

290-
curl --unix-socket $HOME/.docker/run/docker.sock \
291-
localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
301+
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
292302
-H "Content-Type: application/json" \
293303
-d '{
294304
"model": "ai/smollm2",
@@ -303,19 +313,17 @@ curl --unix-socket $HOME/.docker/run/docker.sock \
303313
}
304314
]
305315
}'
306-
307316
```
308317

309-
#### From the host using TCP
310-
311-
In case you want to interact with the API from the host, but use TCP instead of a Docker socket, you can enable the host-side TCP support from the Docker Desktop GUI, or via the [Docker Desktop CLI](/manuals/desktop/features/desktop-cli.md). For example, using `docker desktop enable model-runner --tcp <port>`.
318+
#### From the host using a Unix socket
312319

313-
Afterwards, interact with it as previously documented using `localhost` and the chosen, or the default port.
320+
To call the `chat/completions` OpenAI endpoint through the Docker socket from the host using `curl`:
314321

315322
```bash
316323
#!/bin/sh
317324

318-
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
325+
curl --unix-socket $HOME/.docker/run/docker.sock \
326+
localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
319327
-H "Content-Type: application/json" \
320328
-d '{
321329
"model": "ai/smollm2",
@@ -354,15 +362,15 @@ Once linked, re-run the command.
354362

355363
### No safeguard for running oversized models
356364

357-
Currently, Docker Model Runner doesn't include safeguards to prevent you from launching models that exceed their systems available resources. Attempting to run a model that is too large for the host machine may result in severe slowdowns or render the system temporarily unusable. This issue is particularly common when running LLMs models without sufficient GPU memory or system RAM.
365+
Currently, Docker Model Runner doesn't include safeguards to prevent you from launching models that exceed their system's available resources. Attempting to run a model that is too large for the host machine may result in severe slowdowns or render the system temporarily unusable. This issue is particularly common when running LLMs models without sufficient GPU memory or system RAM.
358366

359367
### No consistent digest support in Model CLI
360368

361369
The Docker Model CLI currently lacks consistent support for specifying models by image digest. As a temporary workaround, you should refer to models by name instead of digest.
362370

363371
## Share feedback
364372

365-
Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting.
373+
Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting.
366374

367375
## Disable the feature
368376

@@ -371,4 +379,4 @@ To disable Docker Model Runner:
371379
1. Open the **Settings** view in Docker Desktop.
372380
2. Navigate to the **Beta** tab in **Features in development**.
373381
3. Clear the **Enable Docker Model Runner** checkbox.
374-
4. Select **Apply & restart**.
382+
4. Select **Apply & restart**.

0 commit comments

Comments
 (0)