content/manuals/ai/model-runner.md
You can now use the `docker model` command in the CLI and view and interact with your local models in the Docker Desktop Dashboard.

```console
$ docker model run ai/smollm2
```

## Available commands

### Model runner status

Checks whether the Docker Model Runner is active and displays the current inference engine:

```console
$ docker model status
```

### View all commands

Displays help information and a list of available subcommands.

```console
$ docker model help
```

Output:

```text
Usage: docker model COMMAND

Commands:
  list        List models available locally
  pull        Download a model from Docker Hub
  rm          Remove a downloaded model
  run         Run a model interactively or with a prompt
  status      Check if the model runner is running
  version     Show the current version
```

### Pull a model

Pulls a model from Docker Hub to your local environment.

```console
$ docker model pull <model>
```

Example:

```console
$ docker model pull ai/smollm2
```

Output:

```text
Downloaded: 257.71 MB
Model ai/smollm2 pulled successfully
```

The models also display in the Docker Desktop Dashboard.

#### Pull from Hugging Face

You can also pull GGUF models directly from [Hugging Face](https://huggingface.co/models?library=gguf).

```console
$ docker model pull hf.co/<model-you-want-to-pull>
```

For example:

```console
$ docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```

This pulls the [bartowski/Llama-3.2-1B-Instruct-GGUF](https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF) model.

### List available models

Lists all models currently pulled to your local environment.

```console
$ docker model list
```

You will see something similar to:

```text
MODEL       PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED     SIZE
ai/smollm2  361.82 M    IQ2_XXS/Q4_K_M  llama         354bf30d0aa3  3 days ago  256.35 MiB
```

### Run a model

Run a model and interact with it using a submitted prompt or in chat mode. When you run a model, Docker calls an Inference Server API endpoint hosted by the Model Runner through Docker Desktop. The model stays in memory until another model is requested, or until a pre-defined inactivity timeout is reached (currently 5 minutes).

You do not have to use `docker model run` before interacting with a specific model from a host process or from within a container. Model Runner transparently loads the requested model on demand, assuming it has been pulled beforehand and is locally available.

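For example, a host process or a container can talk to a loaded model directly over the Model Runner's OpenAI-compatible API, with no `docker model run` in between. The following is a minimal sketch of such a request from within a container; it assumes the `model-runner.docker.internal` hostname and `engines/v1` path, which you should verify against the Model Runner API documentation for your version:

```console
$ curl http://model-runner.docker.internal/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "ai/smollm2", "messages": [{"role": "user", "content": "Hi"}]}'
```
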
#### One-time prompt

```console
$ docker model run ai/smollm2 "Hi"
```

Output:

```text
Hello! How can I assist you today?
```

#### Interactive chat

```console
$ docker model run ai/smollm2
```

Output:

```text
Interactive chat mode started. Type '/bye' to exit.
> Hi
Hi there! It's SmolLM, AI assistant. How can I help you today?
> /bye
Chat session ended.
```

> [!TIP]
>
> You can also use chat mode in the Docker Desktop Dashboard when you select the model in the **Models** tab.

### Push a model to Docker Hub

To push your model to Docker Hub:

```console
$ docker model push <namespace>/<model>
```

### Tag a model

To specify a particular version or variant of the model:

```console
$ docker model tag
```

If no tag is provided, Docker defaults to `latest`.

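For instance, assuming `docker model tag` follows the same `SOURCE TARGET[:TAG]` form as `docker tag` (an assumption; run `docker model help` to confirm the exact signature), tagging a local model under an illustrative `myorg` namespace and pushing the result might look like:

```console
$ docker model tag ai/smollm2 myorg/smollm2:v1
$ docker model push myorg/smollm2:v1
```
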
### View the logs

Fetch logs from Docker Model Runner to monitor activity or debug issues.

```console
$ docker model logs
```

The following flags are accepted and can be combined, as the example after this list shows:

- `-f` / `--follow`: View logs with real-time streaming
- `--no-engines`: Exclude inference engine logs from the output

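For example, to stream logs in real time while leaving out the inference engine output:

```console
$ docker model logs --follow --no-engines
```
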
### Remove a model

Removes a downloaded model from your system.

```console
$ docker model rm <model>
```

Output:

```text
Model <model> removed successfully
```

### Package a model

Packages a GGUF file into a Docker model OCI artifact, with optional licenses, and pushes it to the specified registry.

```console
$ docker model package \
    --gguf ./model.gguf \
    --licenses license1.txt \
    --licenses license2.txt \
    --push registry.example.com/ai/custom-model
```

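Once pushed, the packaged model behaves like any other model. Reusing the illustrative registry path from the example above, you can then pull and run it with the commands covered earlier:

```console
$ docker model pull registry.example.com/ai/custom-model
$ docker model run registry.example.com/ai/custom-model "Hi"
```
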
## Integrate the Docker Model Runner into your software development lifecycle
You can now start building your Generative AI application powered by the Docker Model Runner.
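
As a starting point, any OpenAI-compatible HTTP client can target the Model Runner's inference endpoint from the host. The sketch below assumes host-side TCP access is enabled in Docker Desktop on its default port 12434; treat both the port and the endpoint path as assumptions to verify against your Model Runner configuration:

```console
$ curl http://localhost:12434/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "ai/smollm2",
          "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello from Docker Model Runner."}
          ]
        }'
```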
Once linked, re-run the command.

### No safeguard for running oversized models

Currently, Docker Model Runner doesn't include safeguards to prevent you from launching models that exceed your system's available resources. Attempting to run a model that is too large for the host machine may result in severe slowdowns or render the system temporarily unusable. This issue is particularly common when running LLMs without sufficient GPU memory or system RAM.
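
As a rough pre-flight check before running a large model, compare its on-disk size (the SIZE column of `docker model list`, shown earlier) against your free RAM and GPU memory; actual runtime memory use is typically higher than the on-disk size, so treat the comparison as a lower bound only:

```console
$ docker model list
```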