---
title: "Run AI models using Docker Model Runner"
weight: 2
layout: "learningpathall"
---

Docker Model Runner is an official Docker extension that allows you to run Large Language Models (LLMs) on your local computer. It provides a convenient way to deploy and use AI models across different environments, including Arm-based systems, without complex setup or cloud dependencies.

Docker Model Runner uses [llama.cpp](https://github.com/ggml-org/llama.cpp), an open-source C/C++ project developed by Georgi Gerganov that enables efficient LLM inference on a variety of hardware, but you do not need to download, build, or install any LLM frameworks yourself.
Docker Model Runner provides an easy-to-use CLI that is familiar to Docker users.

## Before you begin

Verify Docker is running with:

```console
docker version
```

You should see output showing your Docker version.

Confirm that the Docker Desktop version is 4.40 or above, for example:

```output
Server: Docker Desktop 4.41.2 (191736)
```

Make sure Docker Model Runner is enabled:

```console
docker model --help
```

You should see the usage message:

```output
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  inspect     Display detailed information on one model
  list        List the available models that can be run with the Docker Model Runner
  logs        Fetch the Docker Model Runner logs
  pull        Download a model
  push        Upload a model
  rm          Remove models downloaded from Docker Hub
  run         Run a model with the Docker Model Runner
  status      Check if the Docker Model Runner is running
  tag         Tag a model
  version     Show the Docker Model Runner version
```

If Docker Model Runner is not enabled, enable it by following the [Docker Model Runner documentation](https://docs.docker.com/model-runner/).

You should also see the Models icon in your Docker Desktop sidebar.



## Running your first AI model with Docker Model Runner

Docker Model Runner is an extension for Docker Desktop that simplifies running AI models locally.

Docker Model Runner automatically selects compatible model versions and optimizes performance for the Arm architecture.

You can try Docker Model Runner by using an LLM from Docker Hub.

The example below uses the [SmolLM2 model](https://hub.docker.com/r/ai/smollm2), a compact language model with 360 million parameters, designed to run efficiently on-device while performing a wide range of language tasks. You can explore additional [models in Docker Hub](https://hub.docker.com/u/ai).

Download the model using:

```console
docker model pull ai/smollm2
```

For a simple chat interface, run the model:

```console
docker model run ai/smollm2
```

Enter a prompt at the CLI:

```console
write a simple hello world program in C++
```

You see output from the SmolLM2 model similar to:

```output
#include <iostream>

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
```

You can ask more questions and continue to chat.

To exit the chat, use the `/bye` command.

You can print the list of models on your computer using:

```console
docker model list
```

Your list will differ depending on the models you have downloaded.

```output
MODEL NAME   PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED       SIZE
ai/gemma3    3.88 B      IQ2_XXS/Q4_K_M  gemma3        0b329b335467  2 months ago  2.31 GiB
ai/phi4      14.66 B     IQ2_XXS/Q4_K_M  phi3          03c0bc8e0f5a  2 months ago  8.43 GiB
ai/smollm2   361.82 M    IQ2_XXS/Q4_K_M  llama         354bf30d0aa3  2 months ago  256.35 MiB
ai/llama3.2  3.21 B      IQ2_XXS/Q4_K_M  llama         436bb282b419  2 months ago  1.87 GiB
```

## Use the OpenAI endpoint to call the model

From your host computer, you can access the model through the OpenAI-compatible endpoint on a TCP port.

First, enable the TCP port used to connect to the model:

```console
docker desktop enable model-runner --tcp 12434
```

Next, use a text editor to save the code below in a file named `curl-test.sh`:

```bash
#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Please write a hello world program in Java."
      }
    ]
  }'
```
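If you prefer to call the endpoint from a program instead of the shell script above, the same request can be sketched in Python using only the standard library. This is an illustrative sketch: the `build_chat_request` and `send` helpers are hypothetical names, not part of Docker Model Runner, and actually sending the request assumes Model Runner is listening on the TCP port 12434 enabled earlier.

```python
import json
import urllib.request

def build_chat_request(model, user_prompt,
                       system_prompt="You are a helpful assistant."):
    """Build an OpenAI-style chat-completions payload (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

def send(payload,
         url="http://localhost:12434/engines/llama.cpp/v1/chat/completions"):
    """POST the payload to the local Model Runner endpoint (hypothetical helper).

    Only call this while Docker Model Runner is listening on TCP port 12434.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_request("ai/smollm2",
                             "Please write a hello world program in Java.")
# response = send(payload)  # uncomment with Model Runner running
```

The payload mirrors the JSON body passed to `curl` in the shell script.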

Run the shell script:

```console
bash ./curl-test.sh | jq
```

If you don't have `jq` installed, you can omit the pipe and view the raw JSON output.

The output, including the performance information, is shown below:

```output
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a simple \"Hello World\" program in Java:\n\n```java\npublic class HelloWorld {\n    public static void main(String[] args) {\n        System.out.println(\"Hello, World!\");\n    }\n}\n```\n\nThis program declares a `HelloWorld` class, defines a `main` method that contains the program's execution, and then uses `System.out.println` to print \"Hello, World!\" to the console."
      }
    }
  ],
  "created": 1748622685,
  "model": "ai/smollm2",
  "system_fingerprint": "b1-a0f7016",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 101,
    "prompt_tokens": 28,
    "total_tokens": 129
  },
  "id": "chatcmpl-uZGBuFoS2ERodT4KilStxDwhySLQBTN9",
  "timings": {
    "prompt_n": 28,
    "prompt_ms": 32.349,
    "prompt_per_token_ms": 1.1553214285714284,
    "prompt_per_second": 865.5599863983431,
    "predicted_n": 101,
    "predicted_ms": 469.524,
    "predicted_per_token_ms": 4.648752475247525,
    "predicted_per_second": 215.11147459980745
  }
}
```
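The `timings` object reports latencies in milliseconds. As a quick sanity check, the per-second figures can be reproduced from the token counts and times; a short Python sketch using the values from the output above:

```python
# Values copied from the "timings" object in the response above.
timings = {
    "prompt_n": 28,           # prompt tokens processed
    "prompt_ms": 32.349,      # time spent processing the prompt
    "predicted_n": 101,       # tokens generated
    "predicted_ms": 469.524,  # time spent generating
}

# tokens / milliseconds * 1000 = tokens per second
prompt_tps = timings["prompt_n"] / timings["prompt_ms"] * 1000
generation_tps = timings["predicted_n"] / timings["predicted_ms"] * 1000

print(f"prompt:     {prompt_tps:.2f} tokens/s")      # matches prompt_per_second
print(f"generation: {generation_tps:.2f} tokens/s")  # matches predicted_per_second
```

Generation throughput (`predicted_per_second`) is usually the figure to watch when comparing models or hardware.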

In this section, you learned how to run AI models using Docker Model Runner. Continue to the next section to see how to use Docker Compose to build an application with a built-in AI model.