Commit 8757036

Merge pull request #1998 from jasonrandrews/review2
Docker Model Runner Learning Path
2 parents 5df4249 + 33b766f commit 8757036

File tree

6 files changed: +381 -0 lines changed

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
---
title: Learn how to use Docker Model Runner in AI applications

draft: true
cascade:
  draft: true

minutes_to_complete: 45

who_is_this_for: This is for software developers and AI enthusiasts who want to run AI models using Docker Model Runner.

learning_objectives:
  - Run AI models locally using Docker Model Runner.
  - Easily build containerized applications with LLMs.

prerequisites:
  - A computer with at least 16GB of RAM (recommended) and Docker Desktop installed (version 4.40 or later).
  - Basic understanding of Docker.
  - Familiarity with Large Language Model (LLM) concepts.

author: Jason Andrews

### Tags
skilllevels: Introductory
subjects: Containers and Virtualization
armips:
  - Neoverse
  - Cortex-A
operatingsystems:
  - Windows
  - macOS
tools_software_languages:
  - Docker
  - Python
  - LLM

further_reading:
  - resource:
      title: Docker Model Runner Documentation
      link: https://docs.docker.com/model-runner/
      type: documentation
  - resource:
      title: Introducing Docker Model Runner
      link: https://www.docker.com/blog/introducing-docker-model-runner/
      type: blog

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1                       # _index.md always has weight of 1 to order correctly
layout: "learningpathall"       # All files under learning paths have this same wrapper
learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps"         # Always the same, html page title.
layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
---
(binary image file added, 104 KB)
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
title: "Run a containerized AI chat app with Docker Compose"
weight: 3
layout: "learningpathall"
---

Docker Compose makes it easy to run multi-container applications, and it can also include AI models in your project.

In this section, you'll learn how to use Docker Compose to deploy a web-based AI chat application that uses Docker Model Runner as the backend for AI inference.

## Clone the example project

The example project, named [docker-model-runner-chat](https://github.com/jasonrandrews/docker-model-runner-chat), is available on GitHub. It provides a simple web interface to interact with local AI models such as Llama 3.2 or Gemma 3.

First, clone the example repository:

```console
git clone https://github.com/jasonrandrews/docker-model-runner-chat.git
cd docker-model-runner-chat
```
## Review the Docker Compose file

The `compose.yaml` file defines how the application is deployed using Docker Compose.

It sets up two services:

- **ai-chat**: a Flask-based web application that provides the chat user interface. It is built from the local directory, exposes port 5000 for browser access, mounts the project directory as a volume for live code updates, loads environment variables from `vars.env`, and waits for the `ai-runner` service to be ready before starting.
- **ai-runner**: a service that uses the Docker Model Runner provider to run the selected AI model (for example, `ai/gemma3`). The configuration under `provider` tells Docker to use the model runner extension and specifies which model to load.

This setup lets the web app communicate with the model runner service as if it were an OpenAI-compatible API, making it easy to swap models or update endpoints by changing environment variables or compose options.

Review the `compose.yaml` file to see the two services:

```yaml
services:
  ai-chat:
    build:
      context: .
    ports:
      - "5000:5000"
    volumes:
      - ./:/app
    env_file:
      - vars.env
    depends_on:
      - ai-runner
  ai-runner:
    provider:
      type: model
      options:
        model: ai/gemma3
```
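For example, to run a different model from Docker Hub, you could point the provider at another model. This fragment is illustrative; if you change it, keep the `MODEL` value in `vars.env` in sync:

```yaml
  ai-runner:
    provider:
      type: model
      options:
        model: ai/llama3.2
```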
## Start the application

From the project directory, start the app with:

```console
docker compose up --build
```

Docker Compose builds the web app image and starts both services.

## Access the chat interface

Open your browser and go to the local URL below:

```console
http://localhost:5000
```

You can now chat with the AI model using the web interface. Enter your prompt and view the response in real time.

![Compose #center](compose-app.png)
## Configuration

You can change the AI model or endpoint by editing the `vars.env` file before starting the containers. The file contains environment variables used by the web application:

- `BASE_URL`: the base URL for the AI model API. By default, it is set to `http://model-runner.docker.internal/engines/v1/`, the default endpoint set up by Docker, which allows the web app to communicate with the Docker Model Runner service.
- `MODEL`: the AI model to use (for example, `ai/gemma3` or `ai/llama3.2`).

The `vars.env` file is shown below:

```console
BASE_URL=http://model-runner.docker.internal/engines/v1/
MODEL=ai/gemma3
```

To use a different model, change the `MODEL` value. For example:

```console
MODEL=ai/llama3.2
```

Make sure to also change the model in the `compose.yaml` file.

You can also change the `temperature` and `max_tokens` values in `app.py` to further customize the application.
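To make the flow concrete, here is a minimal sketch of how a client could assemble a chat request from these environment variables. The helper name `build_chat_request` and its defaults are illustrative, not the actual code in `app.py`; the payload shape follows the OpenAI chat completions API that Docker Model Runner exposes.

```python
import json
import os

# Defaults mirror vars.env; override by exporting BASE_URL or MODEL.
BASE_URL = os.environ.get("BASE_URL", "http://model-runner.docker.internal/engines/v1/")
MODEL = os.environ.get("MODEL", "ai/gemma3")

def build_chat_request(prompt: str, temperature: float = 0.7, max_tokens: int = 512) -> dict:
    """Build the JSON body a client would POST to <BASE_URL>chat/completions."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

if __name__ == "__main__":
    # Print the request body that would be sent for a sample prompt.
    print(json.dumps(build_chat_request("Hello!"), indent=2))
```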
## Stop the application

To stop the services, press `Ctrl+C` in the terminal.

You can also run the command below in another terminal to stop the services:

```console
docker compose down
```

## Troubleshooting

Use the steps below if you have any issues running the application:

- Ensure Docker and Docker Compose are installed and running
- Make sure port 5000 is not in use by another application
- Check logs with:

```console
docker compose logs
```

In this section, you learned how to use Docker Compose to run a containerized AI chat application with a web interface and local model inference from Docker Model Runner.
(binary image file added, 72.5 KB)
Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
---
title: "Run AI models using Docker Model Runner"
weight: 2
layout: "learningpathall"
---

Docker Model Runner is an official Docker extension that allows you to run Large Language Models (LLMs) on your local computer. It provides a convenient way to deploy and use AI models across different environments, including Arm-based systems, without complex setup or cloud dependencies.

Docker uses [llama.cpp](https://github.com/ggml-org/llama.cpp), an open source C/C++ project developed by Georgi Gerganov that enables efficient LLM inference on a variety of hardware, but you do not need to download, build, or install any LLM frameworks yourself.

Docker Model Runner provides an easy-to-use CLI that is familiar to Docker users.

## Before you begin

Verify Docker is running with:

```console
docker version
```

You should see output showing your Docker version.

Confirm the Docker Desktop version is 4.40 or later, for example:

```output
Server: Docker Desktop 4.41.2 (191736)
```

Make sure Docker Model Runner is enabled:

```console
docker model --help
```

You should see the usage message:

```output
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  inspect     Display detailed information on one model
  list        List the available models that can be run with the Docker Model Runner
  logs        Fetch the Docker Model Runner logs
  pull        Download a model
  push        Upload a model
  rm          Remove models downloaded from Docker Hub
  run         Run a model with the Docker Model Runner
  status      Check if the Docker Model Runner is running
  tag         Tag a model
  version     Show the Docker Model Runner version
```

If Docker Model Runner is not enabled, enable it by following the [Docker Model Runner documentation](https://docs.docker.com/model-runner/).

You should also see the Models icon in your Docker Desktop sidebar.

![Models #center](models-tab.png)

## Running your first AI model with Docker Model Runner

Docker Model Runner is an extension for Docker Desktop that simplifies running AI models locally.

Docker Model Runner automatically selects compatible model versions and optimizes performance for the Arm architecture.

You can try Docker Model Runner by using an LLM from Docker Hub.

The example below uses the [SmolLM2 model](https://hub.docker.com/r/ai/smollm2), a compact language model with 360 million parameters, designed to run efficiently on-device while performing a wide range of language tasks. You can explore additional [models in Docker Hub](https://hub.docker.com/u/ai).

Download the model using:

```console
docker model pull ai/smollm2
```

For a simple chat interface, run the model:

```console
docker model run ai/smollm2
```

Enter a prompt at the CLI:

```console
write a simple hello world program in C++
```

You see the output from the SmolLM2 model:

```output
#include <iostream>

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
```

You can ask more questions and continue to chat.

To exit the chat, use the `/bye` command.

You can print the list of models on your computer using:

```console
docker model list
```

Your list will be different based on the models you have downloaded.

```output
MODEL NAME    PARAMETERS   QUANTIZATION     ARCHITECTURE   MODEL ID       CREATED        SIZE
ai/gemma3     3.88 B       IQ2_XXS/Q4_K_M   gemma3         0b329b335467   2 months ago   2.31 GiB
ai/phi4       14.66 B      IQ2_XXS/Q4_K_M   phi3           03c0bc8e0f5a   2 months ago   8.43 GiB
ai/smollm2    361.82 M     IQ2_XXS/Q4_K_M   llama          354bf30d0aa3   2 months ago   256.35 MiB
ai/llama3.2   3.21 B       IQ2_XXS/Q4_K_M   llama          436bb282b419   2 months ago   1.87 GiB
```
## Use the OpenAI endpoint to call the model

From your host computer, you can access the model using the OpenAI-compatible endpoint and a TCP port.

First, enable the TCP port to connect with the model:

```console
docker desktop enable model-runner --tcp 12434
```

Next, use a text editor to save the code below in a file named `curl-test.sh`:

```bash
#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write a hello world program in Java."
            }
        ]
    }'
```
152+
Run the shell script:
153+
154+
```console
155+
bash ./curl-test.sh | jq
156+
```
157+
158+
If you don't have `jq` installed, you eliminate piping the output.
159+
160+
The output, including the performance information, is shown below:
161+
162+
```output
163+
{
164+
"choices": [
165+
{
166+
"finish_reason": "stop",
167+
"index": 0,
168+
"message": {
169+
"role": "assistant",
170+
"content": "Here's a simple \"Hello World\" program in Java:\n\n```java\npublic class HelloWorld {\n public static void main(String[] args) {\n System.out.println(\"Hello, World!\");\n }\n}\n```\n\nThis program declares a `HelloWorld` class, defines a `main` method that contains the program's execution, and then uses `System.out.println` to print \"Hello, World!\" to the console."
171+
}
172+
}
173+
],
174+
"created": 1748622685,
175+
"model": "ai/smollm2",
176+
"system_fingerprint": "b1-a0f7016",
177+
"object": "chat.completion",
178+
"usage": {
179+
"completion_tokens": 101,
180+
"prompt_tokens": 28,
181+
"total_tokens": 129
182+
},
183+
"id": "chatcmpl-uZGBuFoS2ERodT4KilStxDwhySLQBTN9",
184+
"timings": {
185+
"prompt_n": 28,
186+
"prompt_ms": 32.349,
187+
"prompt_per_token_ms": 1.1553214285714284,
188+
"prompt_per_second": 865.5599863983431,
189+
"predicted_n": 101,
190+
"predicted_ms": 469.524,
191+
"predicted_per_token_ms": 4.648752475247525,
192+
"predicted_per_second": 215.11147459980745
193+
}
194+
}
195+
```
196+
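If you prefer Python to shell, the same request can be sent with the standard library alone. This is a sketch, not part of the example project; `build_request` is an illustrative helper, and it assumes the Model Runner TCP endpoint on port 12434 is enabled as shown above.

```python
import json
import urllib.request

# Endpoint exposed after: docker desktop enable model-runner --tcp 12434
URL = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"

def build_request(prompt: str, model: str = "ai/smollm2") -> urllib.request.Request:
    """Build the same chat completion request that curl-test.sh sends."""
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }).encode("utf-8")
    return urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}
    )

if __name__ == "__main__":
    # Requires the model runner to be listening on port 12434.
    req = build_request("Please write a hello world program in Java.")
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # The assistant's text is in the first choice's message content.
    print(reply["choices"][0]["message"]["content"])
```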
In this section, you learned how to run AI models using Docker Model Runner. Continue to the next section to see how to use Docker Compose to build an application with a built-in AI model.
