content/learning-paths/laptops-and-desktops/docker-models/compose.md (+21 -17)
@@ -4,15 +4,13 @@ weight: 3
 layout: "learningpathall"
 ---
 
-Docker Compose makes it easy to run multi-container applications. Docker Compose can also include AI models in your project.
+Docker Compose makes it easy to run multi-container applications, and those applications can also include local AI inference services.
 
-In this section, you'll learn how to use Docker Compose to deploy a web-based AI chat application that uses Docker Model Runner as the backend for AI inference.
+In this section, you'll use Docker Compose to deploy a simple web-based AI chat application. The frontend is a Flask web app, and the backend uses Docker Model Runner to serve AI responses.
 
 ## Clone the example project
 
-The example project, named [docker-model-runner-chat](https://github.com/jasonrandrews/docker-model-runner-chat) is available on GitHub. It provides a simple web interface to interact with local AI models such as Llama 3.2 or Gemma 3.
-
-First, clone the example repository:
+Clone the [docker-model-runner-chat](https://github.com/jasonrandrews/docker-model-runner-chat) repository from GitHub. This project provides a simple web interface to interact with local AI models such as Llama 3.2 or Gemma 3.
 
-The `compose.yaml` file defines how the application is deployed using Docker Compose.
+The `compose.yaml` file defines how Docker Compose sets up and connects the services.
 
 It sets up two services:
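The two service definitions themselves are collapsed in this view. As a rough sketch, a `compose.yaml` for this kind of app might look like the following; the service names `web` and `ai-runner` come from this page, while the build context, port mapping, and `provider` options are assumptions, so the repository's actual file may differ:

```yaml
# Sketch only; the real compose.yaml in the example repository may differ.
services:
  web:
    build: .                # Flask chat frontend
    ports:
      - "5000:5000"         # assumed; matches http://localhost:5000 used below
    env_file:
      - vars.env            # MODEL and endpoint settings read by the web app
    depends_on:
      - ai-runner

  ai-runner:
    provider:
      type: model           # Docker Compose's Model Runner service provider
      options:
        model: ai/llama3.2  # assumed default; keep in sync with vars.env
```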
@@ -60,21 +58,21 @@ From the project directory, start the app with:
 docker compose up --build
 ```
 
-Docker Compose will build the web app image and start both services.
+Docker Compose builds the web app image and starts both services.
 
 ## Access the chat interface
 
-Open your browser and copy and paste the local URL below:
+Once the app is running, open your browser and go to the local URL below:
 
 ```console
 http://localhost:5000
 ```
 
-You can now chat with the AI model using the web interface. Enter your prompt and view the response in real time.
+You'll see a simple chat UI. Enter a prompt and get real-time responses from the AI model.
 
-
+
 
-## Configuration
+## Configure the model
 
 You can change the AI model or endpoint by editing the `vars.env` file before starting the containers. The file contains environment variables used by the web application:
-To use a different model, change the `MODEL` value. For example:
+To use a different model or API endpoint, change the `MODEL` value. For example:
 
 ```console
 MODEL=ai/llama3.2
 ```
 
-Make sure to change the model in the `compose.yaml` file also.
+Be sure to also update the model name in `compose.yaml` under the `ai-runner` service.
+
+## Optional: customize generation parameters
+
+You can edit `app.py` to adjust parameters such as the two below (see the sketch after this list):
 
-You can also change the `temperature` and `max_tokens` values in `app.py` to further customize the application.
+* `temperature`: controls randomness (higher is more creative)
+* `max_tokens`: controls the length of responses
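The diff doesn't show `app.py` itself, so the snippet below is only a sketch of how these two parameters are typically passed to an OpenAI-compatible endpoint. The client library, base URL, and model name are all assumptions; the real app may be structured differently:

```python
# Sketch only: shows temperature and max_tokens in a chat completion call
# against an OpenAI-compatible endpoint. The real app.py may differ.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed Model Runner endpoint
    api_key="unused",                              # local endpoint needs no key
)

response = client.chat.completions.create(
    model="ai/llama3.2",   # assumed; keep in sync with MODEL in vars.env
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,       # higher values produce more varied, creative output
    max_tokens=256,        # upper bound on the length of the generated reply
)
print(response.choices[0].message.content)
```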
 ## Stop the application
 
@@ -112,12 +115,13 @@ docker compose down
 
 Use the steps below if you have any issues running the application:
 
-- Ensure Docker and Docker Compose are installed and running
-- Make sure port 5000 is not in use by another application
-- Check logs with:
+* Ensure Docker and Docker Compose are installed and running
+* Make sure port 5000 is not in use by another application
+* Check logs with:
 
 ```console
 docker compose logs
 ```
 
+## What you've learned
 
 In this section, you learned how to use Docker Compose to run a containerized AI chat application with a web interface and local model inference from Docker Model Runner.
content/learning-paths/laptops-and-desktops/docker-models/models.md (+32 -19)
@@ -4,11 +4,13 @@ weight: 2
 layout: "learningpathall"
 ---
 
-Docker Model Runner is an official Docker extension that allows you to run Large Language Models (LLMs) on your local computer. It provides a convenient way to deploy and use AI models across different environments, including Arm-based systems, without complex setup or cloud dependencies.
+## Simplified local LLM inference
+
+Docker Model Runner is an official Docker extension that allows you to run Large Language Models (LLMs) directly on your local computer. It provides a convenient way to deploy and use AI models across different environments, including Arm-based systems, without complex framework setup or cloud dependencies.
 
 Docker uses [llama.cpp](https://github.com/ggml-org/llama.cpp), an open source C/C++ project developed by Georgi Gerganov that enables efficient LLM inference on a variety of hardware, but you do not need to download, build, or install any LLM frameworks.
 
-Docker Model Runner provides a easy to use CLI that is familiar to Docker users.
+Docker Model Runner provides an easy-to-use CLI that is familiar to Docker users.
 
 ## Before you begin
@@ -18,21 +20,21 @@ Verify Docker is running with:
 docker version
 ```
 
-You should see output showing your Docker version.
+You should see your Docker version shown in the output.
 
-Confirm the Docker Desktop version is 4.40 or above, for example:
+Confirm that Docker Desktop is version 4.40 or above, for example:
 
 ```output
 Server: Docker Desktop 4.41.2 (191736)
 ```
 
-Make sure the Docker Model Runner is enabled.
+Make sure the Docker Model Runner is enabled:
 
 ```console
 docker model --help
 ```
 
-You should see the usage message:
+You should see this output:
 
 ```output
 Usage: docker model COMMAND
@@ -52,27 +54,28 @@ Commands:
 version Show the Docker Model Runner version
 ```
 
-If Docker Model Runner is not enabled, enable it using the [Docker Model Runner documentation](https://docs.docker.com/model-runner/).
+If Docker Model Runner is not enabled, enable it by following the [Docker Model Runner documentation](https://docs.docker.com/model-runner/).
 
-You should also see the Models icon in your Docker Desktop sidebar.
+You should also see the **Models** tab and icon appear in your Docker Desktop sidebar.
 
-## Running your first AI model with Docker Model Runner
+## Run your first AI model with Docker Model Runner
 
 Docker Model Runner is an extension for Docker Desktop that simplifies running AI models locally.
 
 Docker Model Runner automatically selects compatible model versions and optimizes performance for the Arm architecture.
 
-You can try Docker Model Runner by using an LLM from Docker Hub.
+You can try Model Runner by downloading and running a model from Docker Hub.
 
-The example below uses the [SmolLM2 model](https://hub.docker.com/r/ai/smollm2), a compact language model with 360 million parameters, designed to run efficiently on-device while performing a wide range of language tasks. You can explore additional [models in Docker Hub](https://hub.docker.com/u/ai).
+The example below uses the [SmolLM2 model](https://hub.docker.com/r/ai/smollm2), a compact LLM with ~360 million parameters, designed for efficient on-device inference while performing a wide range of language tasks. You can explore further models in [Docker Hub](https://hub.docker.com/u/ai).
 
-Download the model using:
+1. Download the model:
 
 ```console
 docker model pull ai/smollm2
 ```
+
+2. Run the model interactively:
 
 For a simple chat interface, run the model:
@@ -96,10 +99,9 @@ int main() {
 return 0;
 }
 ```
+
+To exit the chat, use the `/bye` command.
 
-You can ask more questions and continue to chat.
-
-To exit the chat use the `/bye` command.
+3. View downloaded models:
 
 You can print the list of models on your computer using:
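The command is collapsed in this view; based on the `docker model --help` listing shown earlier, it is presumably:

```console
docker model list
```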
@@ -119,7 +121,9 @@ ai/llama3.2 3.21 B IQ2_XXS/Q4_K_M llama 436bb282b419 2 months ag
 
 ## Use the OpenAI endpoint to call the model
 
-From your host computer you can access the model using the OpenAI endpoint and a TCP port.
+Docker Model Runner exposes a REST endpoint compatible with OpenAI's API spec.
+
+From your host computer, you can access the model using the OpenAI endpoint and a TCP port.
 
 First, enable the TCP port to connect with the model:
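The enabling command is also collapsed here. One way to do this with a recent Docker Desktop is the following (treat the flag and port as assumptions and check the Model Runner documentation for your version):

```console
docker desktop enable model-runner --tcp 12434
```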
@@ -155,7 +159,7 @@ Run the shell script:
 bash ./curl-test.sh | jq
 ```
 
-If you don't have `jq` installed, you eliminate piping the output.
+If you don't have `jq` installed, you can omit piping the output.
 
 The output, including the performance information, is shown below:
 
@@ -193,5 +197,14 @@
 }
 }
 ```
+
+You now have a fully functioning OpenAI-compatible inference endpoint running locally.
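The contents of `curl-test.sh` are not shown in this diff. A request along these lines would exercise the endpoint; the port and path assume the default Model Runner TCP setup from the step above:

```console
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "user", "content": "Write a hello world program in C++"}
    ]
  }'
```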
+
+## What you've learned
+
+In this section, you learned:
+
+* How to verify and use Docker Model Runner on Docker Desktop
+* How to run a model interactively from the CLI
+* How to connect to a model using a local OpenAI-compatible API
 
-In this section you learned how to run AI models using Docker Model Runner. Continue to see how to use Docker Compose to build an application with a built-in AI model.
+In the next section, you'll use Docker Compose to deploy a web-based AI chat interface powered by Docker Model Runner.