---
title: "Run AI models using Docker Model Runner"
weight: 2
layout: "learningpathall"
---

Docker Model Runner is an official Docker extension that allows you to run Large Language Models (LLMs) on your local computer. It provides a convenient way to deploy and use AI models across different environments, including Arm-based systems, without complex setup or cloud dependencies.

Docker Model Runner uses [llama.cpp](https://github.com/ggml-org/llama.cpp), an open-source C/C++ project developed by Georgi Gerganov that enables efficient LLM inference on a variety of hardware, but you do not need to download, build, or install any LLM frameworks yourself.
Docker Model Runner provides an easy-to-use CLI that is familiar to Docker users.

## Before you begin

Verify Docker is running with:

```console
docker version
```

You should see output showing your Docker version.

Confirm that the Docker Desktop version is 4.40 or above, for example:

```output
Server: Docker Desktop 4.41.2 (191736)
```

Make sure Docker Model Runner is enabled:

```console
docker model --help
```

You should see the usage message:

```output
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  inspect     Display detailed information on one model
  list        List the available models that can be run with the Docker Model Runner
  logs        Fetch the Docker Model Runner logs
  pull        Download a model
  push        Upload a model
  rm          Remove models downloaded from Docker Hub
  run         Run a model with the Docker Model Runner
  status      Check if the Docker Model Runner is running
  tag         Tag a model
  version     Show the Docker Model Runner version
```

If Docker Model Runner is not enabled, enable it by following the [Docker Model Runner documentation](https://docs.docker.com/model-runner/).

You should also see the Models icon in your Docker Desktop sidebar.



## Running your first AI model with Docker Model Runner

Docker Model Runner is an extension for Docker Desktop that simplifies running AI models locally.

Docker Model Runner automatically selects compatible model versions and optimizes performance for the Arm architecture.

You can try Docker Model Runner by using an LLM from Docker Hub.

The example below uses the [SmolLM2 model](https://hub.docker.com/r/ai/smollm2), a compact language model with 360 million parameters, designed to run efficiently on-device while performing a wide range of language tasks. You can explore additional [models in Docker Hub](https://hub.docker.com/u/ai).

Download the model using:

```console
docker model pull ai/smollm2
```

For a simple chat interface, run the model:

```console
docker model run ai/smollm2
```

Enter a prompt at the CLI:

```console
write a simple hello world program in C++
```

You see output from the SmolLM2 model similar to:

```output
#include <iostream>

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
```

You can ask more questions and continue to chat.

To exit the chat, use the `/bye` command.

You can print the list of models on your computer using:

```console
docker model list
```

Your list will differ depending on the models you have downloaded.

```output
MODEL NAME   PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED       SIZE
ai/gemma3    3.88 B      IQ2_XXS/Q4_K_M  gemma3        0b329b335467  2 months ago  2.31 GiB
ai/phi4      14.66 B     IQ2_XXS/Q4_K_M  phi3          03c0bc8e0f5a  2 months ago  8.43 GiB
ai/smollm2   361.82 M    IQ2_XXS/Q4_K_M  llama         354bf30d0aa3  2 months ago  256.35 MiB
ai/llama3.2  3.21 B      IQ2_XXS/Q4_K_M  llama         436bb282b419  2 months ago  1.87 GiB
```

## Use the OpenAI endpoint to call the model

From your host computer, you can access the model through the OpenAI-compatible endpoint on a TCP port.

First, enable the TCP port used to connect to the model:

```console
docker desktop enable model-runner --tcp 12434
```

Next, use a text editor to save the code below in a file named `curl-test.sh`:

```bash
#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Please write a hello world program in Java."
      }
    ]
  }'
```
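If you prefer to call the endpoint from a program instead of the shell script above, the same request can be sketched in Python using only the standard library. This is an illustrative sketch: the `build_chat_request` and `send` helpers are hypothetical names, not part of Docker Model Runner, and actually sending the request assumes Model Runner is listening on the TCP port 12434 enabled earlier.

```python
import json
import urllib.request

def build_chat_request(model, user_prompt,
                       system_prompt="You are a helpful assistant."):
    """Build an OpenAI-style chat-completions payload (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

def send(payload,
         url="http://localhost:12434/engines/llama.cpp/v1/chat/completions"):
    """POST the payload to the local Model Runner endpoint (hypothetical helper).

    Only call this while Docker Model Runner is listening on TCP port 12434.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_request("ai/smollm2",
                             "Please write a hello world program in Java.")
# response = send(payload)  # uncomment with Model Runner running
```

The payload mirrors the JSON body passed to `curl` in the shell script.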

Run the shell script:

```console
bash ./curl-test.sh | jq
```

If you don't have `jq` installed, you can omit the pipe and view the raw JSON output.

The output, including the performance information, is shown below:

```output
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a simple \"Hello World\" program in Java:\n\n```java\npublic class HelloWorld {\n    public static void main(String[] args) {\n        System.out.println(\"Hello, World!\");\n    }\n}\n```\n\nThis program declares a `HelloWorld` class, defines a `main` method that contains the program's execution, and then uses `System.out.println` to print \"Hello, World!\" to the console."
      }
    }
  ],
  "created": 1748622685,
  "model": "ai/smollm2",
  "system_fingerprint": "b1-a0f7016",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 101,
    "prompt_tokens": 28,
    "total_tokens": 129
  },
  "id": "chatcmpl-uZGBuFoS2ERodT4KilStxDwhySLQBTN9",
  "timings": {
    "prompt_n": 28,
    "prompt_ms": 32.349,
    "prompt_per_token_ms": 1.1553214285714284,
    "prompt_per_second": 865.5599863983431,
    "predicted_n": 101,
    "predicted_ms": 469.524,
    "predicted_per_token_ms": 4.648752475247525,
    "predicted_per_second": 215.11147459980745
  }
}
```
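The `timings` object reports latencies in milliseconds. As a quick sanity check, the per-second figures can be reproduced from the token counts and times; a short Python sketch using the values from the output above:

```python
# Values copied from the "timings" object in the response above.
timings = {
    "prompt_n": 28,           # prompt tokens processed
    "prompt_ms": 32.349,      # time spent processing the prompt
    "predicted_n": 101,       # tokens generated
    "predicted_ms": 469.524,  # time spent generating
}

# tokens / milliseconds * 1000 = tokens per second
prompt_tps = timings["prompt_n"] / timings["prompt_ms"] * 1000
generation_tps = timings["predicted_n"] / timings["predicted_ms"] * 1000

print(f"prompt:     {prompt_tps:.2f} tokens/s")      # matches prompt_per_second
print(f"generation: {generation_tps:.2f} tokens/s")  # matches predicted_per_second
```

Generation throughput (`predicted_per_second`) is usually the figure to watch when comparing models or hardware.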

In this section, you learned how to run AI models using Docker Model Runner. Continue to the next section to see how to use Docker Compose to build an application with a built-in AI model.