diff --git a/content/manuals/ai/model-runner/_index.md b/content/manuals/ai/model-runner/_index.md index 3cbda8460582..2d84b62d4678 100644 --- a/content/manuals/ai/model-runner/_index.md +++ b/content/manuals/ai/model-runner/_index.md @@ -16,7 +16,7 @@ aliases: {{< summary-bar feature_name="Docker Model Runner" >}} -Docker Model Runner makes it easy to manage, run, and +Docker Model Runner (DMR) makes it easy to manage, run, and deploy AI models using Docker. Designed for developers, Docker Model Runner streamlines the process of pulling, running, and serving large language models (LLMs) and other AI models directly from Docker Hub or any @@ -39,7 +39,7 @@ with AI models locally. - Package GGUF files as OCI Artifacts and publish them to any Container Registry - Run and interact with AI models directly from the command line or from the Docker Desktop GUI - Manage local models and display logs -- Display prompts and responses details +- Display prompt and response details ## Requirements @@ -75,14 +75,14 @@ Docker Engine only: {{< /tab >}} {{}} -## How it works +## How Docker Model Runner works Models are pulled from Docker Hub the first time you use them and are stored locally. They load into memory only at runtime when a request is made, and unload when not in use to optimize resources. Because models can be large, the initial pull may take some time. After that, they're cached locally for faster access. You can interact with the model using -[OpenAI-compatible APIs](#what-api-endpoints-are-available). +[OpenAI-compatible APIs](api-reference.md). > [!TIP] > @@ -92,569 +92,6 @@ access. You can interact with the model using > [Docker Compose](/manuals/ai/compose/models-and-compose.md) now support Docker > Model Runner. -## Enable Docker Model Runner - -### Enable DMR in Docker Desktop - -1. In the settings view, go to the **Beta features** tab. -1. Select the **Enable Docker Model Runner** setting. -1. 
If you use Windows with a supported NVIDIA GPU, you also see and can select - **Enable GPU-backed inference**. -1. Optional: To enable TCP support, select **Enable host-side TCP support**. - 1. In the **Port** field, type the port you want to use. - 1. If you interact with Model Runner from a local frontend web app, in - **CORS Allows Origins**, select the origins that Model Runner should - accept requests from. An origin is the URL where your web app runs, for - example `http://localhost:3131`. - -You can now use the `docker model` command in the CLI and view and interact -with your local models in the **Models** tab in the Docker Desktop Dashboard. - -> [!IMPORTANT] -> -> For Docker Desktop versions 4.41 and earlier, this setting was under the -> **Experimental features** tab on the **Features in development** page. - -### Enable DMR in Docker Engine - -1. Ensure you have installed [Docker Engine](/engine/install/). -1. DMR is available as a package. To install it, run: - - {{< tabs >}} - {{< tab name="Ubuntu/Debian">}} - - ```console - $ sudo apt-get update - $ sudo apt-get install docker-model-plugin - ``` - - {{< /tab >}} - {{< tab name="RPM-base distributions">}} - - ```console - $ sudo dnf update - $ sudo dnf install docker-model-plugin - ``` - - {{< /tab >}} - {{< /tabs >}} - -1. Test the installation: - - ```console - $ docker model version - $ docker model run ai/smollm2 - ``` - -> [!NOTE] -> TCP support is enabled by default for Docker Engine on port `12434`. - -### Update DMR in Docker Engine - -To update Docker Model Runner in Docker Engine, uninstall it with -[`docker model uninstall-runner`](/reference/cli/docker/model/uninstall-runner/) -then reinstall it: - -```console -docker model uninstall-runner --images && docker model install-runner -``` - -> [!NOTE] -> With the above command, local models are preserved. -> To delete the models during the upgrade, add the `--models` option to the -> `uninstall-runner` command. 
- -## Pull a model - -Models are cached locally. - -> [!NOTE] -> -> When you use the Docker CLI, you can also pull models directly from -> [HuggingFace](https://huggingface.co/). - -{{< tabs group="release" >}} -{{< tab name="From Docker Desktop">}} - -1. Select **Models** and select the **Docker Hub** tab. -1. Find the model you want and select **Pull**. - -![Screenshot showing the Docker Hub view.](./images/dmr-catalog.png) - -{{< /tab >}} -{{< tab name="From the Docker CLI">}} - -Use the [`docker model pull` command](/reference/cli/docker/model/pull/). -For example: - -```bash {title="Pulling from Docker Hub"} -docker model pull ai/smollm2:360M-Q4_K_M -``` - -```bash {title="Pulling from HuggingFace"} -docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF -``` - -{{< /tab >}} -{{< /tabs >}} - -## Run a model - -{{< tabs group="release" >}} -{{< tab name="From Docker Desktop">}} - -1. Select **Models** and select the **Local** tab. -1. Select the play button. The interactive chat screen opens. - -![Screenshot showing the Local view.](./images/dmr-run.png) - -{{< /tab >}} -{{< tab name="From the Docker CLI" >}} - -Use the [`docker model run` command](/reference/cli/docker/model/run/). - -{{< /tab >}} -{{< /tabs >}} - -## Troubleshooting - -### Display the logs - -To troubleshoot issues, display the logs: - -{{< tabs group="release" >}} -{{< tab name="From Docker Desktop">}} - -Select **Models** and select the **Logs** tab. - -![Screenshot showing the Models view.](./images/dmr-logs.png) - -{{< /tab >}} -{{< tab name="From the Docker CLI">}} - -Use the [`docker model logs` command](/reference/cli/docker/model/logs/). - -{{< /tab >}} -{{< /tabs >}} - -### Inspect requests and responses - -Inspecting requests and responses helps you diagnose model-related issues. 
-For example, you can evaluate context usage to verify you stay within the model's context -window or display the full body of a request to control the parameters you are passing to your models -when developing with a framework. - -In Docker Desktop, to inspect the requests and responses for each model: - -1. Select **Models** and select the **Requests** tab. This view displays all the requests to all models: - - The time the request was sent. - - The model name and version - - The prompt/request - - The context usage - - The time it took for the response to be generated. -2. Select one of the requests to display further details: - - In the **Overview** tab, view the token usage, response metadata and generation speed, and the actual prompt and response. - - In the **Request** and **Response** tabs, view the full JSON payload of the request and the response. - -> [!NOTE] -> You can also display the requests for a specific model when you select a model and then select the **Requests** tab. - -## Publish a model - -> [!NOTE] -> -> This works for any Container Registry supporting OCI Artifacts, not only -> Docker Hub. - -You can tag existing models with a new name and publish them under a different -namespace and repository: - -```console -# Tag a pulled model under a new name -$ docker model tag ai/smollm2 myorg/smollm2 - -# Push it to Docker Hub -$ docker model push myorg/smollm2 -``` - -For more details, see the [`docker model tag`](/reference/cli/docker/model/tag) -and [`docker model push`](/reference/cli/docker/model/push) command -documentation. - -You can also package a model file in GGUF format as an OCI Artifact and publish -it to Docker Hub. 
- -```console -# Download a model file in GGUF format, for example from HuggingFace -$ curl -L -o model.gguf https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf - -# Package it as OCI Artifact and push it to Docker Hub -$ docker model package --gguf "$(pwd)/model.gguf" --push myorg/mistral-7b-v0.1:Q4_K_M -``` - -For more details, see the -[`docker model package`](/reference/cli/docker/model/package/) command -documentation. - -## Example: Integrate Docker Model Runner into your software development lifecycle - -### Sample project - -You can now start building your generative AI application powered by Docker -Model Runner. - -If you want to try an existing GenAI application, follow these steps: - -1. Set up the sample app. Clone and run the following repository: - - ```console - $ git clone https://github.com/docker/hello-genai.git - ``` - -1. In your terminal, go to the `hello-genai` directory. - -1. Run `run.sh` to pull the chosen model and run the app. - -1. Open your app in the browser at the addresses specified in the repository - [README](https://github.com/docker/hello-genai). - -You see the GenAI app's interface where you can start typing your prompts. - -You can now interact with your own GenAI app, powered by a local model. Try a -few prompts and notice how fast the responses are — all running on your machine -with Docker. - -### Use Model Runner in GitHub Actions - -Here is an example of how to use Model Runner as part of a GitHub workflow. -The example installs Model Runner, tests the installation, pulls and runs a -model, interacts with the model via the API, and deletes the model. 
- -```yaml {title="dmr-run.yml", collapse=true} -name: Docker Model Runner Example Workflow - -permissions: - contents: read - -on: - workflow_dispatch: - inputs: - test_model: - description: 'Model to test with (default: ai/smollm2:360M-Q4_K_M)' - required: false - type: string - default: 'ai/smollm2:360M-Q4_K_M' - -jobs: - dmr-test: - runs-on: ubuntu-latest - timeout-minutes: 30 - - steps: - - name: Set up Docker - uses: docker/setup-docker-action@v4 - - - name: Install docker-model-plugin - run: | - echo "Installing docker-model-plugin..." - # Add Docker's official GPG key: - sudo apt-get update - sudo apt-get install ca-certificates curl - sudo install -m 0755 -d /etc/apt/keyrings - sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc - sudo chmod a+r /etc/apt/keyrings/docker.asc - - # Add the repository to Apt sources: - echo \ - "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \ - $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \ - sudo tee /etc/apt/sources.list.d/docker.list > /dev/null - sudo apt-get update - sudo apt-get install -y docker-model-plugin - - echo "Installation completed successfully" - - - name: Test docker model version - run: | - echo "Testing docker model version command..." - sudo docker model version - - # Verify the command returns successfully - if [ $? -eq 0 ]; then - echo "✅ docker model version command works correctly" - else - echo "❌ docker model version command failed" - exit 1 - fi - - - name: Pull the provided model and run it - run: | - MODEL="${{ github.event.inputs.test_model || 'ai/smollm2:360M-Q4_K_M' }}" - echo "Testing with model: $MODEL" - - # Test model pull - echo "Pulling model..." - sudo docker model pull "$MODEL" - - if [ $? 
-eq 0 ]; then - echo "✅ Model pull successful" - else - echo "❌ Model pull failed" - exit 1 - fi - - # Test basic model run (with timeout to avoid hanging) - echo "Testing docker model run..." - timeout 60s sudo docker model run "$MODEL" "Give me a fact about whales." || { - exit_code=$? - if [ $exit_code -eq 124 ]; then - echo "✅ Model run test completed (timed out as expected for non-interactive test)" - else - echo "❌ Model run failed with exit code: $exit_code" - exit 1 - fi - } - - name: Test model pull and run - run: | - MODEL="${{ github.event.inputs.test_model || 'ai/smollm2:360M-Q4_K_M' }}" - echo "Testing with model: $MODEL" - - # Test model pull - echo "Pulling model..." - sudo docker model pull "$MODEL" - - if [ $? -eq 0 ]; then - echo "✅ Model pull successful" - else - echo "❌ Model pull failed" - exit 1 - fi - - # Test basic model run (with timeout to avoid hanging) - echo "Testing docker model run..." - timeout 60s sudo docker model run "$MODEL" "Give me a fact about whales." || { - exit_code=$? - if [ $exit_code -eq 124 ]; then - echo "✅ Model run test completed (timed out as expected for non-interactive test)" - else - echo "❌ Model run failed with exit code: $exit_code" - exit 1 - fi - } - - - name: Test API endpoint - run: | - MODEL="${{ github.event.inputs.test_model || 'ai/smollm2:360M-Q4_K_M' }}" - echo "Testing API endpoint with model: $MODEL" - - # Test API call with curl - echo "Testing API call..." - RESPONSE=$(curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d "{ - \"model\": \"$MODEL\", - \"messages\": [ - { - \"role\": \"user\", - \"content\": \"Say hello\" - } - ], - \"top_k\": 1, - \"temperature\": 0 - }") - - if [ $? 
-eq 0 ]; then - echo "✅ API call successful" - echo "Response received: $RESPONSE" - - # Check if response contains "hello" (case-insensitive) - if echo "$RESPONSE" | grep -qi "hello"; then - echo "✅ Response contains 'hello' (case-insensitive)" - else - echo "❌ Response does not contain 'hello'" - echo "Full response: $RESPONSE" - exit 1 - fi - else - echo "❌ API call failed" - exit 1 - fi - - - name: Test model cleanup - run: | - MODEL="${{ github.event.inputs.test_model || 'ai/smollm2:360M-Q4_K_M' }}" - - echo "Cleaning up test model..." - sudo docker model rm "$MODEL" || echo "Model removal failed or model not found" - - # Verify model was removed - echo "Verifying model cleanup..." - sudo docker model ls - - echo "✅ Model cleanup completed" - - - name: Report success - if: success() - run: | - echo "🎉 Docker Model Runner daily health check completed successfully!" - echo "All tests passed:" - echo " ✅ docker-model-plugin installation successful" - echo " ✅ docker model version command working" - echo " ✅ Model pull and run operations successful" - echo " ✅ API endpoint operations successful" - echo " ✅ Cleanup operations successful" -``` - -## FAQs - -### What models are available? - -All the available models are hosted in the [public Docker Hub namespace of `ai`](https://hub.docker.com/u/ai). - -### What CLI commands are available? - -See [the reference docs](/reference/cli/docker/model/). - -### What API endpoints are available? - -Once the feature is enabled, new API endpoints are available under the following base URLs: - -{{< tabs >}} -{{< tab name="Docker Desktop">}} - -- From containers: `http://model-runner.docker.internal/` -- From host processes: `http://localhost:12434/`, assuming TCP host access is - enabled on the default port (12434). 
- -{{< /tab >}} -{{< tab name="Docker Engine">}} - -- From containers: `http://172.17.0.1:12434/` (with `172.17.0.1` representing the host gateway address) -- From host processes: `http://localhost:12434/` - -> [!NOTE] -> The `172.17.0.1` interface may not be available by default to containers - within a Compose project. -> In this case, add an `extra_hosts` directive to your Compose service YAML: -> -> ```yaml -> extra_hosts: -> - "model-runner.docker.internal:host-gateway" -> ``` -> Then you can access the Docker Model Runner APIs at http://model-runner.docker.internal:12434/ - -{{< /tab >}} -{{}} - -Docker Model management endpoints: - -```text -POST /models/create -GET /models -GET /models/{namespace}/{name} -DELETE /models/{namespace}/{name} -``` - -OpenAI endpoints: - -```text -GET /engines/llama.cpp/v1/models -GET /engines/llama.cpp/v1/models/{namespace}/{name} -POST /engines/llama.cpp/v1/chat/completions -POST /engines/llama.cpp/v1/completions -POST /engines/llama.cpp/v1/embeddings -``` - -To call these endpoints via a Unix socket (`/var/run/docker.sock`), prefix their path -with `/exp/vDD4.40`. - -> [!NOTE] -> You can omit `llama.cpp` from the path. For example: `POST /engines/v1/chat/completions`. - -### How do I interact through the OpenAI API? - -#### From within a container - -To call the `chat/completions` OpenAI endpoint from within another container using `curl`: - -```bash -#!/bin/sh - -curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "ai/smollm2", - "messages": [ - { - "role": "system", - "content": "You are a helpful assistant." - }, - { - "role": "user", - "content": "Please write 500 words about the fall of Rome." - } - ] - }' - -``` - -#### From the host using TCP - -To call the `chat/completions` OpenAI endpoint from the host via TCP: - -1. 
Enable the host-side TCP support from the Docker Desktop GUI, or via the [Docker Desktop CLI](/manuals/desktop/features/desktop-cli.md). - For example: `docker desktop enable model-runner --tcp `. - - If you are running on Windows, also enable GPU-backed inference. - See [Enable Docker Model Runner](#enable-dmr-in-docker-desktop). - -2. Interact with it as documented in the previous section using `localhost` and the correct port. - -```bash -#!/bin/sh - - curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "ai/smollm2", - "messages": [ - { - "role": "system", - "content": "You are a helpful assistant." - }, - { - "role": "user", - "content": "Please write 500 words about the fall of Rome." - } - ] - }' -``` - -#### From the host using a Unix socket - -To call the `chat/completions` OpenAI endpoint through the Docker socket from the host using `curl`: - -```bash -#!/bin/sh - -curl --unix-socket $HOME/.docker/run/docker.sock \ - localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "ai/smollm2", - "messages": [ - { - "role": "system", - "content": "You are a helpful assistant." - }, - { - "role": "user", - "content": "Please write 500 words about the fall of Rome." - } - ] - }' -``` - ## Known issues ### `docker model` is not recognised @@ -681,4 +118,9 @@ The Docker Model CLI currently lacks consistent support for specifying models by ## Share feedback -Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting. +Thanks for trying out Docker Model Runner. Give feedback or report any bugs +you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting. 
+ +## Next steps + +[Get started with DMR](get-started.md) diff --git a/content/manuals/ai/model-runner/api-reference.md b/content/manuals/ai/model-runner/api-reference.md new file mode 100644 index 000000000000..6a05c7c82893 --- /dev/null +++ b/content/manuals/ai/model-runner/api-reference.md @@ -0,0 +1,192 @@ +--- +title: DMR REST API +description: Reference documentation for the Docker Model Runner REST API endpoints and usage examples. +weight: 30 +keywords: Docker, ai, model runner, rest api, openai, endpoints, documentation +--- + +Once Model Runner is enabled, new API endpoints are available. You can use +these endpoints to interact with a model programmatically. + +### Determine the base URL + +The base URL to interact with the endpoints depends +on how you run Docker: + +{{< tabs >}} +{{< tab name="Docker Desktop">}} + +- From containers: `http://model-runner.docker.internal/` +- From host processes: `http://localhost:12434/`, assuming TCP host access is + enabled on the default port (12434). + +{{< /tab >}} +{{< tab name="Docker Engine">}} + +- From containers: `http://172.17.0.1:12434/` (with `172.17.0.1` representing the host gateway address) +- From host processes: `http://localhost:12434/` + +> [!NOTE] +> The `172.17.0.1` interface may not be available by default to containers + within a Compose project. 
> In this case, add an `extra_hosts` directive to your Compose service YAML:
>
> ```yaml
> extra_hosts:
>   - "model-runner.docker.internal:host-gateway"
> ```
>
> You can then access the Docker Model Runner APIs at
> `http://model-runner.docker.internal:12434/`.

{{< /tab >}}
{{< /tabs >}}

### Available DMR endpoints

- Create a model:

  ```text
  POST /models/create
  ```

- List models:

  ```text
  GET /models
  ```

- Get a model:

  ```text
  GET /models/{namespace}/{name}
  ```

- Delete a local model:

  ```text
  DELETE /models/{namespace}/{name}
  ```

### Available OpenAI endpoints

DMR supports the following OpenAI-compatible endpoints:

- [List models](https://platform.openai.com/docs/api-reference/models/list):

  ```text
  GET /engines/llama.cpp/v1/models
  ```

- [Retrieve model](https://platform.openai.com/docs/api-reference/models/retrieve):

  ```text
  GET /engines/llama.cpp/v1/models/{namespace}/{name}
  ```

- [Create chat completion](https://platform.openai.com/docs/api-reference/chat/create):

  ```text
  POST /engines/llama.cpp/v1/chat/completions
  ```

- [Create completion](https://platform.openai.com/docs/api-reference/completions/create):

  ```text
  POST /engines/llama.cpp/v1/completions
  ```

- [Create embeddings](https://platform.openai.com/docs/api-reference/embeddings/create):

  ```text
  POST /engines/llama.cpp/v1/embeddings
  ```

To call these endpoints through a Unix socket (`/var/run/docker.sock`), prefix their path
with `/exp/vDD4.40`.

> [!NOTE]
> You can omit `llama.cpp` from the path. For example: `POST /engines/v1/chat/completions`.
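As a quick sanity check of the paths above, the following sketch composes the documented endpoints from a base URL. The base URL used here is an assumption (the default host TCP address); substitute the value for your setup as described in the previous section.

```shell
#!/bin/sh
# Compose DMR endpoint URLs from a base URL.
# Assumption: TCP host access is enabled on the default port 12434.
BASE="http://localhost:12434"

# Model management: list all local models
MODELS_URL="$BASE/models"

# OpenAI-compatible chat completions, served by the llama.cpp engine
CHAT_URL="$BASE/engines/llama.cpp/v1/chat/completions"

# The same path, as called through the Docker socket (note the prefix)
SOCKET_PATH="/exp/vDD4.40/engines/llama.cpp/v1/chat/completions"

echo "$MODELS_URL"
echo "$CHAT_URL"
echo "$SOCKET_PATH"
```

With Model Runner enabled, a `curl "$MODELS_URL"` request returns the locally available models as JSON.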

## REST API examples

### Request from within a container

To call the `chat/completions` OpenAI endpoint from within another container using `curl`:

```bash
#!/bin/sh

curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```

### Request from the host using TCP

To call the `chat/completions` OpenAI endpoint from the host via TCP:

1. Enable host-side TCP support from the Docker Desktop GUI, or via the
   [Docker Desktop CLI](/manuals/desktop/features/desktop-cli.md). For
   example: `docker desktop enable model-runner --tcp <port>`.

   If you are running on Windows, also enable GPU-backed inference.
   See [Enable Docker Model Runner](get-started.md#enable-docker-model-runner-in-docker-desktop).

2. Interact with it as documented in the previous section using `localhost`
   and the correct port.

```bash
#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```

### Request from the host using a Unix socket

To call the `chat/completions` OpenAI endpoint through the Docker socket from the host using `curl`:

```bash
#!/bin/sh

curl --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
+ }, + { + "role": "user", + "content": "Please write 500 words about the fall of Rome." + } + ] + }' +``` \ No newline at end of file diff --git a/content/manuals/ai/model-runner/examples.md b/content/manuals/ai/model-runner/examples.md new file mode 100644 index 000000000000..b20b9a58ca1a --- /dev/null +++ b/content/manuals/ai/model-runner/examples.md @@ -0,0 +1,219 @@ +--- +title: DMR examples +description: Example projects and CI/CD workflows for Docker Model Runner. +weight: 40 +keywords: Docker, ai, model runner, examples, github actions, genai, sample project +--- + +See some examples of complete workflows using Docker Model Runner. + +## Sample project + +You can now start building your generative AI application powered by Docker +Model Runner. + +If you want to try an existing GenAI application, follow these steps: + +1. Set up the sample app. Clone and run the following repository: + + ```console + $ git clone https://github.com/docker/hello-genai.git + ``` + +1. In your terminal, go to the `hello-genai` directory. + +1. Run `run.sh` to pull the chosen model and run the app. + +1. Open your app in the browser at the addresses specified in the repository + [README](https://github.com/docker/hello-genai). + +You see the GenAI app's interface where you can start typing your prompts. + +You can now interact with your own GenAI app, powered by a local model. Try a +few prompts and notice how fast the responses are — all running on your machine +with Docker. + +## Use Model Runner in GitHub Actions + +Here is an example of how to use Model Runner as part of a GitHub workflow. +The example installs Model Runner, tests the installation, pulls and runs a +model, interacts with the model via the API, and deletes the model. 
+ +```yaml {title="dmr-run.yml", collapse=true} +name: Docker Model Runner Example Workflow + +permissions: + contents: read + +on: + workflow_dispatch: + inputs: + test_model: + description: 'Model to test with (default: ai/smollm2:360M-Q4_K_M)' + required: false + type: string + default: 'ai/smollm2:360M-Q4_K_M' + +jobs: + dmr-test: + runs-on: ubuntu-latest + timeout-minutes: 30 + + steps: + - name: Set up Docker + uses: docker/setup-docker-action@v4 + + - name: Install docker-model-plugin + run: | + echo "Installing docker-model-plugin..." + # Add Docker's official GPG key: + sudo apt-get update + sudo apt-get install ca-certificates curl + sudo install -m 0755 -d /etc/apt/keyrings + sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc + sudo chmod a+r /etc/apt/keyrings/docker.asc + + # Add the repository to Apt sources: + echo \ + "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \ + $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \ + sudo tee /etc/apt/sources.list.d/docker.list > /dev/null + sudo apt-get update + sudo apt-get install -y docker-model-plugin + + echo "Installation completed successfully" + + - name: Test docker model version + run: | + echo "Testing docker model version command..." + sudo docker model version + + # Verify the command returns successfully + if [ $? -eq 0 ]; then + echo "✅ docker model version command works correctly" + else + echo "❌ docker model version command failed" + exit 1 + fi + + - name: Pull the provided model and run it + run: | + MODEL="${{ github.event.inputs.test_model || 'ai/smollm2:360M-Q4_K_M' }}" + echo "Testing with model: $MODEL" + + # Test model pull + echo "Pulling model..." + sudo docker model pull "$MODEL" + + if [ $? 
-eq 0 ]; then
            echo "✅ Model pull successful"
          else
            echo "❌ Model pull failed"
            exit 1
          fi

          # Test basic model run (with timeout to avoid hanging)
          echo "Testing docker model run..."
          timeout 60s sudo docker model run "$MODEL" "Give me a fact about whales." || {
            exit_code=$?
            if [ $exit_code -eq 124 ]; then
              echo "✅ Model run test completed (timed out as expected for non-interactive test)"
            else
              echo "❌ Model run failed with exit code: $exit_code"
              exit 1
            fi
          }

      - name: Test API endpoint
        run: |
          MODEL="${{ github.event.inputs.test_model || 'ai/smollm2:360M-Q4_K_M' }}"
          echo "Testing API endpoint with model: $MODEL"

          # Test API call with curl
          echo "Testing API call..."
          RESPONSE=$(curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions \
            -H "Content-Type: application/json" \
            -d "{
              \"model\": \"$MODEL\",
              \"messages\": [
                {
                  \"role\": \"user\",
                  \"content\": \"Say hello\"
                }
              ],
              \"top_k\": 1,
              \"temperature\": 0
            }")

          if [ $? 
-eq 0 ]; then + echo "✅ API call successful" + echo "Response received: $RESPONSE" + + # Check if response contains "hello" (case-insensitive) + if echo "$RESPONSE" | grep -qi "hello"; then + echo "✅ Response contains 'hello' (case-insensitive)" + else + echo "❌ Response does not contain 'hello'" + echo "Full response: $RESPONSE" + exit 1 + fi + else + echo "❌ API call failed" + exit 1 + fi + + - name: Test model cleanup + run: | + MODEL="${{ github.event.inputs.test_model || 'ai/smollm2:360M-Q4_K_M' }}" + + echo "Cleaning up test model..." + sudo docker model rm "$MODEL" || echo "Model removal failed or model not found" + + # Verify model was removed + echo "Verifying model cleanup..." + sudo docker model ls + + echo "✅ Model cleanup completed" + + - name: Report success + if: success() + run: | + echo "🎉 Docker Model Runner daily health check completed successfully!" + echo "All tests passed:" + echo " ✅ docker-model-plugin installation successful" + echo " ✅ docker model version command working" + echo " ✅ Model pull and run operations successful" + echo " ✅ API endpoint operations successful" + echo " ✅ Cleanup operations successful" +``` + +## Related pages + +- [Models and Compose](../compose/models-and-compose.md) diff --git a/content/manuals/ai/model-runner/get-started.md b/content/manuals/ai/model-runner/get-started.md new file mode 100644 index 000000000000..e4a66594bce6 --- /dev/null +++ b/content/manuals/ai/model-runner/get-started.md @@ -0,0 +1,223 @@ +--- +title: Get started with DMR +description: How to install, enable, and use Docker Model Runner to manage and run AI models. +weight: 10 +keywords: Docker, ai, model runner, setup, installation, getting started +--- + +Get started with [Docker Model Runner](_index.md). + +## Enable Docker Model Runner + +### Enable DMR in Docker Desktop + +1. In the settings view, go to the **Beta features** tab. +1. Select the **Enable Docker Model Runner** setting. +1. 
If you use Windows with a supported NVIDIA GPU, you can also select
   **Enable GPU-backed inference**.
1. Optional: To enable TCP support, select **Enable host-side TCP support**.
   1. In the **Port** field, type the port you want to use.
   1. If you interact with Model Runner from a local frontend web app, in
      **CORS Allows Origins**, select the origins that Model Runner should
      accept requests from. An origin is the URL where your web app runs, for
      example `http://localhost:3131`.

You can now use the `docker model` command in the CLI and view and interact
with your local models in the **Models** tab in the Docker Desktop Dashboard.

> [!IMPORTANT]
>
> For Docker Desktop versions 4.41 and earlier, this setting was under the
> **Experimental features** tab on the **Features in development** page.

### Enable Docker Model Runner in Docker Engine

1. Ensure you have installed [Docker Engine](/engine/install/).
1. Docker Model Runner is available as a package. To install it, run:

   {{< tabs >}}
   {{< tab name="Ubuntu/Debian">}}

   ```console
   $ sudo apt-get update
   $ sudo apt-get install docker-model-plugin
   ```

   {{< /tab >}}
   {{< tab name="RPM-based distributions">}}

   ```console
   $ sudo dnf update
   $ sudo dnf install docker-model-plugin
   ```

   {{< /tab >}}
   {{< /tabs >}}

1. Test the installation:

   ```console
   $ docker model version
   $ docker model run ai/smollm2
   ```

> [!NOTE]
> TCP support is enabled by default for Docker Engine on port `12434`.

### Update Docker Model Runner in Docker Engine

To update Docker Model Runner in Docker Engine, uninstall it with
[`docker model uninstall-runner`](/reference/cli/docker/model/uninstall-runner/)
then reinstall it:

```console
docker model uninstall-runner --images && docker model install-runner
```

> [!NOTE]
> With the above command, local models are preserved.
+> To delete the models during the upgrade, add the `--models` option to the
+> `uninstall-runner` command.
+
+## Pull a model
+
+Models are pulled on first use and cached locally.
+
+> [!NOTE]
+>
+> When you use the Docker CLI, you can also pull models directly from
+> [HuggingFace](https://huggingface.co/).
+
+{{< tabs group="release" >}}
+{{< tab name="From Docker Desktop">}}
+
+1. Select **Models** and select the **Docker Hub** tab.
+1. Find the model you want and select **Pull**.
+
+![Screenshot showing the Docker Hub view.](./images/dmr-catalog.png)
+
+{{< /tab >}}
+{{< tab name="From the Docker CLI">}}
+
+Use the [`docker model pull` command](/reference/cli/docker/model/pull/).
+For example:
+
+```bash {title="Pulling from Docker Hub"}
+docker model pull ai/smollm2:360M-Q4_K_M
+```
+
+```bash {title="Pulling from HuggingFace"}
+docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
+```
+
+{{< /tab >}}
+{{< /tabs >}}
+
+## Run a model
+
+{{< tabs group="release" >}}
+{{< tab name="From Docker Desktop">}}
+
+1. Select **Models** and select the **Local** tab.
+1. Select the play button. The interactive chat screen opens.
+
+![Screenshot showing the Local view.](./images/dmr-run.png)
+
+{{< /tab >}}
+{{< tab name="From the Docker CLI" >}}
+
+Use the [`docker model run` command](/reference/cli/docker/model/run/).
+
+{{< /tab >}}
+{{< /tabs >}}
+
+## Configure a model
+
+You can configure a model, such as its maximum token limit, using Docker
+Compose. See [Models and Compose - Model configuration options](../compose/models-and-compose.md#model-configuration-options).
+
+## Publish a model
+
+> [!NOTE]
+>
+> This works for any Container Registry supporting OCI Artifacts, not only
+> Docker Hub.
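+
+Because published models are ordinary OCI Artifacts, pushing to a registry
+other than Docker Hub is a matter of tagging with that registry's hostname.
+The following is a sketch, assuming a hypothetical private registry at
+`registry.example.com` that you have already authenticated to with
+`docker login`:
+
+```console
+# Tag a pulled model with a reference that points at another registry
+$ docker model tag ai/smollm2 registry.example.com/myorg/smollm2
+
+# Push it to that registry
+$ docker model push registry.example.com/myorg/smollm2
+```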
+
+You can tag existing models with a new name and publish them under a different
+namespace and repository:
+
+```console
+# Tag a pulled model under a new name
+$ docker model tag ai/smollm2 myorg/smollm2
+
+# Push it to Docker Hub
+$ docker model push myorg/smollm2
+```
+
+For more details, see the [`docker model tag`](/reference/cli/docker/model/tag)
+and [`docker model push`](/reference/cli/docker/model/push) command
+documentation.
+
+You can also package a model file in GGUF format as an OCI Artifact and publish
+it to Docker Hub.
+
+```console
+# Download a model file in GGUF format, for example from HuggingFace
+$ curl -L -o model.gguf https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf
+
+# Package it as an OCI Artifact and push it to Docker Hub
+$ docker model package --gguf "$(pwd)/model.gguf" --push myorg/mistral-7b-v0.1:Q4_K_M
+```
+
+For more details, see the
+[`docker model package`](/reference/cli/docker/model/package/) command
+documentation.
+
+## Troubleshooting
+
+### Display the logs
+
+To troubleshoot issues, display the logs:
+
+{{< tabs group="release" >}}
+{{< tab name="From Docker Desktop">}}
+
+Select **Models** and select the **Logs** tab.
+
+![Screenshot showing the Models view.](./images/dmr-logs.png)
+
+{{< /tab >}}
+{{< tab name="From the Docker CLI">}}
+
+Use the [`docker model logs` command](/reference/cli/docker/model/logs/).
+
+{{< /tab >}}
+{{< /tabs >}}
+
+### Inspect requests and responses
+
+Inspecting requests and responses helps you diagnose model-related issues.
+For example, you can evaluate context usage to verify you stay within the
+model's context window, or inspect the full body of a request to check the
+parameters you are passing to your models when developing with a framework.
+
+In Docker Desktop, to inspect the requests and responses for each model:
+
+1. Select **Models** and select the **Requests** tab.
+   This view displays all the requests to all models:
+   - The time the request was sent.
+   - The model name and version.
+   - The prompt/request.
+   - The context usage.
+   - The time it took for the response to be generated.
+2. Select one of the requests to display further details:
+   - In the **Overview** tab, view the token usage, response metadata and
+     generation speed, and the actual prompt and response.
+   - In the **Request** and **Response** tabs, view the full JSON payload of
+     the request and the response.
+
+> [!NOTE]
+> You can also display the requests for a specific model when you select a
+> model and then select the **Requests** tab.
+
+## Related pages
+
+- [Interact with your model programmatically](./api-reference.md)
+- [Models and Compose](../compose/models-and-compose.md)
+- [Docker Model Runner CLI reference documentation](/reference/cli/docker/model)
\ No newline at end of file
diff --git a/content/manuals/ai/model-runner/setup.md b/content/manuals/ai/model-runner/setup.md
new file mode 100644
index 000000000000..e69de29bb2d1