Commit 8bf6cd0 (1 parent: ccd16bb)

Add Docker Model Runner documentation

For configuration, IDE integrations, inference engines, and Open WebUI

Signed-off-by: Eric Curtin <[email protected]>

8 files changed (+1449, -85 lines)

content/manuals/ai/compose/models-and-compose.md (5 additions, 3 deletions)
```diff
@@ -77,7 +77,7 @@ Common configuration options include:
 > as small as feasible for your specific needs.
 
 - `runtime_flags`: A list of raw command-line flags passed to the inference engine when the model is started.
-  For example, if you use llama.cpp, you can pass any of [the available parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md).
+  See [Configuration options](/manuals/ai/model-runner/configuration.md) for commonly used parameters and examples.
 - Platform-specific options may also be available via extension attributes `x-*`
 
 > [!TIP]
```
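For context, here is a minimal sketch of how `runtime_flags` sits in a Compose file via the `models` top-level element; the service name, image, model, and flag values are illustrative, not taken from this commit:

```console
$ cat compose.yaml
services:
  app:
    image: my-chat-app        # placeholder application image
    models:
      - llm                   # attach the model defined below

models:
  llm:
    model: ai/smollm2         # example model from Docker Hub's ai/ namespace
    context_size: 8192        # example context window, in tokens
    runtime_flags:            # raw flags handed to the inference engine
      - "--temp"
      - "0.7"
```

Here `--temp 0.7` is one of llama.cpp's server flags; any flag the engine understands can be listed the same way.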
```diff
@@ -364,5 +364,7 @@ services:
 
 - [`models` top-level element](/reference/compose-file/models.md)
 - [`models` attribute](/reference/compose-file/services.md#models)
-- [Docker Model Runner documentation](/manuals/ai/model-runner.md)
-- [Compose Model Runner documentation](/manuals/ai/compose/models-and-compose.md)
+- [Docker Model Runner documentation](/manuals/ai/model-runner/_index.md)
+- [Configuration options](/manuals/ai/model-runner/configuration.md) - Context size and runtime parameters
+- [Inference engines](/manuals/ai/model-runner/inference-engines.md) - llama.cpp and vLLM details
+- [API reference](/manuals/ai/model-runner/api-reference.md) - OpenAI and Ollama-compatible APIs
```

content/manuals/ai/model-runner/_index.md (34 additions, 7 deletions)
```diff
@@ -6,7 +6,7 @@ params:
 group: AI
 weight: 30
 description: Learn how to use Docker Model Runner to manage and run AI models.
-keywords: Docker, ai, model runner, docker desktop, docker engine, llm, openai, llama.cpp, vllm, cpu, nvidia, cuda, amd, rocm, vulkan
+keywords: Docker, ai, model runner, docker desktop, docker engine, llm, openai, ollama, llama.cpp, vllm, cpu, nvidia, cuda, amd, rocm, vulkan, cline, continue, cursor
 aliases:
 - /desktop/features/model-runner/
 - /model-runner/
```
```diff
@@ -21,7 +21,7 @@ large language models (LLMs) and other AI models directly from Docker Hub or any
 OCI-compliant registry.
 
 With seamless integration into Docker Desktop and Docker
-Engine, you can serve models via OpenAI-compatible APIs, package GGUF files as
+Engine, you can serve models via OpenAI and Ollama-compatible APIs, package GGUF files as
 OCI Artifacts, and interact with models from both the command line and graphical
 interface.
 
```
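To make the API wording concrete, a chat completion request against the OpenAI-compatible endpoint could look like the following, assuming host-side TCP access is enabled on Docker Model Runner's default port 12434 and the model is already pulled (both are assumptions for this sketch):

```console
$ curl http://localhost:12434/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "ai/smollm2",
          "messages": [{"role": "user", "content": "Hello!"}]
        }'
```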
```diff
@@ -33,10 +33,13 @@ with AI models locally.
 ## Key features
 
 - [Pull and push models to and from Docker Hub](https://hub.docker.com/u/ai)
-- Serve models on OpenAI-compatible APIs for easy integration with existing apps
-- Support for both llama.cpp and vLLM inference engines (vLLM currently supported on Linux x86_64/amd64 with NVIDIA GPUs only)
+- Serve models on [OpenAI and Ollama-compatible APIs](api-reference.md) for easy integration with existing apps
+- Support for both [llama.cpp and vLLM inference engines](inference-engines.md) (vLLM on Linux x86_64/amd64 and Windows WSL2 with NVIDIA GPUs)
 - Package GGUF and Safetensors files as OCI Artifacts and publish them to any Container Registry
 - Run and interact with AI models directly from the command line or from the Docker Desktop GUI
+- [Connect to AI coding tools](ide-integrations.md) like Cline, Continue, Cursor, and Aider
+- [Configure context size and model parameters](configuration.md) to tune performance
+- [Set up Open WebUI](openwebui-integration.md) for a ChatGPT-like web interface
 - Manage local models and display logs
 - Display prompt and response details
 - Conversational context support for multi-turn interactions
```
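The pull/run items in that list map onto the `docker model` CLI; a typical first session might look like this (the model name is an example):

```console
$ docker model pull ai/smollm2
$ docker model run ai/smollm2 "Summarize what a GGUF file is in one sentence."
```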
````diff
@@ -82,9 +85,28 @@ locally. They load into memory only at runtime when a request is made, and
 unload when not in use to optimize resources. Because models can be large, the
 initial pull may take some time. After that, they're cached locally for faster
 access. You can interact with the model using
-[OpenAI-compatible APIs](api-reference.md).
+[OpenAI and Ollama-compatible APIs](api-reference.md).
 
-Docker Model Runner supports both [llama.cpp](https://github.com/ggerganov/llama.cpp) and [vLLM](https://github.com/vllm-project/vllm) as inference engines, providing flexibility for different model formats and performance requirements. For more details, see the [Docker Model Runner repository](https://github.com/docker/model-runner).
+### Inference engines
+
+Docker Model Runner supports two inference engines:
+
+| Engine | Best for | Model format |
+|--------|----------|--------------|
+| [llama.cpp](inference-engines.md#llamacpp) | Local development, resource efficiency | GGUF (quantized) |
+| [vLLM](inference-engines.md#vllm) | Production, high throughput | Safetensors |
+
+llama.cpp is the default engine and works on all platforms. vLLM requires NVIDIA GPUs and is supported on Linux x86_64 and Windows with WSL2. See [Inference engines](inference-engines.md) for detailed comparison and setup.
+
+### Context size
+
+Models have a configurable context size (context length) that determines how many tokens they can process. The default varies by model but is typically 2,048-8,192 tokens. You can adjust this per-model:
+
+```console
+$ docker model configure --context-size 8192 ai/qwen2.5-coder
+```
+
+See [Configuration options](configuration.md) for details on context size and other parameters.
 
 > [!TIP]
 >
````
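For the Ollama-compatible side of the API, an existing Ollama client would simply be pointed at Docker Model Runner instead; assuming the endpoint mirrors Ollama's `/api/chat` route on the same port (an assumption this commit does not spell out), a request might look like:

```console
$ curl http://localhost:12434/api/chat \
    -H "Content-Type: application/json" \
    -d '{
          "model": "ai/smollm2",
          "messages": [{"role": "user", "content": "Hello!"}],
          "stream": false
        }'
```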
```diff
@@ -120,4 +142,9 @@ Thanks for trying out Docker Model Runner. To report bugs or request features, [
 
 ## Next steps
 
-[Get started with DMR](get-started.md)
+- [Get started with DMR](get-started.md) - Enable DMR and run your first model
+- [API reference](api-reference.md) - OpenAI and Ollama-compatible API documentation
+- [Configuration options](configuration.md) - Context size and runtime parameters
+- [Inference engines](inference-engines.md) - llama.cpp and vLLM details
+- [IDE integrations](ide-integrations.md) - Connect Cline, Continue, Cursor, and more
+- [Open WebUI integration](openwebui-integration.md) - Set up a web chat interface
```
