diff --git a/content/manuals/ai/model-runner/_index.md b/content/manuals/ai/model-runner/_index.md
index bda21d4d1980..b8491af037ec 100644
--- a/content/manuals/ai/model-runner/_index.md
+++ b/content/manuals/ai/model-runner/_index.md
@@ -385,3 +385,7 @@ The Docker Model CLI currently lacks consistent support for specifying models by
 ## Share feedback
 
 Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting.
+
+## Related pages
+
+- [Use Model Runner with Compose](/manuals/compose/how-tos/model-runner.md)
diff --git a/content/manuals/compose/how-tos/model-runner.md b/content/manuals/compose/how-tos/model-runner.md
index 2a7fca43ca83..d64886b61f49 100644
--- a/content/manuals/compose/how-tos/model-runner.md
+++ b/content/manuals/compose/how-tos/model-runner.md
@@ -40,15 +40,33 @@ services:
       type: model
       options:
         model: ai/smollm2
+        context-size: 1024
+        runtime-flags: "--no-prefill-assistant"
 ```
 
-Notice the dedicated `provider` attribute in the `ai_runner` service.
-This attribute specifies that the service is a model provider and lets you define options such as the name of the model to be used.
-
-There is also a `depends_on` attribute in the `chat` service.
-This attribute specifies that the `chat` service depends on the `ai_runner` service.
-This means that the `ai_runner` service will be started before the `chat` service to allow injection of model information to the `chat` service.
-
+Notice the following:
+
+In the `ai_runner` service:
+
+- `provider.type`: Specifies that the service is a `model` provider.
+- `provider.options`: Specifies the options of the model:
+  - We want to use the `ai/smollm2` model.
+  - We set the context size to `1024` tokens.
+
+    > [!NOTE]
+    > Each model has its own maximum context size. When increasing the context length,
+    > consider your hardware constraints. In general, try to use the smallest context size
+    > possible for your use case.
+
+  - We pass the `--no-prefill-assistant` parameter to the llama.cpp server,
+    see [the available parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md).
+
+In the `chat` service:
+
+- `depends_on` specifies that the `chat` service depends on the `ai_runner` service.
+  The `ai_runner` service is started before the `chat` service so that model information
+  can be injected into the `chat` service.
+
 ## How it works
 
 During the `docker compose up` process, Docker Model Runner automatically pulls and runs the specified model.
@@ -61,6 +79,6 @@ In the example above, the `chat` service receives 2 environment variables prefix
 
 This lets the `chat` service to interact with the model and use it for its own purposes.
 
-## Reference
+## Related pages
 
 - [Docker Model Runner documentation](/manuals/ai/model-runner.md)
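
The hunk above shows only the tail of the Compose example on the page being edited. For reference, a complete `compose.yaml` using the new options might look like the sketch below; the `chat` service's `build: .` setup and the overall file layout are assumptions rather than part of this diff.

```yaml
services:
  chat:
    build: .                # assumption: the chat app is built from a local Dockerfile
    depends_on:
      - ai_runner           # start the model provider before the chat app

  ai_runner:
    provider:
      type: model           # marks this service as a model provider
      options:
        model: ai/smollm2
        context-size: 1024                        # in tokens; keep as small as your use case allows
        runtime-flags: "--no-prefill-assistant"   # passed through to the llama.cpp server
```

With a file like this, `docker compose up` pulls and runs `ai/smollm2` and injects the model's connection details into the `chat` service as environment variables, as described in the "How it works" section of the page.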