
Commit 213adb3

dmr: add common compose configs (#23127)
1 parent 9e6d2e9 commit 213adb3

1 file changed: +168 -28 lines changed

content/manuals/ai/compose/models-and-compose.md

Lines changed: 168 additions & 28 deletions

@@ -80,6 +80,36 @@ Common configuration options include:
  For example, if you use llama.cpp, you can pass any of [the available parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md).
- Platform-specific options may also be available via extension attributes `x-*`

> [!TIP]
> See more examples in the [Common runtime configurations](#common-runtime-configurations) section.
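
For illustration, here is a minimal sketch of such an extension attribute on a model definition. The `x-runner-options` key and its contents are hypothetical, not a documented option; platforms that don't recognize an `x-*` attribute simply ignore it.

```yaml
models:
  my_model:
    model: ai/model
    # Hypothetical platform-specific extension attribute
    x-runner-options:
      gpu: true
```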

### Alternative configuration with provider services

> [!IMPORTANT]
>
> This approach is deprecated. Use the [`models` top-level element](#basic-model-definition) instead.

You can also use the `provider` service type, which allows you to declare platform capabilities required by your application.
For AI models, you can use the `model` type to declare model dependencies.

To define a model provider:

```yaml
services:
  chat:
    image: my-chat-app
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/smollm2
        context-size: 1024
        runtime-flags: "--no-prefill-assistant"
```
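
Note that the `provider` options here use hyphenated keys (`context-size`, `runtime-flags`), whereas the `models` top-level element uses underscored ones (`context_size`, `runtime_flags`), as in the examples later on this page.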

## Service model binding

Services can reference models in two ways: short syntax and long syntax.
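
For instance, a minimal sketch of both forms, where the service names, `ai/model`, and the `CHAT_*` variable names are placeholders; with short syntax the injected variable names are derived for you, while long syntax lets you pick them:

```yaml
services:
  app:
    image: app
    models:
      # Short syntax: reference the model key only
      - chat_model

  worker:
    image: worker
    models:
      # Long syntax: choose the injected environment variable names yourself
      chat_model:
        endpoint_var: CHAT_URL
        model_var: CHAT_MODEL

models:
  chat_model:
    model: ai/model
```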

@@ -166,34 +196,6 @@ Docker Model Runner will:
- Provide endpoint URLs for accessing the model
- Inject environment variables into the service
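
In the Development example below, for instance, the service receives `DEV_URL` (the model's endpoint URL) and `DEV_MODEL` (the model identifier) through this mechanism.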

### Cloud providers

The same Compose file can run on cloud providers that support Compose models:

@@ -220,6 +222,144 @@ Cloud providers might:
- Provide additional monitoring and logging capabilities
- Handle model versioning and updates automatically

## Common runtime configurations

Below are some example configurations for various use cases.

### Development

```yaml
services:
  app:
    image: app
    models:
      dev_model:
        endpoint_var: DEV_URL
        model_var: DEV_MODEL

models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"          # Set verbosity level to infinity
      - "--verbose-prompt"   # Print a verbose prompt before generation
      - "--log-prefix"       # Enable prefix in log messages
      - "--log-timestamps"   # Enable timestamps in log messages
      - "--log-colors"       # Enable colored logging
```

### Conservative with disabled reasoning

```yaml
services:
  app:
    image: app
    models:
      conservative_model:
        endpoint_var: CONSERVATIVE_URL
        model_var: CONSERVATIVE_MODEL

models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"             # Temperature
      - "0.1"
      - "--top-k"            # Top-k sampling
      - "1"
      - "--reasoning-budget" # Disable reasoning
      - "0"
```

### Creative with high randomness

```yaml
services:
  app:
    image: app
    models:
      creative_model:
        endpoint_var: CREATIVE_URL
        model_var: CREATIVE_MODEL

models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"   # Temperature
      - "1"
      - "--top-p"  # Top-p sampling
      - "0.9"
```

### Highly deterministic

```yaml
services:
  app:
    image: app
    models:
      deterministic_model:
        endpoint_var: DET_URL
        model_var: DET_MODEL

models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"   # Temperature
      - "0"
      - "--top-k"  # Top-k sampling
      - "1"
```

### Concurrent processing

```yaml
services:
  app:
    image: app
    models:
      concurrent_model:
        endpoint_var: CONCURRENT_URL
        model_var: CONCURRENT_MODEL

models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"  # Number of threads to use during generation
      - "8"
      - "--mlock"    # Lock memory to prevent swapping
```

### Rich vocabulary model

```yaml
services:
  app:
    image: app
    models:
      rich_vocab_model:
        endpoint_var: RICH_VOCAB_URL
        model_var: RICH_VOCAB_MODEL

models:
  rich_vocab_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"   # Temperature
      - "0.1"
      - "--top-p"  # Top-p sampling
      - "0.9"
```
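
All of the `runtime_flags` above are passed through to the inference engine, so the set of valid flags depends on the backend; with llama.cpp, they correspond to the server parameters linked earlier on this page.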

## Reference

- [`models` top-level element](/reference/compose-file/models.md)
