For example, if you use llama.cpp, you can pass any of [the available parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md).
- Platform-specific options may also be available via extension attributes `x-*`
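
For instance, a single model definition might combine `runtime_flags` with an extension attribute. The sketch below is illustrative only: `--seed` is a llama.cpp flag, and `x-cloud-instance-type` is a hypothetical attribute, not a documented option.

```yaml
models:
  my_model:
    model: ai/smollm2
    context_size: 8192
    runtime_flags:
      - "--seed"  # llama.cpp flag: fix the RNG seed for reproducible output
      - "42"
    x-cloud-instance-type: "gpu-small"  # hypothetical platform-specific extension attribute
```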

> [!TIP]
> See more examples in the [Common runtime configurations](#common-runtime-configurations) section.

### Alternative configuration with provider services

> [!IMPORTANT]
>
> This approach is deprecated. Use the [`models` top-level element](#basic-model-definition) instead.

You can also use the `provider` service type, which allows you to declare platform capabilities required by your application.
For AI models, you can use the `model` type to declare model dependencies.

To define a model provider:
```yaml
services:
  chat:
    image: my-chat-app
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/smollm2
        context-size: 1024
        runtime-flags: "--no-prefill-assistant"
```
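
For comparison, a roughly equivalent definition using the recommended `models` top-level element might look like the following sketch, reusing the same image and model shown above.

```yaml
services:
  chat:
    image: my-chat-app
    models:
      - smollm2

models:
  smollm2:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--no-prefill-assistant"
```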

## Service model binding
Services can reference models in two ways: short syntax and long syntax.
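
As a minimal sketch (the model and variable names here are illustrative), the short syntax simply lists the model, while the long syntax lets you choose the environment variable names yourself:

```yaml
services:
  app:
    image: app
    models:
      # Short syntax: reference the model by name; environment variables are generated automatically
      - llm

  worker:
    image: worker
    models:
      # Long syntax: pick the environment variable names injected into the container
      llm:
        endpoint_var: LLM_URL
        model_var: LLM_MODEL

models:
  llm:
    model: ai/smollm2
```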

Docker Model Runner will:

- Provide endpoint URLs for accessing the model
- Inject environment variables into the service
### Cloud providers
The same Compose file can run on cloud providers that support Compose models:

Cloud providers might:

- Provide additional monitoring and logging capabilities
- Handle model versioning and updates automatically

## Common runtime configurations

Below are some example configurations for various use cases.
### Development

```yaml
services:
  app:
    image: app
    models:
      dev_model:
        endpoint_var: DEV_URL
        model_var: DEV_MODEL

models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"         # Set verbosity level to infinity
      - "--verbose-prompt"  # Print a verbose prompt before generation
      - "--log-prefix"      # Enable prefix in log messages
      - "--log-timestamps"  # Enable timestamps in log messages
      - "--log-colors"      # Enable colored logging
```
### Conservative with disabled reasoning

```yaml
services:
  app:
    image: app
    models:
      conservative_model:
        endpoint_var: CONSERVATIVE_URL
        model_var: CONSERVATIVE_MODEL

models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"              # Temperature
      - "0.1"
      - "--top-k"             # Top-k sampling
      - "1"
      - "--reasoning-budget"  # Disable reasoning
      - "0"
```
### Creative with high randomness

```yaml
services:
  app:
    image: app
    models:
      creative_model:
        endpoint_var: CREATIVE_URL
        model_var: CREATIVE_MODEL

models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"   # Temperature
      - "1"
      - "--top-p"  # Top-p sampling
      - "0.9"
```
### Highly deterministic

```yaml
services:
  app:
    image: app
    models:
      deterministic_model:
        endpoint_var: DET_URL
        model_var: DET_MODEL

models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"   # Temperature
      - "0"
      - "--top-k"  # Top-k sampling
      - "1"
```
### Concurrent processing

```yaml
services:
  app:
    image: app
    models:
      concurrent_model:
        endpoint_var: CONCURRENT_URL
        model_var: CONCURRENT_MODEL

models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"  # Number of threads to use during generation
```