Commit 46dad85

docs: update KVBM diagram and bump container image tags to 1.0.0 (#7365)

Signed-off-by: Dan Gil <dagil@nvidia.com>

1 parent: 6b62df6

File tree

13 files changed: +122 additions, −29 deletions


docs/assets/img/architecture.png

188 KB

docs/assets/img/kvbm-components.svg

Lines changed: 97 additions & 4 deletions

docs/backends/trtllm/README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -44,7 +44,7 @@ docker compose -f deploy/docker-compose.yml up -d
 **Step 2 (host terminal):** Pull and run the prebuilt container:
 
 ```bash
-DYNAMO_VERSION=0.9.0
+DYNAMO_VERSION=1.0.0
 docker pull nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:$DYNAMO_VERSION
 docker run --gpus all -it --network host --ipc host \
   nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:$DYNAMO_VERSION
````
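With the bump applied, the quick-start pull/run sequence keeps the tag in one place. A minimal sketch of the updated commands (the `IMAGE` helper variable is illustrative, not part of the docs):

```shell
# Pin the release once; both pull and run reuse it.
DYNAMO_VERSION=1.0.0
IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:${DYNAMO_VERSION}"
echo "$IMAGE"   # prints nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0

# The actual pull/run steps (require Docker and NGC access):
# docker pull "$IMAGE"
# docker run --gpus all -it --network host --ipc host "$IMAGE"
```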

docs/backends/trtllm/multinode/trtllm-multinode-examples.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -80,7 +80,7 @@ following environment variables based:
 ```bash
 # NOTE: IMAGE must be set manually for now
 # Use the prebuilt container from NGC (see ../README.md#quick-start):
-# export IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.9.0"
+# export IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0"
 # Or build a custom one (see ../trtllm-building-custom-container.md)
 # Or you can also download the image to shared storage and point
 # IMAGE to the local path.
````

docs/benchmarks/kv-router-ab-testing.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -121,7 +121,7 @@ spec:
       replicas: 1
       extraPodSpec:
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           env:
             - name: POD_UID
               valueFrom:
@@ -146,7 +146,7 @@ spec:
                   values:
                     - gpu-h100-sxm # Adjust to your GPU node type
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           workingDir: /workspace
           command:
             - /bin/sh
@@ -212,7 +212,7 @@ spec:
       replicas: 1
       extraPodSpec:
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           env:
             - name: POD_UID
               valueFrom:
@@ -240,7 +240,7 @@ spec:
                   values:
                     - gpu-h100-sxm # Adjust to your GPU node type
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           workingDir: /workspace
           command:
             - /bin/sh
@@ -438,7 +438,7 @@ spec:
       restartPolicy: Never
       containers:
         - name: benchmark
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           securityContext:
             runAsUser: 0 # Required: apt-get and pip install need root in ephemeral benchmark pod
           command:
```
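The hunks above are identical one-line tag bumps; edits of this shape across many manifests are commonly scripted rather than hand-edited. A hedged sketch of such a rewrite (the sample file and regex are illustrative, not part of this commit), normalizing any `ai-dynamo` runtime tag to 1.0.0:

```shell
# Create a stand-in docs file containing two of the old tags seen in this commit.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
EOF

# One regex covers every old tag (0.6.0, 0.8.0, 0.8.1, 0.9.0) in a single pass.
sed -E -i 's#(nvcr\.io/nvidia/ai-dynamo/[a-z]+-runtime):[0-9]+\.[0-9]+\.[0-9]+#\1:1.0.0#g' "$tmp"

cat "$tmp"   # both lines now end in :1.0.0
```

Running the same `sed` over `docs/` (via `grep -rl ... | xargs`) would reproduce the bulk of this commit's tag changes.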

docs/components/profiler/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -37,7 +37,7 @@ metadata:
 spec:
   model: "Qwen/Qwen3-0.6B"
   backend: vllm
-  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"
+  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0"
 
   workload:
     isl: 3000 # Average input sequence length
```

docs/components/profiler/profiler-guide.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -200,7 +200,7 @@ Each DGDR requires a container image for profiling and deployment:
 
 ```yaml
 spec:
-  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"
+  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0"
 ```
 
 #### Quick Start: Deploy with DGDR
@@ -371,7 +371,7 @@ metadata:
 spec:
   model: "Qwen/Qwen3-0.6B"
   backend: vllm
-  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"
+  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0"
 
   searchStrategy: rapid # or thorough
   autoApply: true
````

docs/components/router/router-examples.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -130,7 +130,7 @@ spec:
             value: "16"
       extraPodSpec:
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
 ```
 
 ### Alternative: Using Command Args in K8s
@@ -140,7 +140,7 @@ You can also pass CLI arguments directly in the container command:
 ```yaml
 extraPodSpec:
   mainContainer:
-    image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
+    image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
     command:
       - /bin/sh
       - -c
````

docs/features/disaggregated-serving/README.md

Lines changed: 6 additions & 6 deletions
```diff
@@ -75,7 +75,7 @@ aiconfigurator cli default \
   --tpot 25 \
   --backend vllm \
   --backend-version 0.12.0 \
-  --generator-dynamo-version 0.8.0 \
+  --generator-dynamo-version 1.0.0 \
   --generator-set K8sConfig.k8s_namespace=$YOUR_NAMESPACE \
   --generator-set K8sConfig.k8s_pvc_name=$YOUR_PVC \
   --save-dir ./results_vllm
@@ -272,7 +272,7 @@ spec:
             value: /opt/models
       extraPodSpec:
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           imagePullPolicy: IfNotPresent
 
     VLLMWorker:
@@ -292,7 +292,7 @@ spec:
             value: /opt/models
       extraPodSpec:
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           workingDir: /workspace
           imagePullPolicy: IfNotPresent
           command:
@@ -506,7 +506,7 @@ spec:
             value: /opt/models
       extraPodSpec:
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           imagePullPolicy: IfNotPresent
 
     VLLMPrefillWorker:
@@ -533,7 +533,7 @@ spec:
             value: "0"
       extraPodSpec:
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           workingDir: /workspace
           imagePullPolicy: IfNotPresent
           securityContext:
@@ -581,7 +581,7 @@ spec:
             value: "0"
       extraPodSpec:
         mainContainer:
-          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
           workingDir: /workspace
           imagePullPolicy: IfNotPresent
           securityContext:
```

docs/getting-started/quickstart.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -20,13 +20,13 @@ Containers have all dependencies pre-installed. No setup required.
 
 ```bash
 # SGLang
-docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
 
 # TensorRT-LLM
-docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
 
 # vLLM
-docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.1
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
 ```
 
 <Tip>
````
