Complete examples for profiling with DGDRs.

## DGDR Examples

### Dense Model: Rapid

Fast profiling (~30 seconds):

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen-0-6b
spec:
  model: "Qwen/Qwen3-0.6B"
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"
```
### Dense Model: Thorough

Profiling with real GPU measurements:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: vllm-dense-online
spec:
  model: "Qwen/Qwen3-0.6B"
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"
  searchStrategy: thorough
```
### MoE Model

Multi-node MoE profiling with SGLang:

> [!IMPORTANT]
> The PVC referenced by `modelCache.pvcName` must already exist in the same namespace and contain
> the model weights at the specified `pvcModelPath`. The DGDR controller does not create or
> populate the PVC; it only mounts it into the profiling job and deployed workers.

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
spec:
  model: "deepseek-ai/DeepSeek-R1"
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"

  hardware:
    numGpusPerNode: 8

  modelCache:
    pvcName: "model-cache"
    pvcModelPath: "deepseek-r1"  # path within the PVC
```
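Because the controller only mounts the cache PVC, you normally create it (and copy the weights in) before creating the DGDR. A minimal sketch of such a claim follows; the access mode, size, and storage class are illustrative assumptions, not values prescribed by this guide:

```yaml
# Hypothetical PVC backing modelCache above; adjust for your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache          # must match modelCache.pvcName in the DGDR
spec:
  accessModes:
    - ReadWriteMany          # assumption: shared by the profiling job and workers
  resources:
    requests:
      storage: 1Ti           # assumption: sized for DeepSeek-R1 weights
  # storageClassName: <an RWX-capable storage class in your cluster>
```

The weights must then be placed under `deepseek-r1/` inside the volume (for example via a one-off copy job) before the DGDR is created.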
### Private Model

For gated or private HuggingFace models, pass your token via an environment variable injected
into the profiling job. Create the secret first:

```bash
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN="${HF_TOKEN}" \
  -n ${NAMESPACE}
```
Then reference it in your DGDR:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: llama-private
spec:
  model: "meta-llama/Llama-3.1-8B-Instruct"
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"

  overrides:
    profilingJob:
      template:
        spec:
          containers: []  # required placeholder; leave empty to inherit defaults
          initContainers:
            - name: profiler
              env:
                - name: HF_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: hf-token-secret
                      key: HF_TOKEN
```
### Custom SLA Targets

Control how the profiler optimizes your deployment by specifying latency targets and workload
characteristics.

**Explicit TTFT + ITL targets** (default mode):

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: low-latency-dense
spec:
  model: "Qwen/Qwen3-0.6B"
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"

  sla:
    ttft: 500  # Time To First Token target in milliseconds
    itl: 20    # Inter-Token Latency target in milliseconds

  workload:
    isl: 2000  # expected input sequence length (tokens)
    osl: 500   # expected output sequence length (tokens)
```
**End-to-end latency target** (alternative to `ttft` + `itl`):

```yaml
spec:
  ...
  sla:
    e2eLatency: 10000  # total request latency budget in milliseconds
```
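The two styles are related: for a response of `osl` tokens, the first token arrives at TTFT and each later token adds roughly one ITL. A quick back-of-envelope check (not part of the DGDR API; the streaming decomposition is an assumption):

```python
def e2e_budget_ms(ttft_ms: float, itl_ms: float, osl: int) -> float:
    # End-to-end latency implied by per-token targets:
    # first token at TTFT, then (osl - 1) inter-token gaps of ITL each.
    return ttft_ms + (osl - 1) * itl_ms

# The explicit-target example above (ttft=500, itl=20, osl=500):
print(e2e_budget_ms(500, 20, 500))  # 10480
```

Under this assumption, a 10000 ms budget is roughly as tight as the explicit `ttft`/`itl` pair above for a 500-token response.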
**Optimization objective without explicit targets** (maximize throughput or minimize latency):

```yaml
spec:
  ...
  sla:
    optimizationType: throughput  # or: latency
```
### Overrides

Use `overrides` to customize the profiling job pod spec, for example to add tolerations for
GPU node taints or to inject environment variables.

**GPU node toleration** (common on GKE and shared clusters):

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: dense-with-tolerations
spec:
  model: "Qwen/Qwen3-0.6B"
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"

  overrides:
    profilingJob:
      template:
        spec:
          containers: []  # required placeholder; leave empty to inherit defaults
          tolerations:
            - key: nvidia.com/gpu
              operator: Exists
              effect: NoSchedule
```
**Override the generated DynamoGraphDeployment** (e.g., to set extra environment variables on a worker):

```yaml
spec:
  ...
  overrides:
    dgd:
      apiVersion: nvidia.com/v1alpha1
      kind: DynamoGraphDeployment
      spec:
        services:
          VllmWorker:
            extraEnvs:
              - name: CUSTOM_ENV
                value: "my-value"
```
## SGLang Runtime Profiling