| title |
|---|
| Profiler Examples |

Complete examples for profiling with DynamoGraphDeploymentRequests (DGDRs).

Standard online profiling with real GPU measurements:
```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: vllm-dense-online
spec:
  model: "Qwen/Qwen3-0.6B"
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"
  workload:
    isl: 3000
    osl: 150
  sla:
    ttft: 200.0
    itl: 20.0
  autoApply: true
```

Fast offline profiling (~30 seconds, TensorRT-LLM only):
```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: trtllm-aic-offline
spec:
  model: "Qwen/Qwen3-32B"
  backend: trtllm
  image: "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.9.0"
  workload:
    isl: 4000
    osl: 500
  sla:
    ttft: 300.0
    itl: 10.0
  autoApply: true
```

Multi-node MoE profiling with SGLang:
```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sglang-moe
spec:
  model: "deepseek-ai/DeepSeek-R1"
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"
  workload:
    isl: 2048
    osl: 512
  sla:
    ttft: 300.0
    itl: 25.0
  hardware:
    numGpusPerNode: 8
  autoApply: true
```

Reference a custom DGD configuration via ConfigMap:
```bash
# Create ConfigMap from your DGD config file
kubectl create configmap deepseek-r1-config \
  --from-file=/path/to/your/disagg.yaml \
  --namespace $NAMESPACE \
  --dry-run=client -o yaml | kubectl apply -f -
```

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: deepseek-r1
spec:
  model: deepseek-ai/DeepSeek-R1
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"
  workload:
    isl: 4000
    osl: 500
  sla:
    ttft: 300
    itl: 10
  autoApply: true
```

Profile SGLang workers at runtime via HTTP endpoints:
```bash
# Start profiling
curl -X POST http://localhost:9090/engine/start_profile \
  -H "Content-Type: application/json" \
  -d '{"output_dir": "/tmp/profiler_output"}'

# Run inference requests to generate profiling data...

# Stop profiling
curl -X POST http://localhost:9090/engine/stop_profile
```

A test script is provided at `examples/backends/sglang/test_sglang_profile.py`:

```bash
python examples/backends/sglang/test_sglang_profile.py
```

View traces using Chrome's `chrome://tracing`, Perfetto UI, or TensorBoard.
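
The start and stop calls shown with `curl` above can also be scripted, so that profiling is always stopped even when the workload raises. The following is a minimal sketch using only the Python standard library, not the provided test script. The endpoint paths, port, and `output_dir` payload are taken from the `curl` examples; the helper names `build_request`, `post`, and `profiling` are hypothetical, and the response format of the endpoints is not assumed (only the HTTP status is returned).

```python
import json
import urllib.request
from contextlib import contextmanager

# Base URL taken from the curl examples; adjust for your deployment.
BASE_URL = "http://localhost:9090"


def build_request(base_url, action, payload=None):
    """Build a POST request for /engine/start_profile or /engine/stop_profile.

    Hypothetical helper: mirrors the curl calls, sending a JSON body only
    when a payload is given.
    """
    if action not in ("start_profile", "stop_profile"):
        raise ValueError(f"unknown action: {action}")
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{base_url}/engine/{action}", data=data, headers=headers, method="POST"
    )


def post(request, timeout=30.0):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


@contextmanager
def profiling(base_url=BASE_URL, output_dir="/tmp/profiler_output", send=post):
    """Start profiling on entry and stop it on exit, even if the body fails.

    `send` is injectable so the flow can be exercised without a live worker.
    """
    send(build_request(base_url, "start_profile", {"output_dir": output_dir}))
    try:
        yield
    finally:
        send(build_request(base_url, "stop_profile"))


# Usage (requires a running SGLang worker at BASE_URL):
#
#     with profiling(output_dir="/tmp/profiler_output"):
#         ...  # run inference requests to generate profiling data
```

Wrapping the two calls in a context manager keeps the stop request in a `finally` clause, which avoids leaving a worker in profiling mode after a failed run.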