Profiler Examples

Complete examples for profiling with DynamoGraphDeploymentRequests (DGDRs).

DGDR Examples

Dense Model: AIPerf on Real Engines

Standard online profiling with real GPU measurements:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: vllm-dense-online
spec:
  model: "Qwen/Qwen3-0.6B"
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"

  workload:
    isl: 3000
    osl: 150

  sla:
    ttft: 200.0
    itl: 20.0

  autoApply: true

Dense Model: AI Configurator Simulation

Fast offline profiling (~30 seconds, TensorRT-LLM only):

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: trtllm-aic-offline
spec:
  model: "Qwen/Qwen3-32B"
  backend: trtllm
  image: "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.9.0"

  workload:
    isl: 4000
    osl: 500

  sla:
    ttft: 300.0
    itl: 10.0

  autoApply: true

MoE Model

Multi-node MoE profiling with SGLang:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sglang-moe
spec:
  model: "deepseek-ai/DeepSeek-R1"
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"

  workload:
    isl: 2048
    osl: 512

  sla:
    ttft: 300.0
    itl: 25.0

  hardware:
    numGpusPerNode: 8

  autoApply: true
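
The three manifests above share one shape and differ only in a few values, so they can be generated rather than copied by hand. The following is a minimal sketch, not part of the project: `make_dgdr` is a hypothetical helper, and it emits only the fields that appear in the examples above (the optional `hardware` block is included only when a GPU count is given). It prints JSON, which `kubectl` accepts in place of YAML.

```python
import json

API_VERSION = "nvidia.com/v1beta1"
KIND = "DynamoGraphDeploymentRequest"


def make_dgdr(name, model, backend, image, isl, osl, ttft, itl,
              num_gpus_per_node=None, auto_apply=True):
    """Return a DGDR manifest as a plain dict.

    Hypothetical helper: it only reproduces fields shown in the
    examples above and performs no validation of its own.
    """
    spec = {
        "model": model,
        "backend": backend,
        "image": image,
        "workload": {"isl": isl, "osl": osl},
        "sla": {"ttft": ttft, "itl": itl},
    }
    # The hardware block appears only in the MoE example, so keep it optional.
    if num_gpus_per_node is not None:
        spec["hardware"] = {"numGpusPerNode": num_gpus_per_node}
    spec["autoApply"] = auto_apply
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name},
        "spec": spec,
    }


# Rebuild the dense vLLM and MoE SGLang examples from this page.
dense = make_dgdr(
    name="vllm-dense-online",
    model="Qwen/Qwen3-0.6B",
    backend="vllm",
    image="nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0",
    isl=3000, osl=150, ttft=200.0, itl=20.0,
)
moe = make_dgdr(
    name="sglang-moe",
    model="deepseek-ai/DeepSeek-R1",
    backend="sglang",
    image="nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0",
    isl=2048, osl=512, ttft=300.0, itl=25.0,
    num_gpus_per_node=8,
)

print(json.dumps(moe, indent=2))
```

If the script is saved as, say, `make_dgdr.py` (a hypothetical file name), its output can be piped straight into `kubectl apply -f -`.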

Using Existing DGD Config (ConfigMap)

Reference a custom DGD configuration via ConfigMap:

# Create ConfigMap from your DGD config file
kubectl create configmap deepseek-r1-config \
  --from-file=/path/to/your/disagg.yaml \
  --namespace $NAMESPACE \
  --dry-run=client -o yaml | kubectl apply -f -

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: deepseek-r1
spec:
  model: deepseek-ai/DeepSeek-R1
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"

  workload:
    isl: 4000
    osl: 500

  sla:
    ttft: 300
    itl: 10

  autoApply: true

SGLang Runtime Profiling

Profile SGLang workers at runtime via HTTP endpoints:

# Start profiling
curl -X POST http://localhost:9090/engine/start_profile \
  -H "Content-Type: application/json" \
  -d '{"output_dir": "/tmp/profiler_output"}'

# Run inference requests to generate profiling data...

# Stop profiling
curl -X POST http://localhost:9090/engine/stop_profile

A test script is provided at examples/backends/sglang/test_sglang_profile.py:

python examples/backends/sglang/test_sglang_profile.py

View traces using Chrome's chrome://tracing, Perfetto UI, or TensorBoard.
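
The start/stop sequence shown in the curl commands can also be driven from Python, which makes it easier to wrap a specific batch of requests in a profiling window. This is a minimal sketch using only the standard library and is independent of the repository's test script. The endpoint paths, port, and `output_dir` payload are taken from the curl commands above; the helper names `build_profile_request` and `profile_window` are hypothetical, and the response body is ignored because its format is not described here.

```python
import contextlib
import json
import urllib.request

# Same host and port as the curl commands above; adjust for your deployment.
BASE_URL = "http://localhost:9090"


def build_profile_request(action, output_dir=None, base_url=BASE_URL):
    """Build (without sending) the POST request for start_profile or stop_profile."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    url = f"{base_url}/engine/{action}_profile"
    data = None
    headers = {}
    # Only the start call carries a JSON body, mirroring the curl example.
    if action == "start" and output_dir is not None:
        data = json.dumps({"output_dir": output_dir}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


@contextlib.contextmanager
def profile_window(output_dir, base_url=BASE_URL, timeout=30):
    """Start profiling on entry and always attempt to stop it on exit.

    urlopen raises urllib.error.HTTPError on a non-2xx response, so a
    failed start aborts before the wrapped block runs.
    """
    start = build_profile_request("start", output_dir, base_url)
    with urllib.request.urlopen(start, timeout=timeout):
        pass
    try:
        yield
    finally:
        stop = build_profile_request("stop", base_url=base_url)
        with urllib.request.urlopen(stop, timeout=timeout):
            pass


# Usage (requires a running SGLang worker, so it is left commented out):
#
# with profile_window("/tmp/profiler_output"):
#     ...  # send the inference requests you want captured in the trace
```

The `try`/`finally` ensures the stop call is attempted even if the wrapped requests raise, so the worker is not left in profiling mode.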