Quick start guide for local development using Kind (Kubernetes in Docker) with emulated GPU resources.
Prerequisites:

- Docker
- Kind
- kubectl
- Helm (optional, but recommended)
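Before starting, it can help to confirm these tools are on your `PATH`. A small sketch (the `check_tools` helper is illustrative, not part of the repo):

```shell
# Report which prerequisite CLIs are installed; prints "ok:" or "missing:" per tool.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
    fi
  done
}

check_tools docker kind kubectl helm
```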
Deploy WVA with full llm-d infrastructure:

```shell
# From project root
make deploy-llm-d-inferno-emulated-on-kind
```

This creates:
- Kind cluster with 3 nodes, emulated GPUs (mixed vendors)
- WVA controller
- llm-d infrastructure (simulation mode)
- Prometheus monitoring
- vLLM emulator
1. Create Kind cluster:

   ```shell
   make create-kind-cluster

   # With custom configuration
   make create-kind-cluster KIND_ARGS="-t mix -n 4 -g 2"
   # -t: vendor type (nvidia, amd, intel, mix)
   # -n: number of nodes
   # -g: GPUs per node
   ```

2. Deploy WVA only:

   ```shell
   make deploy-wva-emulated-on-kind
   ```

3. Deploy with llm-d:

   ```shell
   make deploy-llm-d-wva-emulated-on-kind
   ```

`setup.sh` creates a Kind cluster with emulated GPU support:

```shell
./setup.sh -t mix -n 3 -g 2
```

Options:
- `-t`: vendor type (`nvidia`|`amd`|`intel`|`mix`), default: `mix`
- `-n`: number of nodes, default: `3`
- `-g`: GPUs per node, default: `2`
`deploy-wva.sh` deploys the WVA controller to an existing cluster:

```shell
./deploy-wva.sh
```

`deploy-llm-d.sh` deploys WVA with the llm-d infrastructure:

```shell
./deploy-llm-d.sh -i <your-image>
```

`undeploy-llm-d.sh` removes WVA and the llm-d infrastructure:

```shell
./undeploy-llm-d.sh
```

`teardown.sh` destroys the Kind cluster:

```shell
./teardown.sh
```

Default cluster created by `setup.sh`:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /dev/null
    containerPath: /dev/nvidia0
- role: worker
- role: worker
```

GPUs are emulated using extended resources:

- `nvidia.com/gpu`
- `amd.com/gpu`
- `intel.com/gpu`
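Once advertised, these behave like any other Kubernetes extended resource. A minimal sketch of a pod requesting one emulated NVIDIA GPU (the pod name and image are illustrative, not from the repo):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-consumer          # illustrative name
spec:
  containers:
  - name: app
    image: busybox            # placeholder image
    command: ["sleep", "3600"]
    resources:
      limits:
        nvidia.com/gpu: 1    # extended resources are requested via limits
```

Kubernetes will only schedule such a pod onto a node whose capacity advertises `nvidia.com/gpu`.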
Port-forward WVA metrics:
```shell
kubectl port-forward -n workload-variant-autoscaler-system \
  svc/workload-variant-autoscaler-controller-manager-metrics 8080:8080
```

Port-forward Prometheus:

```shell
kubectl port-forward -n workload-variant-autoscaler-monitoring \
  svc/prometheus-operated 9090:9090
```

Port-forward vLLM emulator:

```shell
kubectl port-forward -n llm-d-sim svc/vllme-service 8000:80
```

Port-forward Inference Gateway:

```shell
kubectl port-forward -n llm-d-sim svc/infra-sim-inference-gateway 8000:80
```

Note that the vLLM emulator and the Inference Gateway both map to local port 8000, so forward one at a time or pick a different local port.

Apply a sample VariantAutoscaling:

```shell
kubectl apply -f ../../config/samples/
```

Run the load generator against the emulator:

```shell
cd ../../tools/vllm-emulator

# Install dependencies
pip install -r requirements.txt

# Run load generator
python loadgen.py \
  --model default/default \
  --rate '[[120, 60]]' \
  --url http://localhost:8000/v1 \
  --content 50
```

Observe scaling behavior:

```shell
# Watch deployments scale
watch kubectl get deploy -n llm-d-sim

# Watch VariantAutoscaling status
watch kubectl get variantautoscalings.llmd.ai -A

# View controller logs
kubectl logs -n workload-variant-autoscaler-system \
  -l control-plane=controller-manager -f
```
If cluster creation fails, clean up and retry:

```shell
kind delete cluster --name kind-inferno-gpu-cluster
make create-kind-cluster
```

If the controller is not starting:

```shell
# Check controller logs
kubectl logs -n workload-variant-autoscaler-system \
  deployment/workload-variant-autoscaler-controller-manager

# Verify CRDs installed
kubectl get crd variantautoscalings.llmd.ai

# Check RBAC
kubectl get clusterrole,clusterrolebinding -l app=workload-variant-autoscaler
```

If emulated GPUs are not visible:

```shell
# Verify GPU resources on nodes
kubectl get nodes -o json | jq '.items[].status.capacity'
# Should see nvidia.com/gpu, amd.com/gpu, or intel.com/gpu
```

If port-forwarding fails:

```shell
# Kill existing port-forwards
pkill -f "kubectl port-forward"

# Verify pod is running before port-forwarding
kubectl get pods -n <namespace>
```

To iterate on controller changes:

- Make code changes
- Build new image:

  ```shell
  make docker-build IMG=localhost:5000/wva:dev
  ```

- Load image into Kind:

  ```shell
  kind load docker-image localhost:5000/wva:dev --name kind-inferno-gpu-cluster
  ```

- Update deployment:

  ```shell
  kubectl set image deployment/workload-variant-autoscaler-controller-manager \
    -n workload-variant-autoscaler-system \
    manager=localhost:5000/wva:dev
  ```

- Verify changes:

  ```shell
  kubectl logs -n workload-variant-autoscaler-system \
    deployment/workload-variant-autoscaler-controller-manager -f
  ```
Remove deployments:
```shell
make undeploy-llm-d-inferno-emulated-on-kind
```

Destroy cluster:

```shell
make destroy-kind-cluster
```

Or use scripts directly:

```shell
./undeploy-llm-d.sh
./teardown.sh
```