
Local Development with Kind Emulator

Quick start guide for local development using Kind (Kubernetes in Docker) with emulated GPU resources.

Note: This guide covers Kind-specific deployment for local testing. For a complete overview of deployment methods, Helm chart configuration, and the full configuration reference, see the main deployment guide.

Table of Contents

  • Prerequisites
  • Quick Start
  • Scripts
  • Cluster Configuration
  • Testing Locally
  • Troubleshooting
  • Development Workflow
  • Clean Up
  • Next Steps

Prerequisites

  • Docker
  • Kind
  • kubectl
  • Helm
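
A quick way to confirm the tooling is installed and on your PATH (no minimum versions are pinned here):

docker version --format '{{.Server.Version}}'
kind version
kubectl version --client
helm version --short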

Quick Start

One-Command Setup

Deploy WVA with full llm-d infrastructure:

# From project root
make deploy-wva-emulated-on-kind

This creates:

  • Kind cluster with 3 nodes, emulated GPUs (mixed vendors)
  • WVA controller
  • llm-d infrastructure (simulation mode)
  • Prometheus monitoring
  • vLLM emulator
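
To confirm the stack came up, check the nodes and the namespaces used throughout this guide:

kubectl get nodes                                          # 3 nodes with emulated GPU capacity
kubectl get pods -n workload-variant-autoscaler-system     # WVA controller
kubectl get pods -n workload-variant-autoscaler-monitoring # Prometheus stack
kubectl get pods -n llm-d-sim                              # llm-d simulation + vLLM emulator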

Configuration Options

For a complete list of environment variables and configuration options, see the Configuration Reference in the main deployment guide.

Key environment variables for the Kind emulator:

export HF_TOKEN="hf_xxxxx"                  # Required: HuggingFace token
export MODEL_ID="unsloth/Meta-Llama-3.1-8B" # Model to deploy
export ACCELERATOR_TYPE="H100"              # Emulated GPU type
export GATEWAY_PROVIDER="kgateway"          # Gateway for Kind (kgateway recommended)
export ITL_AVERAGE_LATENCY_MS=20            # Average inter-token latency for the llm-d-inference-sim
export TTFT_AVERAGE_LATENCY_MS=200          # Average time-to-first-token for the llm-d-inference-sim

# Performance tuning (optional)
export VLLM_MAX_NUM_SEQS=64                 # vLLM max concurrent sequences (batch size)
export HPA_STABILIZATION_SECONDS=240        # HPA stabilization window

# Image load (optional; auto-detected if unset)
export KIND_IMAGE_PLATFORM=linux/amd64      # Single platform for kind load (avoids "digest not found")

Deployment flags:

export DEPLOY_PROMETHEUS=true         # Deploy Prometheus stack
export DEPLOY_WVA=true                # Deploy WVA controller
export DEPLOY_LLM_D=true              # Deploy llm-d infrastructure (emulated)
export DEPLOY_PROMETHEUS_ADAPTER=true # Deploy Prometheus Adapter
export DEPLOY_HPA=true                # Deploy HPA

Step-by-Step Setup

1. Create Kind cluster:

make create-kind-cluster

# With custom configuration
make create-kind-cluster KIND_ARGS="-t mix -n 4 -g 2"
# -t: vendor type (nvidia, amd, intel, mix)
# -n: number of nodes
# -g: GPUs per node

2. Deploy WVA only:

export DEPLOY_WVA=true
export DEPLOY_LLM_D=false
export DEPLOY_PROMETHEUS=true # Required: WVA reads its metrics from Prometheus
export VLLM_SVC_ENABLED=true
export DEPLOY_PROMETHEUS_ADAPTER=false
export DEPLOY_HPA=false
make deploy-wva-emulated-on-kind

3. Deploy with llm-d infrastructure (the default):

make deploy-wva-emulated-on-kind

4. Deploy a test configuration that saturates quickly:

export VLLM_MAX_NUM_SEQS=8              # Low batch size for easy saturation
export HPA_STABILIZATION_SECONDS=30     # Fast scaling for testing
make deploy-wva-emulated-on-kind
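
With these settings the system saturates and scales quickly, so the effect is easy to observe interactively:

# Watch HPA targets and replica counts react across namespaces
kubectl get hpa -A -w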

Scripts

setup.sh

Creates a Kind cluster with emulated GPU support.

./setup.sh -t mix -n 3 -g 2

Options:

  • -t: Vendor type (nvidia|amd|intel|mix) - default: mix
  • -n: Number of nodes - default: 3
  • -g: GPUs per node - default: 2

teardown.sh

Destroys the Kind cluster.

./teardown.sh

Cluster Configuration

Default cluster created by setup.sh:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /dev/null
        containerPath: /dev/nvidia0
  - role: worker
  - role: worker

GPUs are emulated using extended resources:

  • nvidia.com/gpu
  • amd.com/gpu
  • intel.com/gpu
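
Extended resources like these are typically advertised by patching a node's status subresource through the API server. The following is a sketch of that standard mechanism (node name and GPU count are illustrative; setup.sh may do this differently under the hood):

# Open a proxy to the API server, then PATCH the node's status
kubectl proxy --port=8001 &

curl --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data '[{"op": "add", "path": "/status/capacity/nvidia.com~1gpu", "value": "2"}]' \
  http://localhost:8001/api/v1/nodes/<node-name>/status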

Testing Locally

1. Access Metrics, Services, and Pods

Port-forward WVA metrics:

kubectl port-forward -n workload-variant-autoscaler-system \
  svc/workload-variant-autoscaler-controller-manager-metrics 8080:8080

Port-forward Prometheus:

kubectl port-forward -n workload-variant-autoscaler-monitoring \
  svc/prometheus-operated 9090:9090

Port-forward Inference Gateway:

kubectl port-forward -n llm-d-sim svc/infra-sim-inference-gateway 8000:80
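
With the gateway forwarded, a quick sanity check (assuming the simulator exposes the standard OpenAI-compatible /v1/models route):

curl -s http://localhost:8000/v1/models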

2. Create Test Resources

# Apply sample VariantAutoscaling
kubectl apply -f ../../config/samples/
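
# Confirm the resource exists and is being reconciled
kubectl get variantautoscalings.llmd.ai -A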

3. Generate Load

Option A — Run E2E tests (recommended)
The e2e suite deploys infra, creates resources, generates load, and validates scaling. No manual load tool needed.

# From repo root, after deploying (e.g. make deploy-wva-emulated-on-kind)
make deploy-e2e-infra   # if not already done
make test-e2e-smoke    # quick validation
# or
make test-e2e-full     # full suite including saturation scaling

See Testing Guide and E2E Test Suite README.

Option B — Manual load with burst script
Use the script in the e2e fixtures (requires only curl; no Python). After port-forwarding the inference gateway or vLLM service to localhost:8000:

# From repo root
export TARGET_URL="http://localhost:8000/v1/chat/completions"
export MODEL_ID="unsloth/Meta-Llama-3.1-8B"
export TOTAL_REQUESTS=100
export BATCH_SIZE=10
./hack/burst_load_generator.sh

Tune load with TOTAL_REQUESTS, BATCH_SIZE, and optional BATCH_SLEEP, MAX_TOKENS, CURL_TIMEOUT (see script header).
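
To verify connectivity before running the script, send a single request by hand; this is a plain OpenAI-style chat completion using the variables exported above:

curl -s "$TARGET_URL" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'"$MODEL_ID"'",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'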

4. Monitor

# Watch deployments scale
watch kubectl get deploy -n llm-d-sim

# Watch VariantAutoscaling status
watch kubectl get variantautoscalings.llmd.ai -A

# View controller logs
kubectl logs -n workload-variant-autoscaler-system \
  -l control-plane=controller-manager -f
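
You can also query Prometheus directly for the signals the autoscaler consumes. The metric name below is illustrative; the exact series depend on the emulator:

# Requires the Prometheus port-forward from earlier
curl -s 'http://localhost:9090/api/v1/query?query=vllm:num_requests_running' | jq .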

Troubleshooting

Cluster Creation Fails

# Clean up and retry
kind delete cluster --name kind-wva-gpu-cluster
make create-kind-cluster

Controller Not Starting

# Check controller logs
kubectl logs -n workload-variant-autoscaler-system \
  deployment/workload-variant-autoscaler-controller-manager

# Verify CRDs installed
kubectl get crd variantautoscalings.llmd.ai

# Check RBAC
kubectl get clusterrole,clusterrolebinding -l app=workload-variant-autoscaler

GPUs Not Appearing

# Verify emulated GPU capacity on nodes
kubectl get nodes -o json | jq '.items[].status.capacity'

# Should see nvidia.com/gpu, amd.com/gpu, or intel.com/gpu

Port-Forward Issues

# Kill existing port-forwards
pkill -f "kubectl port-forward"

# Verify pod is running before port-forwarding
kubectl get pods -n <namespace>

kind load fails with "content digest ... not found"

This can happen when loading a multi-platform image into Kind: the image manifest references blobs for multiple platforms (e.g. linux/arm64, linux/amd64), but the stream that kind load feeds into containerd does not include all of them, so ctr reports a missing digest. See kubernetes-sigs/kind#3795 and kubernetes-sigs/kind#3845.

The install script works around this by pulling a single-platform image before loading. If you still see the error or need a specific architecture, set the platform explicitly:

# Force linux/amd64 (e.g. for Intel or emulated nodes)
KIND_IMAGE_PLATFORM=linux/amd64 make deploy-wva-emulated-on-kind CREATE_CLUSTER=true DEPLOY_LLM_D=true

# Force linux/arm64 (e.g. for Apple Silicon with native arm64 nodes)
KIND_IMAGE_PLATFORM=linux/arm64 make deploy-wva-emulated-on-kind CREATE_CLUSTER=true DEPLOY_LLM_D=true

Alternatively, build the image locally and deploy with IfNotPresent so the script skips the registry pull and loads your local single-platform image:

make docker-build IMG=ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest
WVA_IMAGE_PULL_POLICY=IfNotPresent make deploy-wva-emulated-on-kind CREATE_CLUSTER=true DEPLOY_LLM_D=true
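
The manual equivalent of the script's workaround, using the cluster name from the troubleshooting section above, looks like this:

docker pull --platform linux/amd64 ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest
kind load docker-image ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest \
  --name kind-wva-gpu-cluster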

Development Workflow

  1. Make code changes

  2. Build new image:

    make docker-build IMG=localhost:5000/wva:dev
  3. Load image to Kind:

    # Use the cluster name from setup; verify with: kind get clusters
    kind load docker-image localhost:5000/wva:dev --name kind-wva-gpu-cluster
  4. Update deployment:

    kubectl set image deployment/workload-variant-autoscaler-controller-manager \
      -n workload-variant-autoscaler-system \
      manager=localhost:5000/wva:dev
  5. Verify changes:

    kubectl logs -n workload-variant-autoscaler-system \
      deployment/workload-variant-autoscaler-controller-manager -f
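
Note that kubectl set image is a no-op when the tag is unchanged. If you rebuild and reload under the same tag, restart the deployment instead; the new pods pick up the freshly loaded image as long as imagePullPolicy is not Always:

kubectl rollout restart deployment/workload-variant-autoscaler-controller-manager \
  -n workload-variant-autoscaler-system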

Clean Up

Remove deployments:

make undeploy-wva-emulated-on-kind

Remove deployments and delete the Kind cluster:

make undeploy-wva-emulated-on-kind-delete-cluster

Destroy cluster:

make destroy-kind-cluster

Next Steps