The GPU Scheduler is a Kubernetes extension designed to provide smart, atomic, and topology-aware GPU allocation for workloads. It addresses limitations in the default Kubernetes scheduler by offering fine-grained control over GPU resources, ensuring efficient utilization and preventing race conditions.
- Atomic Allocation: Uses Kubernetes Coordination Leases to lock GPUs, preventing double-booking and race conditions.
- Topology Awareness: Optimizes GPU assignment based on NVLink topology to maximize bandwidth for multi-GPU workloads.
- Fine-Grained Control: Supports requesting specific GPU counts, contiguous allocation policies, and specific GPU IDs.
- Three-Component Architecture:
- Scheduler Plugin: Integrates with the Kubernetes scheduling framework.
- Webhook: Injects `CUDA_VISIBLE_DEVICES` environment variables into pods.
- Agent: Runs on nodes to discover and report GPU health and topology.
The system consists of three main components:
- Scheduler Plugin: Filters and scores nodes based on GPU availability and topology. It reserves GPUs by creating Lease objects (see the sketch after this list).
- Webhook (Mutator): Intercepts pod creation requests and injects the assigned GPU IDs into the container's environment variables.
- Agent (DaemonSet): Runs on each GPU node to discover hardware details (topology, health) and update `GpuNodeStatus` resources.
For a deep dive into the architecture, see Architecture Guide.
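As a rough sketch of the Lease-based reservation mentioned above, the scheduler plugin could track each reserved GPU with a Coordination Lease along these lines; the name, namespace, labels, and holder format shown here are illustrative assumptions rather than the exact objects the plugin creates:

```yaml
# Illustrative only: naming, namespace, and labels are assumptions.
# The general idea is one coordination.k8s.io Lease acting as a lock per reserved GPU.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: node-a-gpu-3                # assumed naming: <node>-gpu-<id>
  namespace: gpu-scheduler-system   # assumed namespace
  labels:
    gpu.scheduling/node: node-a
    gpu.scheduling/gpu-id: "3"
spec:
  holderIdentity: default/gpu-test  # assumed format: <namespace>/<pod> holding the GPU
  leaseDurationSeconds: 30
```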
The GPU Scheduler is designed to operate within high-performance on-premises ML clusters, where network topology and resource isolation are critical.
In this architecture:
- Control Plane (General Data Network): The Kubernetes control plane and standard workloads run on a general-purpose network (10/25/100G Ethernet). The GPU Scheduler runs here, orchestrating resources across the cluster.
- GPU Workers (GPU Fabric): GPU nodes are connected via a dedicated high-performance fabric (200/400G IB/RoCE) for model training traffic (NCCL).
- Topology Awareness: The scheduler optimizes allocations to ensure multi-GPU jobs run on nodes within the same NVLink domain or high-bandwidth switch group, minimizing latency on the GPU Fabric.
- Agent Role: The Agent runs on the GPU workers, discovering the local topology and reporting health status back to the scheduler on the Control Plane.
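To make the Agent's reporting concrete, a `GpuNodeStatus` resource might look roughly like the sketch below. The actual schema is defined by the project's CRD, so the field names (and whether this data lives under `spec` or `status`) are assumptions for illustration:

```yaml
# Hypothetical GpuNodeStatus shape; all field names are illustrative assumptions.
apiVersion: gpu.scheduling/v1
kind: GpuNodeStatus
metadata:
  name: node-a
status:
  gpus:
  - id: 0
    healthy: true
    nvlinkPeers: [1, 2, 3]   # GPUs reachable from GPU 0 over NVLink
  - id: 1
    healthy: true
    nvlinkPeers: [0, 2, 3]
  topology:
    nvlinkDomains:
    - gpus: [0, 1, 2, 3]     # a fully connected NVLink group
    - gpus: [4, 5, 6, 7]
```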
- Kubernetes cluster 1.33.x (verified)
- Nodes with NVIDIA GPUs
- NVIDIA device plugin installed
- Helm 3 installed
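Before installing, it can help to confirm that the device plugin is advertising GPUs; assuming it is running, each GPU node should expose the `nvidia.com/gpu` resource:

```bash
# List GPU capacity per node; an empty GPUS column means the device plugin
# is not advertising GPUs on that node.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```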
The admission webhook requires TLS certificates. You can generate them using the provided script:
```bash
curl -sL https://raw.githubusercontent.com/restack/gpu-scheduler/main/hack/gen-webhook-certs.sh | bash -s -- default
```

For more details, see Webhook Certificates Guide.
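The script's exact outputs are project-defined; assuming it creates a TLS secret in the namespace passed as the argument (`default` above), a quick sanity check might be:

```bash
# Look for the generated webhook TLS secret (the secret name is project-defined,
# so this assumes it contains "webhook").
kubectl get secrets -n default | grep -i webhook
```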
```bash
helm install gpu-scheduler ../charts/gpu-scheduler
```

To request a single GPU, create a `GpuClaim` and reference it from a Pod:

```yaml
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: single-gpu
spec:
  devices:
    count: 1
    exclusivity: Exclusive
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  annotations:
    gpu.scheduling/claim: single-gpu
spec:
  schedulerName: gpu-scheduler
  containers:
  - name: cuda-test
    image: nvidia/cuda:12.4.1-runtime-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: "1"
```
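Once the Pod above has been scheduled, the webhook should have injected the assigned GPU IDs. You can inspect the result; the actual value depends on which GPU was reserved:

```bash
# Inspect the environment variable injected by the webhook
kubectl exec gpu-test -- env | grep CUDA_VISIBLE_DEVICES
# Example output (the ID is whatever the scheduler reserved): CUDA_VISIBLE_DEVICES=3
```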
A multi-GPU claim can also ask for contiguous devices and express a topology preference:

```yaml
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: multi-gpu
spec:
  devices:
    count: 4
    policy: contiguous
    exclusivity: Exclusive
  topology:
    mode: Preferred
```

For more examples and advanced usage, see Usage Guide.
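The feature list above also mentions requesting specific GPU IDs. A claim for that could look like the sketch below, though the `ids` field name is an assumption; check the Usage Guide for the exact schema:

```yaml
# Hypothetical claim pinning specific GPU IDs; the `ids` field name is an assumption.
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: pinned-gpus
spec:
  devices:
    ids: [0, 1]
    exclusivity: Exclusive
```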
- Go 1.24.x
- Docker
- Kind (Kubernetes in Docker)
```bash
# Build all images
make docker-all
```

```bash
# Create cluster
kind create cluster --config ../hack/kind-cluster.yaml

# Build and load images
make docker-all
kind load docker-image ghcr.io/restack/gpu-scheduler:dev --name kind
kind load docker-image ghcr.io/restack/gpu-scheduler-webhook:dev --name kind
kind load docker-image ghcr.io/restack/gpu-scheduler-agent:dev --name kind

# Deploy
helm install gpu-scheduler ../charts/gpu-scheduler
```

For detailed development instructions, see Development Guide.
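After deploying into the kind cluster, a quick way to confirm the components came up (pod names and namespace depend on the chart's defaults):

```bash
# The scheduler, webhook, and agent pods should all reach Running
kubectl get pods -A | grep gpu-scheduler
```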

