The GPU Scheduler is a Kubernetes extension designed to provide smart, atomic, and topology-aware GPU allocation for workloads. It addresses limitations in the default Kubernetes scheduler by offering fine-grained control over GPU resources, ensuring efficient utilization and preventing race conditions.
- Atomic Allocation: Uses Kubernetes Coordination Leases to lock GPUs, preventing double-booking and race conditions.
- Topology Awareness: Optimizes GPU assignment based on NVLink topology to maximize bandwidth for multi-GPU workloads.
- Fine-Grained Control: Supports requesting specific GPU counts, contiguous allocation policies, and specific GPU IDs.
- Three-Component Architecture:
- Scheduler Plugin: Integrates with the Kubernetes scheduling framework.
- Webhook: Injects `CUDA_VISIBLE_DEVICES` environment variables into pods.
- Agent: Runs on nodes to discover and report GPU health and topology.
The system consists of three main components:
- Scheduler Plugin: Filters and scores nodes based on GPU availability and topology. It reserves GPUs by creating Lease objects (see the sketch after this list).
- Webhook (Mutator): Intercepts pod creation requests and injects the assigned GPU IDs into the container's environment variables.
- Agent (DaemonSet): Runs on each GPU node to discover hardware details (topology, health) and update `GpuNodeStatus` resources.
For a deep dive into the architecture, see Architecture Guide.
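As a rough sketch of the Lease-based reservation mentioned above, the scheduler plugin could track each reserved GPU with a Coordination Lease along these lines; the name, namespace, labels, and holder format shown here are illustrative assumptions rather than the exact objects the plugin creates:

```yaml
# Illustrative only: naming, namespace, and labels are assumptions.
# The general idea is one coordination.k8s.io Lease acting as a lock per reserved GPU.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: node-a-gpu-3                # assumed naming: <node>-gpu-<id>
  namespace: gpu-scheduler-system   # assumed namespace
  labels:
    gpu.scheduling/node: node-a
    gpu.scheduling/gpu-id: "3"
spec:
  holderIdentity: default/gpu-test  # assumed format: <namespace>/<pod> holding the GPU
  leaseDurationSeconds: 30
```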
The GPU Scheduler is designed to operate within high-performance on-premises ML clusters, where network topology and resource isolation are critical.
In this architecture:
- Control Plane (General Data Network): The Kubernetes control plane and standard workloads run on a general-purpose network (10/25/100G Ethernet). The GPU Scheduler runs here, orchestrating resources across the cluster.
- GPU Workers (GPU Fabric): GPU nodes are connected via a dedicated high-performance fabric (200/400G IB/RoCE) for model training traffic (NCCL).
- Topology Awareness: The scheduler optimizes allocations to ensure multi-GPU jobs run on nodes within the same NVLink domain or high-bandwidth switch group, minimizing latency on the GPU Fabric.
- Agent Role: The Agent runs on the GPU workers, discovering the local topology and reporting health status back to the scheduler on the Control Plane.
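To make the Agent's reporting concrete, a `GpuNodeStatus` resource might look roughly like the sketch below. The actual schema is defined by the project's CRD, so the field names (and whether this data lives under `spec` or `status`) are assumptions for illustration:

```yaml
# Hypothetical GpuNodeStatus shape; all field names are illustrative assumptions.
apiVersion: gpu.scheduling/v1
kind: GpuNodeStatus
metadata:
  name: node-a
status:
  gpus:
  - id: 0
    healthy: true
    nvlinkPeers: [1, 2, 3]   # GPUs reachable from GPU 0 over NVLink
  - id: 1
    healthy: true
    nvlinkPeers: [0, 2, 3]
  topology:
    nvlinkDomains:
    - gpus: [0, 1, 2, 3]     # a fully connected NVLink group
    - gpus: [4, 5, 6, 7]
```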
- Kubernetes cluster 1.33.x (verified)
- Nodes with NVIDIA GPUs
- NVIDIA device plugin installed
- Helm 3 installed
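Before installing, it can help to confirm that the device plugin is advertising GPUs; assuming it is running, each GPU node should expose the `nvidia.com/gpu` resource:

```bash
# List GPU capacity per node; an empty GPUS column means the device plugin
# is not advertising GPUs on that node.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```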
The admission webhook requires TLS certificates. You can generate them using the provided script:
```bash
curl -sL https://raw.githubusercontent.com/restack/gpu-scheduler/main/hack/gen-webhook-certs.sh | bash -s -- default
```

For more details, see Webhook Certificates Guide.
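The script's exact outputs are project-defined; assuming it creates a TLS secret in the namespace passed as the argument (`default` above), a quick sanity check might be:

```bash
# Look for the generated webhook TLS secret (the secret name is project-defined,
# so this assumes it contains "webhook").
kubectl get secrets -n default | grep -i webhook
```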
```bash
helm install gpu-scheduler ../charts/gpu-scheduler
```

To request a single GPU, create a `GpuClaim` and reference it from a Pod:

```yaml
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: single-gpu
spec:
  devices:
    count: 1
    exclusivity: Exclusive
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  annotations:
    gpu.scheduling/claim: single-gpu
spec:
  schedulerName: gpu-scheduler
  containers:
  - name: cuda-test
    image: nvidia/cuda:12.4.1-runtime-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: "1"
```
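Once the Pod above has been scheduled, the webhook should have injected the assigned GPU IDs. You can inspect the result; the actual value depends on which GPU was reserved:

```bash
# Inspect the environment variable injected by the webhook
kubectl exec gpu-test -- env | grep CUDA_VISIBLE_DEVICES
# Example output (the ID is whatever the scheduler reserved): CUDA_VISIBLE_DEVICES=3
```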
A multi-GPU claim can also ask for contiguous devices and express a topology preference:

```yaml
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: multi-gpu
spec:
  devices:
    count: 4
    policy: contiguous
    exclusivity: Exclusive
  topology:
    mode: Preferred
```

For more examples and advanced usage, see Usage Guide.
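The feature list above also mentions requesting specific GPU IDs. A claim for that could look like the sketch below, though the `ids` field name is an assumption; check the Usage Guide for the exact schema:

```yaml
# Hypothetical claim pinning specific GPU IDs; the `ids` field name is an assumption.
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: pinned-gpus
spec:
  devices:
    ids: [0, 1]
    exclusivity: Exclusive
```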
- Go 1.24.x
- Docker
- Kind (Kubernetes in Docker)
```bash
# Build all images
make docker-all
```

```bash
# Create cluster
kind create cluster --config ../hack/kind-cluster.yaml

# Build and load images
make docker-all
kind load docker-image ghcr.io/restack/gpu-scheduler:dev --name kind
kind load docker-image ghcr.io/restack/gpu-scheduler-webhook:dev --name kind
kind load docker-image ghcr.io/restack/gpu-scheduler-agent:dev --name kind

# Deploy
helm install gpu-scheduler ../charts/gpu-scheduler
```

For detailed development instructions, see Development Guide.
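After deploying into the kind cluster, a quick way to confirm the components came up (pod names and namespace depend on the chart's defaults):

```bash
# The scheduler, webhook, and agent pods should all reach Running
kubectl get pods -A | grep gpu-scheduler
```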

