# Kubernetes GPU Setup for Debian 12

This repository contains automated setup scripts for configuring Debian 12 servers with NVIDIA GPU support for Kubernetes clusters.
## Prerequisites

- Debian 12 (Bookworm) fresh installation
- NVIDIA GPU hardware
- Root or sudo access
- Internet connectivity
## What Gets Installed

- **NVIDIA Drivers** - latest proprietary NVIDIA drivers from the Debian repositories
- **containerd** - container runtime configured for Kubernetes
- **NVIDIA Container Toolkit** - GPU support for containers
- **Kubernetes Tools** - kubectl, kubeadm, and kubelet (v1.31)
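To show how these pieces fit together: after the NVIDIA Container Toolkit is configured for containerd (typically via `nvidia-ctk runtime configure --runtime=containerd`), the containerd config gains an `nvidia` runtime entry. The fragment below is an illustrative sketch of what `/etc/containerd/config.toml` might contain, not verbatim output; the exact structure varies with the containerd and toolkit versions.

```toml
# Illustrative fragment only - the generated config depends on versions.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
    SystemdCgroup = true
```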
## Quick Start

```bash
# Clone the repository
git clone https://github.com/tzhukov/k8s_gpu_nvidia_debian.git
cd k8s_gpu_nvidia_debian

# Make the scripts executable
chmod +x *.sh

# Run the complete setup (requires root)
sudo ./install-all.sh

# Reboot the system
sudo reboot
```

## Step-by-Step Installation

If you prefer to run the scripts step by step, or only need specific components:
```bash
# 1. Install the NVIDIA drivers
sudo ./setup-nvidia-drivers.sh

# 2. Set up containerd
sudo ./setup-containerd.sh

# 3. Install the NVIDIA Container Toolkit
sudo ./setup-nvidia-container-toolkit.sh

# 4. Install the Kubernetes packages
sudo ./setup-kubernetes.sh

# Reboot after all installations
sudo reboot
```

## Verification

After rebooting, verify your installation:
```bash
# Check the NVIDIA driver
nvidia-smi

# Check containerd
sudo systemctl status containerd

# Check the Kubernetes tools
kubectl version --client
kubeadm version
kubelet --version
```

Test GPU access from a container:

```bash
sudo ctr image pull docker.io/nvidia/cuda:12.3.0-base-ubuntu22.04
sudo ctr run --rm --gpus 0 docker.io/nvidia/cuda:12.3.0-base-ubuntu22.04 test nvidia-smi
```
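The per-binary checks can also be wrapped in a small helper. A minimal sketch (the `check_tools` name is hypothetical, not part of this repository's scripts) that reports any binaries missing from `PATH`:

```shell
#!/bin/sh
# Report which of the given tools are missing from PATH.
# Hypothetical helper, not part of the repository's scripts.
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -z "$missing" ]; then
    echo "all tools present"
  else
    echo "missing:$missing"
  fi
}

check_tools kubectl kubeadm kubelet
```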
## Cluster Initialization

On the control-plane (master) node:

```bash
# Initialize the cluster (adjust the pod network CIDR as needed)
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Configure kubectl for your user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install a CNI plugin (example: Calico)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
```

On worker nodes, use the join command from the master node's `kubeadm init` output:
```bash
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
```

## GPU Device Plugin

Install the NVIDIA GPU device plugin:

```bash
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/nvidia-device-plugin.yml
```

Verify that GPU nodes report allocatable GPUs:

```bash
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```

## Testing GPU Access

Create a test pod that uses a GPU (e.g. `gpu-test.yaml`):
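The same check can be done programmatically by parsing `kubectl get nodes -o json`. A minimal sketch (the `total_allocatable_gpus` helper is hypothetical, shown here with trimmed-down sample data rather than real cluster output):

```python
import json

def total_allocatable_gpus(nodes_json: str) -> int:
    """Sum nvidia.com/gpu allocatable across all nodes in a
    `kubectl get nodes -o json` dump. Hypothetical helper."""
    doc = json.loads(nodes_json)
    total = 0
    for node in doc.get("items", []):
        alloc = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports resource quantities as strings, e.g. "1"
        total += int(alloc.get("nvidia.com/gpu", "0"))
    return total

# Example with a trimmed-down node list (not real cluster output):
sample = json.dumps({
    "items": [
        {"status": {"allocatable": {"nvidia.com/gpu": "1"}}},
        {"status": {"allocatable": {}}},  # CPU-only node
    ]
})
print(total_allocatable_gpus(sample))  # 1
```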
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.3.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
  restartPolicy: Never
```

Apply and check:
```bash
kubectl apply -f gpu-test.yaml
kubectl logs gpu-test
```

## Troubleshooting

### NVIDIA driver issues

```bash
# Check if the driver is loaded
lsmod | grep nvidia

# Reinstall if needed
sudo apt-get install --reinstall nvidia-driver
```
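The `lsmod` check can be scripted as well. A minimal sketch (the `module_loaded` helper is hypothetical) that consults `/proc/modules`, the same data `lsmod` formats:

```shell
#!/bin/sh
# Report whether a kernel module is currently loaded, by checking
# /proc/modules. Hypothetical helper, not part of the repository.
module_loaded() {
  if grep -q "^$1 " /proc/modules 2>/dev/null; then
    echo "$1: loaded"
  else
    echo "$1: not loaded"
  fi
}

module_loaded nvidia
```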
### containerd issues

```bash
# Check containerd logs
sudo journalctl -u containerd -f

# Restart containerd
sudo systemctl restart containerd
```

### Kubernetes issues

```bash
# Check kubelet status
sudo systemctl status kubelet

# View kubelet logs
sudo journalctl -u kubelet -f

# Reset the cluster (WARNING: destroys the cluster)
sudo kubeadm reset
```

## Scripts

### install-all.sh

Master script that runs all setup scripts in the correct order.
### setup-nvidia-drivers.sh

- Enables non-free repositories
- Installs kernel headers and DKMS
- Installs the NVIDIA proprietary drivers
- Loads the required kernel modules

### setup-containerd.sh

- Configures sysctl parameters
- Installs and configures containerd with the systemd cgroup driver

### setup-nvidia-container-toolkit.sh

- Adds the NVIDIA Container Toolkit repository
- Installs nvidia-container-toolkit
- Configures containerd for the NVIDIA runtime

### setup-kubernetes.sh

- Disables swap
- Adds the Kubernetes repository (v1.31)
- Installs kubectl, kubeadm, and kubelet
- Holds the packages to prevent automatic updates
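On the swap point: Kubernetes requires swap to be disabled both for the running system and across reboots. A minimal sketch of how such a script can do this (the `disable_swap` helper is hypothetical; it takes a file argument so it can be exercised on a copy of `/etc/fstab`):

```shell
#!/bin/sh
# Hypothetical sketch: turn swap off now, then comment out swap
# entries in the given fstab file so it stays off after reboot.
disable_swap() {
  fstab="${1:-/etc/fstab}"
  swapoff -a 2>/dev/null || true  # ignore failure when unprivileged
  # Comment out any non-comment line containing a whitespace-delimited
  # "swap" field.
  sed -i 's/^\([^#].*[[:space:]]swap[[:space:]].*\)$/#\1/' "$fstab"
}
```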
## System Requirements

- **OS**: Debian 12 (Bookworm)
- **GPU**: NVIDIA GPU with driver support
- **Memory**: minimum 2 GB RAM (4 GB+ recommended)
- **CPU**: 2+ cores recommended
- **Disk**: 20 GB+ of available space
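A quick preflight check against the memory and CPU minimums can be scripted. A minimal sketch (the `meets_minimums` helper is hypothetical; it takes the values as arguments so it is easy to test, with a comment showing how to feed in live system values):

```shell
#!/bin/sh
# Compare given resources against the minimums above (2 GB RAM, 2 cores).
# Hypothetical helper; to check the live system you could call:
#   meets_minimums "$(awk '/MemTotal/ {print $2}' /proc/meminfo)" "$(nproc)"
meets_minimums() {
  mem_kib=$1   # total memory in KiB
  cpus=$2      # CPU core count
  if [ "$mem_kib" -ge 2097152 ] && [ "$cpus" -ge 2 ]; then
    echo "ok"
  else
    echo "below minimum requirements"
  fi
}
```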
## Important Notes

- These scripts disable swap and modify kernel parameters
- NVIDIA proprietary drivers are installed
- Kubernetes packages are held at specific versions
- Always review scripts before running them with root privileges
## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

## License

MIT License - see the LICENSE file for details.