Installation link: https://developer.nvidia.com/cuda-12-8-0-download-archive
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin
sudo mv cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda-repo-ubuntu2404-12-8-local_12.8.0-570.86.10-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2404-12-8-local_12.8.0-570.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2404-12-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
sudo apt-get install -y cuda-drivers
CUDA_PATH=/usr/local/cuda
CUDA_PATH_LINE="export PATH=\$CUDA_PATH/bin:\$PATH"
CUDA_LD_LIBRARY_PATH_LINE="export LD_LIBRARY_PATH=\$CUDA_PATH/lib64:\$LD_LIBRARY_PATH"
echo "export CUDA_PATH=${CUDA_PATH}" >> ~/.bashrc
echo "$CUDA_PATH_LINE" >> ~/.bashrc
echo "$CUDA_LD_LIBRARY_PATH_LINE" >> ~/.bashrc
source ~/.bashrc
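Note that these `echo` lines append unconditionally, so re-running the setup duplicates entries in `~/.bashrc`. A guarded append avoids that; a minimal sketch, demonstrated on a temp file standing in for `~/.bashrc` (the `append_once` helper is ours, not part of the guide):

```shell
RC=$(mktemp)                                   # stand-in for ~/.bashrc
LINE='export PATH=$CUDA_PATH/bin:$PATH'
# Append the line only if it is not already present verbatim (-x: whole line, -F: fixed string).
append_once() { grep -qxF "$1" "$2" || echo "$1" >> "$2"; }
append_once "$LINE" "$RC"
append_once "$LINE" "$RC"                      # second call is a no-op
COUNT=$(grep -cxF "$LINE" "$RC")
echo "occurrences: $COUNT"                     # prints: occurrences: 1
rm -f "$RC"
```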
Source: Step-by-Step Guide to Creating a Kubernetes Cluster on Ubuntu 22.04 Using Containerd Runtime
| Marker | Meaning |
|---|---|
| CP | apply on the control-plane node only |
| W | apply on worker nodes only |
| CP-W | apply on both control-plane and worker nodes |
sudo ufw disable
Verify the status with:
sudo ufw status
sudo apt update
sudo apt -y full-upgrade
sudo apt install -y systemd-timesyncd
sudo timedatectl set-ntp true
Check the status with
sudo timedatectl status
and verify that the NTP service is marked as active: "NTP service: active"
sudo swapoff -a
sudo sed -i.bak -r 's/(.+ swap .+)/#\1/' /etc/fstab
Check the status with the free -m command.
free -m
Both swap values should now be 0.
Check /etc/fstab as well; otherwise swap will be turned back on automatically at reboot.
grep swap /etc/fstab
The 'swapfile' entry should be commented out.
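The `sed` rule above comments out any fstab line containing ` swap `; a quick demonstration on a hypothetical swapfile entry:

```shell
# Sample fstab line (illustrative); the sed rule prefixes it with '#'.
SAMPLE='/swapfile none swap sw 0 0'
OUT=$(echo "$SAMPLE" | sed -r 's/(.+ swap .+)/#\1/')
echo "$OUT"   # prints: #/swapfile none swap sw 0 0
```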
sudo nano /etc/modules-load.d/k8s.conf
Add the following content to the file, save and close it
overlay
br_netfilter
Load the modules above into the current session
sudo modprobe overlay
sudo modprobe br_netfilter
Check the status
lsmod | grep "overlay\|br_netfilter"
sudo nano /etc/sysctl.d/k8s.conf
Add the following content to the file, save and close it
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
Apply the newly added network parameters
sudo sysctl --system
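To confirm the forwarding flag took effect, read it back. This sketch reads /proc directly, so it also works in minimal environments where the `sysctl` binary is absent; after `sysctl --system` it should print 1:

```shell
# Read the kernel's current IPv4 forwarding flag straight from /proc.
V=$(cat /proc/sys/net/ipv4/ip_forward)
echo "net.ipv4.ip_forward = $V"
```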
sudo apt-get install -y apt-transport-https ca-certificates curl gpg gnupg2 software-properties-common
Now both nodes are ready to install the Kubernetes tools and runtime.
First, check whether the /etc/apt/keyrings directory is present on your nodes. If not, create the directory using the command below
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y containerd.io
sudo mkdir -p /etc/containerd
Generate the default config.toml file
sudo containerd config default | sudo tee /etc/containerd/config.toml
Open the generated file in any text editor and verify the following settings:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2" # <- this line may be missing; add it if it is
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true # <- this defaults to false in the generated config; set it to true
Additionally, near the top of the config file, verify that "disabled_plugins" is an empty list ([]).
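If you'd rather not edit the file by hand, the SystemdCgroup flip can be scripted with sed. A sketch, demonstrated on a temp copy; for real use, run the same sed with sudo against /etc/containerd/config.toml:

```shell
CFG=$(mktemp)                                    # stand-in for /etc/containerd/config.toml
printf 'SystemdCgroup = false\n' > "$CFG"        # minimal sample content
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$CFG"
FIXED=$(grep -c 'SystemdCgroup = true' "$CFG")
echo "fixed lines: $FIXED"                       # prints: fixed lines: 1
rm -f "$CFG"
```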
Now save the config file, close it and restart the containerd service
sudo systemctl restart containerd
sudo systemctl enable containerd
systemctl status containerd
sudo apt install -y cri-tools
sudo nano /etc/crictl.yaml
Add the following content to the file, save and exit
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 2
debug: true
pull-image-on-create: false
sudo systemctl enable kubelet
sudo kubeadm config images pull --cri-socket unix:///var/run/containerd/containerd.sock
sudo kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--cri-socket unix:///var/run/containerd/containerd.sock \
--v=5
Do not skip this step!
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Optional: Remove taints from the node
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
This enables us to schedule workloads on the node, even though it's part of the control-plane.
kubectl apply -f infrastructure/install/kube-flannel.yaml
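Flannel reads its pod network from a ConfigMap inside kube-flannel.yaml, and the Network value there must match the --pod-network-cidr passed to kubeadm init above. The relevant fragment looks roughly like this (upstream default shown):

```yaml
net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
```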
kubeadm token create --print-join-command
kubectl get nodes -o wide
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
This must be done on all GPU nodes in the cluster!
sudo nvidia-ctk runtime configure --runtime=containerd --set-as-default
sudo systemctl restart containerd
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.0/deployments/static/nvidia-device-plugin.yml
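The infrastructure/install/gpu-pod.yaml manifest is assumed to be a minimal CUDA vectorAdd test pod along these lines; the image tag is illustrative, and the essential part is the nvidia.com/gpu resource limit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-vectoradd
      # Illustrative sample image; pick a tag matching your CUDA/driver version.
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
      resources:
        limits:
          nvidia.com/gpu: 1     # schedules the pod onto a GPU node
```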
kubectl apply -f infrastructure/install/gpu-pod.yaml
kubectl logs gpu-pod
Expected result:
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
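Verify the install with `kubectl get pods -n keda`: the operator and metrics-apiserver pods should be Running. KEDA is then driven by ScaledObject resources; a hedged sketch scaling a hypothetical Deployment on CPU utilization (all names here are illustrative, and the target Deployment must declare CPU resource requests for this trigger to work):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: demo-scaler          # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: demo-deployment    # hypothetical Deployment to scale (must exist)
  minReplicaCount: 1         # the CPU trigger cannot scale to zero
  maxReplicaCount: 5
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "50"          # target average CPU utilization (%)
```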