This repository aims to make deploying a Kubernetes cluster with Kubespray straightforward.
In addition, it provides access to GPU nodes through the NVIDIA GPU Operator, so there is no need to worry about installing the required NVIDIA drivers.
The following cluster deployment has been tested using:
- Kubespray v2.16.0 (which installs Kubernetes 1.20.7)
- Ansible 2.9
- NVIDIA GPU Operator 1.8.1
- Ubuntu 20.04 LTS
To use a custom configuration, copy the `config.example` folder to a new one. From then on, this new folder should be the only one in which files are modified.
```sh
cp -rfp config.example config
```

NOTE: `ansible.cfg` uses `config` as the default folder for the inventory file. A different inventory location can be provided with the `-i` flag of the `ansible-playbook` command.
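For instance, a deployment could be run against an alternative configuration folder (the path below is a placeholder):

```sh
ansible-playbook -i other_config/inventory.ini --user <user> --become --become-user=root deploy-cluster.yml
```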
The configuration template contains the following files/folders:
- `inventory.ini`, where hosts must be added or removed depending on the desired cluster state.
- `group_vars` folder, which contains the variable files applied to each group. By default, it only contains variables for the `all` and `k8s_cluster` groups, but additional files can be added according to the existing groups in `inventory.ini`.
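For illustration, a minimal `inventory.ini` might look like the sketch below; hostnames and IPs are placeholders, and the group layout follows the one Kubespray v2.16 expects (`kube_control_plane`, `kube_node`, `etcd`, `k8s_cluster`):

```ini
[all]
node1 ansible_host=192.0.2.10 ip=192.0.2.10
node2 ansible_host=192.0.2.11 ip=192.0.2.11

[kube_control_plane]
node1

[etcd]
node1

[kube_node]
node2

[k8s_cluster:children]
kube_control_plane
kube_node
```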
A playbook called `deploy-cluster.yml`, responsible for deploying a Kubernetes cluster, is provided. Under the hood, this playbook uses Kubespray's `cluster.yml`.
```sh
ansible-playbook --user <user> --become --become-user=root deploy-cluster.yml
```

The rest of the cluster management, such as adding and removing nodes or resetting the cluster, can be done directly with the playbooks provided by Kubespray in `submodules/kubespray`. More information can be found in Adding/replacing a node.
To add nodes:

```sh
ansible-playbook --user <user> --become --become-user=root submodules/kubespray/scale.yml
```

To remove nodes:

```sh
ansible-playbook --user <user> --become --become-user=root submodules/kubespray/remove-node.yml --extra-vars "node=nodename1,nodename2"
```

Afterwards, remove the node configuration from the `inventory.ini` file.
To reset the cluster:

```sh
ansible-playbook --user <user> --become --become-user=root submodules/kubespray/reset.yml
```

The simplest way to have GPU nodes in the cluster is to use the NVIDIA GPU Operator on top of Kubernetes.
By default, the use of GPU nodes is disabled. To enable it, set the `nvidia_gpu_operator` variable to `true` in `group_vars/k8s_cluster/k8s-cluster.yml`. The NVIDIA GPU Operator will then be installed automatically by the deployment playbook `deploy-cluster.yml`.
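For example, the setting described above is a single line in the group vars file:

```yaml
# config/group_vars/k8s_cluster/k8s-cluster.yml
nvidia_gpu_operator: true
```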
For more manual management, a couple of simple playbooks are also provided:
- `add-nvidia-gpu-operator.yml`, which uses the `nvidia-gpu-operator` role.
- `remove-nvidia-gpu-operator.yml`, which removes the operator's pods.
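They can presumably be run the same way as the main deployment playbook; a sketch, assuming the same flags apply:

```sh
ansible-playbook --user <user> --become --become-user=root add-nvidia-gpu-operator.yml
```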
GPU nodes are treated as normal CPU nodes, which makes it possible to use the Add Node, Remove Node, and Reset Cluster commands without any additional modification.
Nodes can be labeled to schedule the desired pods on them. More information: Assign Pods to Nodes.
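Once a node carries such a label (the labeling command is shown right after this sketch), a pod can target it through a `nodeSelector`. A minimal sketch, assuming the GPU Operator's device plugin exposes the `nvidia.com/gpu` resource; the pod name, label value, and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  nodeSelector:
    accelerator: <nvidia-gpu-type>  # label applied with the kubectl command below
  containers:
    - name: cuda
      image: nvidia/cuda:11.0-base  # placeholder CUDA image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1         # request one GPU from the device plugin
```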
The label itself is applied with:

```sh
kubectl label nodes <node-with-gpu> accelerator=<nvidia-gpu-type>
```

To use `kubectl` from a remote machine, the control plane or load balancer floating IP can be added to `supplementary_addresses_in_ssl_keys`, which includes that IP in the API server certificate.
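For example, assuming the variable is set in the same Kubespray group vars file as above (the IP is a placeholder):

```yaml
# config/group_vars/k8s_cluster/k8s-cluster.yml
supplementary_addresses_in_ssl_keys:
  - 203.0.113.10  # floating IP of the control plane or load balancer (placeholder)
```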
Finally, you should copy the `.kube/config` file from the control plane to the machine where you want to run `kubectl`, and modify that file to replace `127.0.0.1` with the floating IP previously added to `supplementary_addresses_in_ssl_keys`.
NOTE: `kubectl` expects to read the configuration file from `.kube/config`.
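A sketch of those two steps, assuming SSH access to the control plane (user, host, and IP are placeholders):

```sh
# Copy the kubeconfig from the control plane to the local machine
scp <user>@<control-plane>:~/.kube/config ~/.kube/config

# Point it at the floating IP added to supplementary_addresses_in_ssl_keys
sed -i 's/127.0.0.1/<floating-ip>/' ~/.kube/config
```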