Merged
77 commits
ab65245
added k3s installation to bootstrap
wtripp180901 Sep 10, 2024
321fdb4
added k3ds token to terraform
wtripp180901 Sep 10, 2024
915c4dd
added role to install playbooks for ansible-init
wtripp180901 Sep 11, 2024
824e117
Refactored so that agent or server is determined by metadata
wtripp180901 Sep 12, 2024
b49b22c
Added (very hacky) k3s token generation
wtripp180901 Sep 12, 2024
99c0028
Added k9s role
wtripp180901 Sep 13, 2024
68feb76
Moved k3s install to after network setup
wtripp180901 Sep 13, 2024
90c7a78
Added seperate k3s group
wtripp180901 Sep 16, 2024
cbcf762
Added helm
wtripp180901 Sep 16, 2024
370b188
Fixed ansible-init sentinel being created in packer build
wtripp180901 Sep 17, 2024
df02a66
Moved helm install
wtripp180901 Sep 17, 2024
327c645
Added kube roles to gitignore
wtripp180901 Sep 17, 2024
ce82f59
moved installs to usr/bin
wtripp180901 Sep 17, 2024
250b4c7
remove local DNS as a dependency for k3s
sjpb Sep 19, 2024
2f26fa1
agent/server config now based on if server name defined
wtripp180901 Sep 19, 2024
a8d4e17
k3s token now templated into terraform vars
wtripp180901 Sep 19, 2024
510115f
Name and label suggestions from review
wtripp180901 Sep 19, 2024
56c0d67
Refactor + group changes
wtripp180901 Sep 19, 2024
6d6bd2d
Refactored k9s install
wtripp180901 Sep 19, 2024
2b4f1f6
Removed server from control terraform and changed ansible-init file t…
wtripp180901 Sep 19, 2024
132e49e
Fixed merge conflicts
wtripp180901 Sep 20, 2024
1952797
more merge conflicts
wtripp180901 Sep 20, 2024
c642866
name update
wtripp180901 Sep 20, 2024
0975257
Updated .stackhpc env with k3s token
wtripp180901 Sep 20, 2024
22a4e6d
Merge branch 'main' into feature/k3s-ansible-init
wtripp180901 Sep 20, 2024
79383fe
added k3s readme
wtripp180901 Sep 20, 2024
fa955dd
bump images
wtripp180901 Sep 20, 2024
a8569da
Disabled traefik for non-server nodes
wtripp180901 Sep 30, 2024
8250483
Revert images for clean build
wtripp180901 Sep 30, 2024
46401bf
bump images
wtripp180901 Oct 1, 2024
15d3514
Apply suggestions from code review
wtripp180901 Oct 2, 2024
bc37064
Code review tweaks
wtripp180901 Oct 2, 2024
2ab8e52
Moved k3s token to be with rest of appliance secrets
wtripp180901 Oct 2, 2024
26a0f89
bump images
wtripp180901 Oct 2, 2024
8f0923c
Merge branch 'main' into feature/k3s-ansible-init
wtripp180901 Oct 3, 2024
cfb5514
updated caas for k3s
wtripp180901 Oct 4, 2024
8b7941d
fixed k3s install overwriting ansible-init changes
wtripp180901 Oct 4, 2024
5dfec0d
bump images
wtripp180901 Oct 4, 2024
1035460
removed k3s ingress
wtripp180901 Oct 8, 2024
7eb4821
added k3s docs
wtripp180901 Oct 8, 2024
9f00f11
Merge branch 'feature/k3s-ansible-init' of github.com:stackhpc/ansibl…
wtripp180901 Oct 8, 2024
2861edb
merge conflict fixes
wtripp180901 Oct 8, 2024
e6dd871
bump images
wtripp180901 Oct 9, 2024
7b3b115
Fixed node passwords changing on reimage
wtripp180901 Oct 10, 2024
d95037b
Merge branch 'feature/k3s-ansible-init' of github.com:stackhpc/ansibl…
wtripp180901 Oct 10, 2024
a0d947b
fixed missing directory
wtripp180901 Oct 11, 2024
1d1e777
typo
wtripp180901 Oct 11, 2024
3be011c
Merge branch 'main' into feature/k3s-ansible-init
wtripp180901 Oct 11, 2024
440f20c
moved CI image definition
wtripp180901 Oct 11, 2024
d69033a
bump images
wtripp180901 Oct 11, 2024
3d2e2cd
added cuda image for ci
wtripp180901 Oct 11, 2024
2efa193
typo
wtripp180901 Oct 11, 2024
ba1d212
corrected docs
wtripp180901 Oct 14, 2024
904df8a
k3s install now air-gapped
wtripp180901 Oct 22, 2024
a67ffd3
bump images
wtripp180901 Oct 23, 2024
04db97d
merge from main
wtripp180901 Oct 25, 2024
7008500
ci images bumped up to date with main
wtripp180901 Oct 25, 2024
79433b8
fixed k3s token idempotency issues
wtripp180901 Nov 4, 2024
bba95bb
Comment + doc changes from review
wtripp180901 Nov 4, 2024
3f599c6
play rename
wtripp180901 Nov 4, 2024
a6f0137
removed sentinel cleanup
wtripp180901 Nov 4, 2024
03fe568
k3s role refactor
wtripp180901 Nov 4, 2024
d96eddd
updated k3s docs
wtripp180901 Nov 4, 2024
21b7081
Merge branch 'feature/k3s-ansible-init' of github.com:stackhpc/ansibl…
wtripp180901 Nov 4, 2024
4f45701
merge conflicts
wtripp180901 Nov 4, 2024
bf47035
bumped images up to date with main
wtripp180901 Nov 4, 2024
ad84877
fixed k3s token generation
wtripp180901 Nov 4, 2024
54910a1
Merge branch 'feature/k3s-ansible-init' of github.com:stackhpc/ansibl…
wtripp180901 Nov 4, 2024
8ab12e9
Passwords role now reads variables into top level vars
wtripp180901 Nov 12, 2024
bf16547
moved k3s plays to install script
wtripp180901 Nov 12, 2024
5e3927b
reverted caas changes
wtripp180901 Nov 12, 2024
98f5b79
Merge branch 'main' into feature/k3s-ansible-init
wtripp180901 Nov 12, 2024
20a8a62
re-enabled caas access_network
wtripp180901 Nov 12, 2024
5b43d0e
bump images
wtripp180901 Nov 13, 2024
4538c6d
merge
wtripp180901 Nov 15, 2024
100632f
bump
wtripp180901 Nov 18, 2024
0c17410
k9s tags and variable renames
wtripp180901 Nov 19, 2024
4 changes: 4 additions & 0 deletions ansible/.gitignore
@@ -58,5 +58,9 @@ roles/*
!roles/squid/**
!roles/tuned/
!roles/tuned/**
!roles/k3s/
!roles/k3s/**
!roles/k9s/
!roles/k9s/**
!roles/lustre/
!roles/lustre/**
8 changes: 8 additions & 0 deletions ansible/bootstrap.yml
@@ -259,3 +259,11 @@
  tasks:
    - include_role:
        name: azimuth_cloud.image_utils.linux_ansible_init

- hosts: k3s
  become: yes
  tags: k3s
  tasks:
    - ansible.builtin.include_role:
        name: k3s
        tasks_from: install.yml
2 changes: 1 addition & 1 deletion ansible/cleanup.yml
@@ -38,7 +38,7 @@

- name: Cleanup /tmp
command : rm -rf /tmp/*

- name: Get package facts
package_facts:

7 changes: 7 additions & 0 deletions ansible/extras.yml
@@ -36,3 +36,10 @@
  tasks:
    - import_role:
        name: persist_hostkeys

- name: Install k9s
  become: yes
  hosts: k9s
  tasks:
    - import_role:
        name: k9s
19 changes: 19 additions & 0 deletions ansible/roles/cluster_infra/templates/resources.tf.j2
@@ -7,6 +7,19 @@ data "openstack_identity_auth_scope_v3" "scope" {
  name = "{{ cluster_name }}"
}

####
#### Data resources
####

resource "terraform_data" "k3s_token" {
  input = "{{ k3s_token }}"
  lifecycle {
    ignore_changes = [
      input, # makes it a write-once value (set via Ansible)
    ]
  }
}

#####
##### Security groups for the cluster
#####
@@ -386,6 +399,8 @@ resource "openstack_compute_instance_v2" "login" {
ansible_init_coll_{{ loop.index0 }}_source = "{{ collection.source }}"
{% endif %}
{% endfor %}
k3s_server = openstack_compute_instance_v2.control.network[0].fixed_ip_v4
k3s_token = "{{ k3s_token }}"
}
}

@@ -400,6 +415,7 @@ resource "openstack_compute_instance_v2" "control" {

  network {
    port = openstack_networking_port_v2.control.id
    access_network = true
  }

{% if cluster_storage_network is defined %}
@@ -479,6 +495,7 @@ resource "openstack_compute_instance_v2" "control" {
ansible_init_coll_{{ loop.index0 }}_source = "{{ collection.source }}"
{% endif %}
{% endfor %}
k3s_token = "{{ k3s_token }}"
}
}

@@ -548,6 +565,8 @@ resource "openstack_compute_instance_v2" "{{ partition.name }}" {
ansible_init_coll_{{ loop.index0 }}_source = "{{ collection.source }}"
{% endif %}
{% endfor %}
k3s_server = openstack_compute_instance_v2.control.network[0].fixed_ip_v4
k3s_token = "{{ k3s_token }}"
}
}

16 changes: 16 additions & 0 deletions ansible/roles/k3s/README.md
@@ -0,0 +1,16 @@
k3s
=====

Installs the k3s agent and server services on nodes, plus an ansible-init playbook to activate them. The service each node activates on init is determined by OpenStack metadata. Also installs Helm. Currently only a single k3s server is supported
(i.e. one control node). The installation is based on the [official k3s ansible role](https://github.com/k3s-io/k3s-ansible).


Requirements
------------

`azimuth_cloud.image_utils.linux_ansible_init` must have been run previously on targeted nodes during image build.

Role Variables
--------------

- `k3s_version`: Optional str. K3s version to install, see [official releases](https://github.com/k3s-io/k3s/releases/).
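
`k3s_version` is URL-encoded when the role builds its release download URLs in `tasks/install.yml`, so the `+` in a tag such as the default `v1.31.0+k3s1` is percent-encoded. The Python sketch below illustrates this, assuming Ansible's `urlencode` filter behaves like `urllib.parse.quote(..., safe="/")` on a plain string; `k3s_release_urls` is a hypothetical helper, not part of the role.

```python
from urllib.parse import quote

# Base URL taken from the get_url tasks in roles/k3s/tasks/install.yml.
RELEASES = "https://github.com/k3s-io/k3s/releases/download"


def k3s_release_urls(k3s_version: str) -> dict:
    """Build the k3s binary and air-gap image URLs for a release tag (illustrative)."""
    # Assumed equivalent to `k3s_version | urlencode`: only "/" is left unquoted,
    # so "+" becomes "%2B".
    tag = quote(k3s_version, safe="/")
    return {
        "binary": f"{RELEASES}/{tag}/k3s",
        "airgap_images": f"{RELEASES}/{tag}/k3s-airgap-images-amd64.tar.zst",
    }


if __name__ == "__main__":
    for name, url in k3s_release_urls("v1.31.0+k3s1").items():
        print(f"{name}: {url}")
```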
5 changes: 5 additions & 0 deletions ansible/roles/k3s/defaults/main.yml
@@ -0,0 +1,5 @@
# Warning: changes to these variables won't be reflected in the cluster/image if k3s is already installed
k3s_version: "v1.31.0+k3s1"
k3s_selinux_release: v1.6.latest.1
k3s_selinux_rpm_version: 1.6-1
k3s_helm_version: v3.11.0
36 changes: 36 additions & 0 deletions ansible/roles/k3s/files/start_k3s.yml
@@ -0,0 +1,36 @@
- hosts: localhost
  become: true
  vars:
    os_metadata: "{{ lookup('url', 'http://169.254.169.254/openstack/latest/meta_data.json') | from_json }}"
    k3s_token: "{{ os_metadata.meta.k3s_token }}"
    k3s_server_name: "{{ os_metadata.meta.k3s_server }}"
    service_name: "{{ 'k3s-agent' if k3s_server_name is defined else 'k3s' }}"
  tasks:
    - name: Ensure password directory exists
      ansible.builtin.file:
        path: "/etc/rancher/node"
        state: directory

    - name: Set agent node password as token # uses token to keep password consistent between reimages
      ansible.builtin.copy:
        dest: /etc/rancher/node/password
        content: "{{ k3s_token }}"

    - name: Add the token for joining the cluster to the environment
      no_log: true # avoid logging the server token
      ansible.builtin.lineinfile:
        path: "/etc/systemd/system/{{ service_name }}.service.env"
        line: "K3S_TOKEN={{ k3s_token }}"

    - name: Add server url to agents
      ansible.builtin.lineinfile:
        path: "/etc/systemd/system/{{ service_name }}.service.env"
        line: "K3S_URL=https://{{ k3s_server_name }}:6443"
      when: k3s_server_name is defined

    - name: Start k3s service
      ansible.builtin.systemd:
        name: "{{ service_name }}"
        daemon_reload: true
        state: started
        enabled: true
78 changes: 78 additions & 0 deletions ansible/roles/k3s/tasks/install.yml
@@ -0,0 +1,78 @@
---

- name: Check for existing k3s installation
  stat:
    path: /var/lib/rancher/k3s
  register: stat_result

- name: Perform air-gapped installation of k3s
  # Using air-gapped install so containers are pre-installed to avoid rate-limiting from registries on cluster startup
  when: not stat_result.stat.exists
  block:

    - name: Download k3s binary
      ansible.builtin.get_url:
        url: "https://github.com/k3s-io/k3s/releases/download/{{ k3s_version | urlencode }}/k3s"
        dest: /usr/bin/k3s
        owner: root
        group: root
        mode: "0755"

    - name: Install k3s SELinux policy package
      yum:
        name: "https://github.com/k3s-io/k3s-selinux/releases/download/{{ k3s_selinux_release }}/k3s-selinux-{{ k3s_selinux_rpm_version }}.el{{ ansible_distribution_major_version }}.noarch.rpm"
        disable_gpg_check: true

    - name: Create image directory
      ansible.builtin.file:
        path: "/var/lib/rancher/k3s/agent/images"
        state: directory

    - name: Install k3s' internal images
      ansible.builtin.get_url:
        url: "https://github.com/k3s-io/k3s/releases/download/{{ k3s_version | urlencode }}/k3s-airgap-images-amd64.tar.zst"
        dest: /var/lib/rancher/k3s/agent/images/k3s-airgap-images-amd64.tar.zst

    - name: Download k3s install script
      ansible.builtin.get_url:
        url: https://get.k3s.io/
        timeout: 120
        dest: /usr/bin/k3s-install.sh
        owner: root
        group: root
        mode: "0755"

    - name: Install k3s
      ansible.builtin.shell:
        cmd: /usr/bin/k3s-install.sh
      environment:
        INSTALL_K3S_VERSION: "{{ k3s_version }}"
        INSTALL_K3S_EXEC: "{{ item }}"
        INSTALL_K3S_SKIP_START: "true"
        INSTALL_K3S_SKIP_ENABLE: "true"
        INSTALL_K3S_BIN_DIR: "/usr/bin"
        INSTALL_K3S_SKIP_DOWNLOAD: "true"
      changed_when: true
      loop:
        - server --disable=traefik
        - agent

    - name: Install helm
      unarchive:
        src: "https://get.helm.sh/helm-{{ k3s_helm_version }}-linux-amd64.tar.gz"
        dest: /usr/bin
        extra_opts: "--strip-components=1"
        owner: root
        group: root
        mode: 0755
        remote_src: true

    - name: Add k3s kubeconfig as environment variable
      ansible.builtin.lineinfile:
        path: /etc/environment
        line: "KUBECONFIG=/etc/rancher/k3s/k3s.yaml"

    - name: Install ansible-init playbook for k3s agent or server activation
      copy:
        src: start_k3s.yml
        dest: /etc/ansible-init/playbooks/0-start-k3s.yml
44 changes: 44 additions & 0 deletions ansible/roles/k9s/tasks/main.yml
@@ -0,0 +1,44 @@
---

- name: Check if k9s is installed
  ansible.builtin.stat:
    path: "/usr/bin/k9s"
  register: result

- name: Install k9s and clean up temporary files
  block:
    - name: Create install directory
      ansible.builtin.file:
        path: /tmp/k9s
        state: directory
        owner: root
        group: root
        mode: "744"
      when: not result.stat.exists

    - name: Download k9s
      ansible.builtin.get_url:
        url: https://github.com/derailed/k9s/releases/download/v0.32.5/k9s_Linux_amd64.tar.gz
        dest: /tmp/k9s/k9s_Linux_amd64.tar.gz
        owner: root
        group: root
        mode: "744"

    - name: Unpack k9s binary
      ansible.builtin.unarchive:
        src: /tmp/k9s/k9s_Linux_amd64.tar.gz
        dest: /tmp/k9s
        remote_src: yes

    - name: Add k9s to root path
      ansible.builtin.copy:
        src: /tmp/k9s/k9s
        dest: /usr/bin/k9s
        mode: u+rwx
        remote_src: yes

    - name: Cleanup k9s install directory
      ansible.builtin.file:
        path: /tmp/k9s
        state: absent
  when: not result.stat.exists
1 change: 1 addition & 0 deletions ansible/roles/passwords/defaults/main.yml
@@ -8,6 +8,7 @@ slurm_appliance_secrets:
  vault_openhpc_mungekey: "{{ secrets_openhpc_mungekey | default(vault_openhpc_mungekey | default(secrets_openhpc_mungekey_default)) }}"
  vault_freeipa_ds_password: "{{ vault_freeipa_ds_password | default(lookup('password', '/dev/null')) }}"
  vault_freeipa_admin_password: "{{ vault_freeipa_admin_password | default(lookup('password', '/dev/null')) }}"
  vault_k3s_token: "{{ vault_k3s_token | default(lookup('ansible.builtin.password', '/dev/null', length=64)) }}"

secrets_openhpc_mungekey_default:
  content: "{{ lookup('pipe', 'dd if=/dev/urandom bs=1 count=1024 2>/dev/null | base64') }}"
20 changes: 10 additions & 10 deletions ansible/roles/passwords/tasks/main.yml
@@ -7,14 +7,14 @@
delegate_to: localhost
run_once: true

# - name: Ensure munge key directory exists
# file:
# state: directory
# recurse: true
# path: "{{ openhpc_passwords_mungekey_output_path | dirname }}"
- name: Get templated passwords from target environment
# inventory group/host vars created in a play cannot be accessed in the same play, even after meta: refresh_inventory
ansible.builtin.include_vars:
file: "{{ openhpc_passwords_output_path }}"

# - name: Create a munge key
# copy:
# content: "{{ lookup('password', '/dev/null chars=ascii_letters,digits,hexdigits,punctuation') }}"
# dest: "{{ openhpc_passwords_mungekey_output_path }}"
# force: false
- name: Template k3s token to terraform
template:
src: k3s-token.auto.tfvars.json.j2
dest: "{{ lookup('env', 'APPLIANCES_ENVIRONMENT_ROOT') }}/terraform/k3s-token.auto.tfvars.json"
delegate_to: localhost
run_once: true
@@ -0,0 +1,3 @@
{
"k3s_token": "{{ vault_k3s_token }}"
}
8 changes: 8 additions & 0 deletions docs/k3s.README.md
@@ -0,0 +1,8 @@
# Overview
A K3s cluster is deployed with the Slurm cluster. Both an agent and a server instance of K3s are installed during image build, and the correct service (determined by OpenStack metadata) is
enabled during boot. Nodes with the `k3s_server` metadata field defined are configured as K3s agents (this field gives them the address of the server). The Slurm control node is currently configured as the server, while all other nodes are configured as agents. Using multiple K3s servers isn't supported. Currently only the root user on the control node has
access to the Kubernetes API. The `k3s` role installs Helm for package management. K9s is also installed in the image and can be used by the root user.
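
The boot-time selection can be sketched in Python as follows. This is an illustration only: it mirrors the logic of `ansible/roles/k3s/files/start_k3s.yml`, assumes `meta` holds the `meta` object returned by the OpenStack metadata service, and `plan_k3s_service` is a hypothetical name (the appliance does this in an Ansible playbook, which also writes the node password).

```python
from typing import Dict, List, Tuple


def plan_k3s_service(meta: Dict[str, str]) -> Tuple[str, str, List[str]]:
    """Return (service name, env file path, env file lines) for this node.

    Hypothetical sketch; `meta` is assumed to be the "meta" object from
    http://169.254.169.254/openstack/latest/meta_data.json.
    """
    token = meta["k3s_token"]
    # Only agents (e.g. login/compute nodes) receive k3s_server in metadata.
    server = meta.get("k3s_server")
    service = "k3s-agent" if server is not None else "k3s"
    env_path = f"/etc/systemd/system/{service}.service.env"
    env_lines = [f"K3S_TOKEN={token}"]
    if server is not None:
        # Agents join the single server on port 6443.
        env_lines.append(f"K3S_URL=https://{server}:6443")
    return service, env_path, env_lines


if __name__ == "__main__":
    print(plan_k3s_service({"k3s_token": "example", "k3s_server": "10.0.0.5"}))
    print(plan_k3s_service({"k3s_token": "example"}))
```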

# Idempotency
K3s is intended to be installed only during image build, because it is configured on first boot by `azimuth_cloud.image_utils.linux_ansible_init`. The `k3s` role skips installation entirely if `/var/lib/rancher/k3s` already exists, so it isn't
idempotent with respect to its variables: changes to them will not be reflected in the image when running `site.yml`.
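
The install guard can be sketched in Python to show why a changed variable has no effect on a second run. This is a minimal illustration, not the role itself: `install_k3s_once` and `fake_install` are hypothetical stand-ins for the `stat` check on `/var/lib/rancher/k3s` and the air-gapped install block in `tasks/install.yml`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def install_k3s_once(rancher_dir: Path, install: Callable[[], None]) -> bool:
    """Run `install` only if `rancher_dir` is absent; return True if it ran."""
    if rancher_dir.exists():
        # Already installed: changed variables (e.g. k3s_version) are ignored.
        return False
    install()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "var" / "lib" / "rancher" / "k3s"

        def fake_install() -> None:
            # The real block creates agent/images under this directory.
            (target / "agent" / "images").mkdir(parents=True)

        print(install_k3s_once(target, fake_install))  # True: first image build
        print(install_k3s_once(target, fake_install))  # False: e.g. a later site.yml run
```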
8 changes: 8 additions & 0 deletions environments/.caas/hooks/pre.yml
@@ -1,5 +1,13 @@
---

# Generate k3s token
- name: Generate k3s token
  # NB: Although this generates a new token on each run, the actual token set in metadata is retrieved from a set-once tofu resource, hence only the first value ever generated is relevant.
  hosts: openstack
  tasks:
    - ansible.builtin.set_fact:
        k3s_token: "{{ lookup('ansible.builtin.password', '/dev/null', length=64) }}"

# Provision the infrastructure using Terraform
- name: Provision infrastructure
  hosts: openstack
@@ -1,6 +1,6 @@
{
"cluster_image": {
"RL8": "openhpc-RL8-241115-1209-097cdae1",
"RL9": "openhpc-RL9-241115-1209-097cdae1"
"RL8": "openhpc-RL8-241118-0918-4538c6df",
"RL9": "openhpc-RL9-241118-0918-4538c6df"
}
}
5 changes: 5 additions & 0 deletions environments/.stackhpc/terraform/main.tf
@@ -54,6 +54,10 @@ variable "volume_backed_instances" {
  default = false
}

variable "k3s_token" {
  type = string
}

data "openstack_images_image_v2" "cluster" {
  name = var.cluster_image[var.os_version]
  most_recent = true
@@ -69,6 +73,7 @@ module "cluster" {
  key_pair = "slurm-app-ci"
  cluster_image_id = data.openstack_images_image_v2.cluster.id
  control_node_flavor = var.control_node_flavor
  k3s_token = var.k3s_token

  login_nodes = {
    login-0: var.other_node_flavor
6 changes: 6 additions & 0 deletions environments/common/inventory/groups
@@ -136,5 +136,11 @@ freeipa_client
[ansible_init]
# Hosts to run linux-anisble-init

[k3s]
# Hosts to run k3s server/agent

[k9s]
# Hosts to install k9s on

[lustre]
# Hosts to run lustre client
8 changes: 8 additions & 0 deletions environments/common/layouts/everything
@@ -82,5 +82,13 @@ openhpc
# Hosts to run ansible-init
cluster

[k3s:children]
# Hosts to run k3s server/agent
openhpc

[k9s:children]
# Hosts to install k9s on
control

[lustre]
# Hosts to run lustre client