This directory houses the code that transforms raw bare-metal machines into functional Kubernetes clusters. The code in this directory depends on a LAN-accessible Proxmox VE installation (or multiple!) to create a bare-bones Kubernetes cluster using Talos.
- Terraform installed (check the providers file for the specific version requirements)
- Proxmox VE v8.0+ installed on a bare-metal machine (or more than one)
- A file named `aws-credentials` in the `talos` directory in the format:

  ```
  [default]
  aws_access_key_id = <redacted>
  aws_secret_access_key = <redacted>
  ```

  This provides authentication to the AWS S3 bucket backend that stores the TF state AND the KMS secret to decrypt the sops file.
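The backend those credentials feed into might be wired up roughly like this (a sketch only; the bucket, key, and region are placeholders, not this repo's actual values, and the `shared_credentials_files` argument assumes Terraform 1.6+):

```hcl
terraform {
  backend "s3" {
    bucket                   = "my-tf-state-bucket"        # placeholder
    key                      = "talos/terraform.tfstate"   # placeholder
    region                   = "us-east-1"                 # placeholder
    shared_credentials_files = ["./aws-credentials"]
    profile                  = "default"
  }
}
```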
- Populate `main.tf` as desired.
- Run the desired Terraform commands (e.g. `terraform plan`, `terraform apply`).
- Run this command to create the per-node Talos configs: `./create_talos_node_configs`
- Apply (or dry-run apply) the config to the desired Talos cluster nodes:

  ```shell
  # Dry run
  talosctl apply-config -n skrillex --file ./nodes/skrillex.yaml --dry-run

  # Apply
  talosctl apply-config -n skrillex --file ./nodes/skrillex.yaml
  ```

- Follow the Kubernetes bootstrapping steps defined here
- If the disk is already formatted, you'll need to "zap" it to remove the formatting before it can be added to an LVM-Thin pool: `sgdisk --zap-all /dev/<disk>`. To find the disk path, use `lsblk`.
  - The disk will be formatted if, for example, it was previously used in a storage cluster.
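Since a typo'd device name here wipes the wrong disk, it can help to wrap the zap in a small guard. A minimal sketch; `zap_disk` and its `--dry-run` mode are my own additions, not sgdisk features:

```shell
#!/usr/bin/env bash
# Guarded wrapper around sgdisk. Refuses anything that isn't a whole-disk
# /dev/sdX path (adjust the pattern for NVMe names like /dev/nvme0n1).
zap_disk() {
  local disk="$1" mode="${2:-}"
  # Reject partitions (/dev/sda1) and bare names (sda) outright.
  if [[ ! "$disk" =~ ^/dev/sd[a-z]+$ ]]; then
    echo "refusing: expected a whole-disk path like /dev/sdb, got '$disk'" >&2
    return 1
  fi
  if [[ "$mode" == "--dry-run" ]]; then
    echo "would run: sgdisk --zap-all $disk"   # print only, touch nothing
  else
    sgdisk --zap-all "$disk"
  fi
}

zap_disk /dev/sdb --dry-run
```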
- Navigate to the relevant node in the Proxmox GUI -> Disks -> LVM-Thin -> "Create: Thinpool"
- Select the new disk by block device name (`lsblk` can help show the disks available on the node). Example names: `/dev/sda`, `/dev/sdb`, `/dev/sdc`.
- Give the disk a name. I've chosen to number the disks by bay #. Example: Bay #3 -> `disk3`
- Hit "Create".

EZPZ. Once the disk is available on the node:

```shell
kubectl -n storage rollout restart deploy/rook-ceph-operator
```
- Identify the OSD(s) to remove:

  ```shell
  kubectl -n storage exec -it deploy/rook-ceph-tools -- ceph osd tree
  ```

- Mark the OSD out (starts rebalancing data away from it):

  ```shell
  kubectl -n storage exec -it deploy/rook-ceph-tools -- ceph osd out osd.<ID>
  ```

- Wait for rebalancing to complete:

  ```shell
  kubectl -n storage exec -it deploy/rook-ceph-tools -- ceph -w

  # Or check status periodically.
  # Wait until HEALTH_OK or at least no recovery/rebalancing in progress.
  kubectl -n storage exec -it deploy/rook-ceph-tools -- ceph status
  ```

- Purge the OSD (removes it from the cluster):

  ```shell
  kubectl -n storage exec -it deploy/rook-ceph-tools -- ceph osd purge osd.<ID> --yes-i-really-mean-it
  ```

- Delete the OSD deployment:

  ```shell
  kubectl -n storage delete deploy rook-ceph-osd-<ID>
  ```

- If the disk was explicitly listed in the CephCluster CR, update the spec to remove it; otherwise Rook may try to recreate the OSD.
- Clean the disk (if reusing or decommissioning):

  ```shell
  # From rook-ceph-tools or the node itself
  kubectl -n storage exec -it deploy/rook-ceph-tools -- ceph-volume lvm zap /dev/sdX --destroy
  ```

Tip: If removing multiple OSDs, do them one at a time and wait for full rebalancing between each to minimize risk and cluster load.
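To avoid fat-fingering an OSD id partway through the sequence, the steps above can first be rendered as a reviewable command plan. A sketch; `osd_removal_plan` is a hypothetical helper that only prints the commands rather than running them:

```shell
#!/usr/bin/env bash
# Print the removal sequence for one OSD id so it can be eyeballed
# (and pasted step by step) instead of typed from memory.
osd_removal_plan() {
  local id="$1"
  local tools="kubectl -n storage exec -it deploy/rook-ceph-tools --"
  cat <<EOF
${tools} ceph osd out osd.${id}
${tools} ceph status   # repeat until rebalancing is done before continuing
${tools} ceph osd purge osd.${id} --yes-i-really-mean-it
kubectl -n storage delete deploy rook-ceph-osd-${id}
EOF
}

osd_removal_plan 3
```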
A rolling upgrade can be performed with the TF provider and a standard TF workflow (talosctl also works, per the docs).
The Talos TF provider is relatively under-featured for upgrades (an example of a missing feature here), so it's best to use talosctl and follow the more production-ready upgrade path here.
There are some Kubernetes configurations, such as the kube-proxy configuration, which talosctl manages but only touches during Kubernetes bootstraps/upgrades. Here is a good example. To update a resource whose state lives entirely within Kubernetes but whose config is managed via Talos, refer to the upgrading Kubernetes section above.
Talos & Kubernetes versions are linked quite closely. If you're multiple versions behind on each:
- Upgrade them in lockstep (i.e. upgrade Talos one minor version, then Kubernetes one minor version). If this isn't done, weird stuff can start to happen. (Trust me.)
- Upgrade Talos's minor version first, then Kubernetes's minor version.
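As a concrete shape for one round of that lockstep, something like the following could emit the paired upgrade commands (a sketch; the node name, installer image tag, and version pairing are placeholders — check the Talos support matrix for real pairings):

```shell
#!/usr/bin/env bash
# Emit one Talos-then-Kubernetes upgrade step, intended to be run and
# verified one minor version at a time.
lockstep_step() {
  local talos_ver="$1" k8s_ver="$2"
  cat <<EOF
talosctl upgrade --nodes skrillex --image ghcr.io/siderolabs/installer:${talos_ver}
talosctl upgrade-k8s --to ${k8s_ver}
EOF
}

lockstep_step v1.7.6 1.30.3
```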
- Enable Talos logs to be sent to a logging endpoint, similar to this example.
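For reference, the machine-config fragment for shipping logs looks roughly like this (a sketch based on the Talos machine config schema; the endpoint address is a placeholder):

```yaml
machine:
  logging:
    destinations:
      - endpoint: "udp://192.168.1.50:6051"   # placeholder collector address
        format: json_lines
```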
I spent quite a while trying to avoid tedious manual declarations of IP addresses for each Kubernetes node. I had some success assigning MAC addresses to the VMs, reading the DHCP-assigned IPv4 addresses from the Unifi Router, and then using those in the rest of the process. Ultimately, though, it didn't work out: the Unifi Router would get confused once virtual IPs were introduced, such as the Talos Virtual IP, and begin returning the Virtual IP when I needed the direct node IP. I unfortunately had to scrap the idea. Instead, we manually assign each node an IP address and MAC address in main.tf.
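The manual assignment in `main.tf` can take a shape like this (hypothetical structure; the second node name and all IPs/MACs are placeholders, and the real values live in the actual `main.tf`):

```hcl
# Static addressing per node, referenced by the VM resources and the
# Talos config generation. All values below are placeholders.
locals {
  nodes = {
    skrillex = {
      ip  = "192.168.1.21"
      mac = "BC:24:11:00:00:01"
    }
    deadmau5 = {
      ip  = "192.168.1.22"
      mac = "BC:24:11:00:00:02"
    }
  }
}
```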