Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
226 changes: 226 additions & 0 deletions docs/add-disks-to-vms.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,226 @@
Maintenance objective:
Add a 100GB disk to each of the worker nodes.

Departments involved: RPC support, RPC Manage Kubernetes SME's
Owning department: RPC support, RPC Managed Kubernetes SME's
Amount of time estimated for maintenance: 4 hours

--------------------------------------------------------------------------------
Maintenance Steps:
--------------------------------------------------------------------------------

## Get things ready

1 - Setup the environment to work with the cluster

* Credentials are stored in passwordsafe

```
cd /etc/openCenter/infrastructure/clusters/prosys_test
source venv/bin/activate
export BIN=${PWD}/.bin
export PATH=${BIN}:${PATH}
export ANSIBLE_INVENTORY=${PWD}/inventory/inventory.yaml
export KUBECONFIG=${PWD}/kubeconfig.yaml
export AWS_ACCESS_KEY_ID=<REPLACE ME>
export AWS_SECRET_ACCESS_KEY=<REPLACE ME>
export TF_VAR_os_application_credential_id='<REAPLCE ME>'
export TF_VAR_os_application_credential_secret="<REPLACE ME>"

```

2 - Verify the cluster state.

* If there are changes in the plan review them first and work through any potential blockers for the maintenance.

```bash
# terraform plan
...
module.kubespray-cluster.null_resource.run_kubespray[0]: Refreshing state... [id=2694448614732380735]
module.kubespray-cluster.null_resource.copy_and_update_kubeconfig: Refreshing state... [id=812292398106547937]

No changes. Your infrastructure matches the configuration.

# kubectl get nodes
NAME STATUS ROLES AGE VERSION
prosys-prod-cp0 Ready control-plane 31d v1.32.8
prosys-prod-cp1 Ready control-plane 31d v1.32.8
prosys-prod-cp2 Ready control-plane 31d v1.32.8
prosys-prod-wn0 Ready <none> 31d v1.32.8
prosys-prod-wn1 Ready <none> 31d v1.32.8
prosys-prod-wn2 Ready <none> 31d v1.32.8
prosys-prod-wn3 Ready <none> 31d v1.32.8
prosys-prod-wn4 Ready <none> 30d v1.32.8


```

3 - Update the main.tf with the desired additional block devices

* There is a section to be added to the locals variables and another to the openstack-nova module.

**locals section**
```h
additional_block_devices_worker = [
{
source_type = "blank"
volume_size = 20
volume_type = "Performance"
boot_index = -1
destination_type = "volume"
delete_on_termination = true
mountpoint = "/var/lib/longhorn"
filesystem = "ext4"
label = "longhorn-vol"
},
]
```

**module "openstack-nova"**

```
module "openstack-nova" {
source = "github.com/rackerlabs/openCenter-gitops-base.git//iac/cloud/openstack/openstack-nova?ref=multi-disk1"
availability_zone = local.availability_zone
additional_block_devices_worker = local.additional_block_devices_worker <---- Add this line


```


4 - Verify the plan

* If we run `terraform plan` we can review the configuration updates to be made. If we were to apply the changes it would replace all of the worker nodes at the same time which would break the cluster. We will need to use targeted apply to replace one node at a time and add it back to the cluster.


-----------------------

## Apply the changes

```bash
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
prosys-prod-cp0 Ready control-plane 31d v1.32.8
prosys-prod-cp1 Ready control-plane 31d v1.32.8
prosys-prod-cp2 Ready control-plane 31d v1.32.8
prosys-prod-wn0 Ready <none> 31d v1.32.8
prosys-prod-wn1 Ready <none> 31d v1.32.8
prosys-prod-wn2 Ready <none> 31d v1.32.8
prosys-prod-wn3 Ready <none> 31d v1.32.8
prosys-prod-wn4 Ready <none> 30d v1.32.8
```

We will target one worker node at a time starting with the last one.

1 - Remove the node from the cluster

```
cd kubespray
ansible-playbook playbooks/facts.yml -b
ansible-playbook -b playbooks/remove_node.yml -e node=prosys-prod-wn4
```

2 - Apply the change only targeting the single node

```
terraform apply -target 'module.openstack-nova.module.node_worker.openstack_compute_instance_v2.node[4]'

Apply complete! Resources: 3 added, 0 changed, 1 destroyed.

```

3 - Get the node ready

Apply hardening
```
# ansible-playbook ./inventory/os_hardening_playbook.yml -b --become-user=root --limit=prosys-prod-wn4

```

4 - Find the resulting disk configuration and set it in the group vars

```
# ansible prosys-prod-wn4 -m shell -a 'fdisk -l' -b

prosys-prod-wn4 | CHANGED | rc=0 >>
Disk /dev/vda: 100 GiB, 107372085248 bytes, 209711104 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A49C7288-9CD0-4F90-8F06-1C713F260C8C

Device Start End Sectors Size Type
/dev/vda1 2099200 209711070 207611871 99G Linux filesystem
/dev/vda14 2048 10239 8192 4M BIOS boot
/dev/vda15 10240 227327 217088 106M EFI System
/dev/vda16 227328 2097152 1869825 913M Linux extended boot

Partition table entries are not in disk order.


Disk /dev/vdb: 64 GiB, 68719476736 bytes, 134217728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/vdc: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/vdd: 20 GiB, 21472739328 bytes, 41938944 sectors <----- NEW DRIVE
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
```

4 - Update the disk configuration in the group_vars with the same information as above.

```yaml
# cat inventory/group_vars/oc_worker_nodes.yaml
disk_config:
- device: "/dev/vdd" <----- set the device name accordingly
label: "longhorn-vol"
mountpoint: "/var/lib/longhorn"
filesystem: "ext4"
boot_index: 1
volume_size: 20

```

5 - Run the playbook to configure the disks

```bash
ansible-playbook -b configure-disks.yaml --limit=prosys-prod-wn4
```

6 - Add the node to the cluster

```bash
cd kubespray
ansible-playbook playbooks/facts.yml -b
ansible-playbook -b playbooks/scale.yml -e "@../inventory/k8s_hardening.yml" --limit=prosys-prod-wn4

```

## Verify things are working

1 - Verify the node is in ready state

```bash
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
prosys-prod-cp0 Ready control-plane 31d v1.32.8
prosys-prod-cp1 Ready control-plane 31d v1.32.8
prosys-prod-cp2 Ready control-plane 31d v1.32.8
prosys-prod-wn0 Ready <none> 31d v1.32.8
prosys-prod-wn1 Ready <none> 31d v1.32.8
prosys-prod-wn2 Ready <none> 31d v1.32.8
prosys-prod-wn3 Ready <none> 31d v1.32.8
prosys-prod-wn4 Ready <none> 20m v1.32.8 <---- New Node Ready
```


13 changes: 13 additions & 0 deletions iac/cloud/openstack/lib/openstack-compute/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,19 @@ resource "openstack_compute_instance_v2" "node" {
destination_type = var.node_bfv_destination_type
delete_on_termination = var.node_bfv_delete_on_termination
}

dynamic "block_device" {
for_each = var.additional_block_devices
content {
uuid = block_device.value.source_type == "blank" ? "" : null
source_type = block_device.value.source_type
volume_size = block_device.value.volume_size
volume_type = block_device.value.destination_type == "local" ? "" : block_device.value.volume_type
boot_index = block_device.value.boot_index
destination_type = block_device.value.destination_type
delete_on_termination = block_device.value.delete_on_termination
}
}

network {
port = openstack_networking_port_v2.node[count.index].id
Expand Down
16 changes: 16 additions & 0 deletions iac/cloud/openstack/lib/openstack-compute/variables.tf
Original file line number Diff line number Diff line change
@@ -1,3 +1,19 @@
variable "additional_block_devices" {
description = "List of additional block devices to attach to instances"
type = list(object({
source_type = string # "blank", "image", "volume", "snapshot"
volume_size = number
volume_type = optional(string, "")
boot_index = number # Must be > 0 for non-boot devices
destination_type = optional(string, "volume")
delete_on_termination = optional(bool, true)
mountpoint = string
filesystem = optional(string, "ext4")
label = string
}))
default = []
}

variable "allowed_addresses" {
type = list(string)
default = []
Expand Down
2 changes: 2 additions & 0 deletions iac/cloud/openstack/openstack-nova/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ module "node_master" {
source = "../lib/openstack-compute"

depends_on = [module.bastion, module.ssh-keypair, module.secgroup]
additional_block_devices = var.additional_block_devices_master
availability_zone = var.availability_zone
allowed_addresses = [var.vrrp_ip ,var.subnet_nodes, var.subnet_pods, var.subnet_services]
flavor_name = var.size_master.flavor
Expand Down Expand Up @@ -92,6 +93,7 @@ module "node_worker" {
source = "../lib/openstack-compute"

depends_on = [module.bastion, module.ssh-keypair, module.secgroup]
additional_block_devices = var.additional_block_devices_worker
availability_zone = var.availability_zone
allowed_addresses = [var.subnet_nodes, var.subnet_pods, var.subnet_services]
flavor_name = var.size_worker.flavor
Expand Down
32 changes: 32 additions & 0 deletions iac/cloud/openstack/openstack-nova/variables.tf
Original file line number Diff line number Diff line change
@@ -1,3 +1,35 @@
variable "additional_block_devices_worker" {
description = "List of additional block devices to attach to worker instances"
type = list(object({
source_type = string # "blank", "image", "volume", "snapshot"
volume_size = number
volume_type = optional(string, "")
boot_index = number # Must be > 0 for non-boot devices
destination_type = optional(string, "volume")
delete_on_termination = optional(bool, true)
mountpoint = string
filesystem = optional(string, "ext4")
label = string
}))
default = []
}

variable "additional_block_devices_master" {
description = "List of additional block devices to attach to master instances"
type = list(object({
source_type = string # "blank", "image", "volume", "snapshot"
volume_size = number
volume_type = optional(string, "")
boot_index = number # Must be > 0 for non-boot devices
destination_type = optional(string, "volume")
delete_on_termination = optional(bool, true)
mountpoint = string
filesystem = optional(string, "ext4")
label = string
}))
default = []
}

variable "additional_ports_master" {
description = "List of additional ports to create security group rules for custom applications"
type = list(string)
Expand Down
Loading