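Fix Kubernetes docs based on review

Remove the `---` horizontal rules between sections, drop the emojis from
the pre-installed application headings in clusters.md, and delete the
closing "Questions?" section from kubernetes-upgrades.md and the
thank-you line from node-upgrades.md.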

eliaoggian · eliaoggian · commit 115c9b4872ae · 2025-07-15T17:05:40.000+02:00
diff --git a/docs/kubernetes/clusters.md b/docs/kubernetes/clusters.md
@@ -2,17 +2,13 @@
 
 This document provides an overview of the Kubernetes clusters maintained by CSCS and offers step-by-step instructions for accessing and interacting with them.
 
----
-
 ## Architecture
 
 All Kubernetes clusters at CSCS are:
 
 - Managed using **[Rancher](https://www.rancher.com)**
 - Running **[RKE2 (Rancher Kubernetes Engine 2)](https://github.com/rancher/rke2)**
 
----
-
 ## Cluster Environments
 
 Clusters are grouped into two main environments:
@@ -22,8 +18,6 @@ Clusters are grouped into two main environments:
 
 TDS clusters receive updates first. If no issues arise, the same updates are then applied to PROD clusters.
 
----
-
 ## Kubernetes API Access
 
 You can access the Kubernetes API in two main ways:
@@ -40,8 +34,6 @@ You can access the Kubernetes API in two main ways:
 
 To check which method you are using, examine the `current-context` in your `kubeconfig` file.
 
----
-
 ## Cluster Access
 
 To interact with the cluster, you need the `kubectl` CLI:  
@@ -79,15 +71,11 @@ export KUBECONFIG=/home/user/kubeconfig.yaml
 
 > ⚠️ The kubeconfig file contains credentials. Keep it secure.
 
----
-
 ## Pre-installed Applications
 
 All CSCS-provided clusters include a set of pre-installed tools and components, described below:
 
----
-
-### 📦 `ceph-csi`
+### `ceph-csi`
 
 Provides **dynamic persistent volume provisioning** via the Ceph Container Storage Interface.
 
@@ -98,9 +86,7 @@ Provides **dynamic persistent volume provisioning** via the Ceph Container Stora
 - `rbd-nvme` – RWO, backed by NVMe (high-performance workloads like databases)
 - `*-retain` – Same classes, but retain the volume after PVC deletion
 
----
-
-### 🌐 `external-dns`
+### `external-dns`
 
 Automatically manages DNS entries for:
 
@@ -115,9 +101,7 @@ kubectl annotate service nginx "external-dns.alpha.kubernetes.io/hostname=nginx.
 !!! info "Use a valid name under the configured subdomain"
     [external-dns documentation](https://github.com/kubernetes-sigs/external-dns)
 
----
-
-### 🔐 `cert-manager`
+### `cert-manager`
 
 Handles automatic issuance of TLS certificates from Let's Encrypt.
 
@@ -141,19 +125,15 @@ You can also issue certs automatically via Ingress annotations (see `ingress-ngi
 
 📄 [cert-manager documentation](https://cert-manager.io)
 
----
-
-### 📡 `metallb`
+### `metallb`
 
 Enables `LoadBalancer` service types by assigning public IPs.
 
 > ⚠️ The public IP pool is limited.  
 Prefer using `Ingress` unless you specifically need a `LoadBalancer`.  
 📄 [metallb documentation](https://metallb.universe.tf)
 
----
-
-### 🌍 `ingress-nginx`
+###  `ingress-nginx`
 
 Default Ingress controller with class `nginx`.  
 Supports automatic TLS via cert-manager annotations.
@@ -188,25 +168,19 @@ spec:
 📄 [NGINX Ingress Docs](https://docs.nginx.com/nginx-ingress-controller)  
 📄 [cert-manager Ingress Usage](https://cert-manager.io/docs/usage/ingress/)
 
----
-
-### 🔑 `external-secrets`
+### `external-secrets`
 
 Integrates with secret management tools like **HashiCorp Vault**.
 
 📄 [external-secrets documentation](https://external-secrets.io/)
 
----
-
-### 🔁 `kured`
+### `kured`
 
 Responsible for automatic node reboots (e.g., after kernel updates).
 
 📄 [kured documentation](https://kured.dev/)
 
----
-
-### 📊 Observability
+### Observability
 
 Includes:
 
diff --git a/docs/kubernetes/kubernetes-upgrades.md b/docs/kubernetes/kubernetes-upgrades.md
@@ -2,8 +2,6 @@
 
 To maintain a secure, stable, and supported platform, we regularly upgrade our Kubernetes clusters. We use **[RKE2](https://docs.rke2.io/)** as our Kubernetes distribution.
 
----
-
 ## 🔄 Upgrade Flow
 
 - **Phased Rollout**:
@@ -15,8 +13,6 @@ To maintain a secure, stable, and supported platform, we regularly upgrade our K
   - Timing may depend on compatibility with **other infrastructure components** (e.g., storage, CNI plugins, monitoring tools).
   - However, all clusters will be upgraded **before the current Kubernetes version reaches End of Life (EOL)**.
 
----
-
 ## ⚠️ Upgrade Impact
 
 The **impact of a Kubernetes upgrade can vary**, depending on the nature of the changes involved:
@@ -31,18 +27,8 @@ The **impact of a Kubernetes upgrade can vary**, depending on the nature of the
 
 > 💡 Applications that follow cloud-native best practices (e.g., readiness probes, multiple replicas, graceful shutdown handling) are **less likely to be impacted** by upgrades.
 
----
-
 ## ✅ What You Can Expect
 
 - Upgrades are performed using safe, tested procedures with minimal risk to production workloads.
 - TDS clusters serve as a **canary environment**, allowing us to identify issues early.
 - All clusters are kept **aligned with supported Kubernetes versions**.
-
----
-
-## 💬 Questions?
-
-If you have any questions about upcoming Kubernetes upgrades or want help verifying your application’s readiness, please contact the Network and Cloud team via Service Desk ticket.
-
-Thank you for your support and collaboration in keeping our platform secure and reliable.
diff --git a/docs/kubernetes/node-upgrades.md b/docs/kubernetes/node-upgrades.md
@@ -2,8 +2,6 @@
 
 To ensure the **security** and **stability** of our infrastructure, CSCS will perform **monthly OS updates** on all nodes of our Kubernetes clusters.
 
----
-
 ## 🔄 Maintenance Schedule
 
 - **Frequency**: Every **first week of the month**  
@@ -14,8 +12,6 @@ These updates include important security patches and system updates for the oper
 
 > ⚠️ **Note:** Nodes will be **rebooted only if required** by the updates. If no reboot is necessary, nodes will remain in service without disruption.
 
----
-
 ## 🚨 Urgent Security Patches
 
 In the event of a **critical zero-day vulnerability**, we will apply patches and perform reboots (if required) **as soon as possible**, outside of the regular update schedule if needed.  
@@ -24,8 +20,6 @@ In the event of a **critical zero-day vulnerability**, we will apply patches and
 - Users will be notified ahead of time **when possible**.
 - Standard safety and rolling reboot practices will still be followed.
 
----
-
 ## 🛠️ Reboot Management with Kured
 
 We use [**Kured** (KUbernetes REboot Daemon)](https://github.com/kubereboot/kured) to safely automate the reboot process. Kured ensures that:
@@ -35,8 +29,6 @@ We use [**Kured** (KUbernetes REboot Daemon)](https://github.com/kubereboot/kure
 - Reboots occur **only during the defined window** 
 - Nodes are **cordoned**, **drained**, and **gracefully reintegrated** after reboot.
 
----
-
 ## ✅ Application Requirements
 
 To avoid service disruption during node maintenance, applications **must be designed for high availability**. Specifically:
@@ -50,10 +42,7 @@ To avoid service disruption during node maintenance, applications **must be desi
 
 > ❗ Applications that do not meet these requirements **may experience temporary disruption** during node reboots.
 
----
-
 ## 👩‍💻 Need Help?
 
 If you have questions or need help preparing your applications for rolling node maintenance, please contact the Network and Cloud team via Service Desk ticket.
 
-Thank you for your cooperation and commitment to building robust, cloud-native services.