docs/kubernetes/clusters.md
Lines changed: 8 additions & 34 deletions
@@ -2,17 +2,13 @@

 This document provides an overview of the Kubernetes clusters maintained by CSCS and offers step-by-step instructions for accessing and interacting with them.

----
-
 ## Architecture

 All Kubernetes clusters at CSCS are:

 - Managed using **[Rancher](https://www.rancher.com)**
docs/kubernetes/kubernetes-upgrades.md
Lines changed: 0 additions & 14 deletions
@@ -2,8 +2,6 @@

 To maintain a secure, stable, and supported platform, we regularly upgrade our Kubernetes clusters. We use **[RKE2](https://docs.rke2.io/)** as our Kubernetes distribution.

----
-
 ## 🔄 Upgrade Flow

 - **Phased Rollout**:
@@ -15,8 +13,6 @@ To maintain a secure, stable, and supported platform, we regularly upgrade our K
 - Timing may depend on compatibility with **other infrastructure components** (e.g., storage, CNI plugins, monitoring tools).
 - However, all clusters will be upgraded **before the current Kubernetes version reaches End of Life (EOL)**.

----
-
 ## ⚠️ Upgrade Impact

 The **impact of a Kubernetes upgrade can vary**, depending on the nature of the changes involved:
@@ -31,18 +27,8 @@ The **impact of a Kubernetes upgrade can vary**, depending on the nature of the

 > 💡 Applications that follow cloud-native best practices (e.g., readiness probes, multiple replicas, graceful shutdown handling) are **less likely to be impacted** by upgrades.

----
-
 ## ✅ What You Can Expect

 - Upgrades are performed using safe, tested procedures with minimal risk to production workloads.
 - TDS clusters serve as a **canary environment**, allowing us to identify issues early.
 - All clusters are kept **aligned with supported Kubernetes versions**.
-
----
-
-## 💬 Questions?
-
-If you have any questions about upcoming Kubernetes upgrades or want help verifying your application’s readiness, please contact the Network and Cloud team via Service Desk ticket.
-
-Thank you for your support and collaboration in keeping our platform secure and reliable.
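
The tip in the Upgrade Impact section of this file names readiness probes and graceful shutdown handling as practices that reduce upgrade impact. As a rough sketch only, and not something taken from the CSCS documentation, a Python service might report itself as not ready once it receives SIGTERM, so that a readiness probe pointed at it starts failing before the process exits. The `/readyz` path, port 8080, the grace period, and all function names below are assumptions made for illustration.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once SIGTERM arrives. Hypothetical name, used only in this sketch.
shutting_down = threading.Event()


def readiness_status() -> int:
    """Return the HTTP status a readiness probe should see."""
    return 503 if shutting_down.is_set() else 200


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "/readyz" is an assumed path; match whatever the probe is configured to call.
        status = readiness_status() if self.path == "/readyz" else 404
        self.send_response(status)
        self.end_headers()


def handle_sigterm(signum, frame):
    # Only flip the flag here; the main thread performs the actual shutdown.
    shutting_down.set()


def main(port: int = 8080, grace_seconds: float = 5.0) -> None:
    signal.signal(signal.SIGTERM, handle_sigterm)
    server = HTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    shutting_down.wait()       # block until SIGTERM is received
    time.sleep(grace_seconds)  # keep answering 503 while in-flight work finishes
    server.shutdown()
    server.server_close()
```

Calling `main()` starts the server and blocks until SIGTERM arrives. In this sketch the grace period would need to stay shorter than the pod's termination grace period, otherwise the process could be killed before it finishes shutting down.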
docs/kubernetes/node-upgrades.md
Lines changed: 0 additions & 11 deletions
@@ -2,8 +2,6 @@

 To ensure the **security** and **stability** of our infrastructure, CSCS will perform **monthly OS updates** on all nodes of our Kubernetes clusters.

----
-
 ## 🔄 Maintenance Schedule

 - **Frequency**: Every **first week of the month**
@@ -14,8 +12,6 @@ These updates include important security patches and system updates for the oper

 > ⚠️ **Note:** Nodes will be **rebooted only if required** by the updates. If no reboot is necessary, nodes will remain in service without disruption.

----
-
 ## 🚨 Urgent Security Patches

 In the event of a **critical zero-day vulnerability**, we will apply patches and perform reboots (if required) **as soon as possible**, outside of the regular update schedule if needed.
@@ -24,8 +20,6 @@ In the event of a **critical zero-day vulnerability**, we will apply patches and
 - Users will be notified ahead of time **when possible**.
 - Standard safety and rolling reboot practices will still be followed.

----
-
 ## 🛠️ Reboot Management with Kured

 We use [**Kured** (KUbernetes REboot Daemon)](https://github.com/kubereboot/kured) to safely automate the reboot process. Kured ensures that:
@@ -35,8 +29,6 @@ We use [**Kured** (KUbernetes REboot Daemon)](https://github.com/kubereboot/kure
 - Reboots occur **only during the defined window**
 - Nodes are **cordoned**, **drained**, and **gracefully reintegrated** after reboot.

----
-
 ## ✅ Application Requirements

 To avoid service disruption during node maintenance, applications **must be designed for high availability**. Specifically:
@@ -50,10 +42,7 @@ To avoid service disruption during node maintenance, applications **must be desi

 > ❗ Applications that do not meet these requirements **may experience temporary disruption** during node reboots.

----
-
 ## 👩‍💻 Need Help?

 If you have questions or need help preparing your applications for rolling node maintenance, please contact the Network and Cloud team via Service Desk ticket.

-Thank you for your cooperation and commitment to building robust, cloud-native services.