Commit fb3e34b

lucianvlad authored and HaoruiPeng committed
release: Add falco overrides for the falco GPU alerts
1 parent 4d4b00a commit fb3e34b

8 files changed, with 128 additions and 86 deletions

changelog/0.45.md

Lines changed: 48 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,9 +1,55 @@
11
# v0.45.0
22

3-
Released 2025-03-12
3+
Released 2025-03-19
4+
5+
> [!WARNING]
6+
> **Security Notice(s)**
7+
>
8+
> - Upgraded cert-manager to v1.17.1 which addresses critical [CVE-2024-45337](https://github.com/advisories/GHSA-v778-237x-gjrc)
9+
<!-- -->
10+
> [!IMPORTANT]
11+
> **Platform Administrator Notice(s)**
12+
>
13+
> - Running `ck8s update-ips` on existing OpenStack CAPI clusters will now allow the entire subnet instead of individual node IPs.
14+
<!-- -->
15+
> [!NOTE]
16+
> **Application Developer Notice(s)**
17+
>
18+
> - A new guardrail is added that is enabled by default on ClusterAPI environments. It will by default warn but not deny usage of emptyDir storage, since this can stop cluster autoscaler from scaling down nodes. Read more on [this page](https://elastisys.io/welkin/user-guide/safeguards/enforce-no-local-storage-emptydir/).
19+
> - A new guardrail is added that is enabled by default on ClusterAPI environments. It will by default warn but not deny usage of Pods without backing controllers, since this can stop cluster autoscaler from scaling down nodes. Read more on [this page](https://elastisys.io/welkin/user-guide/safeguards/enforce-no-pod-without-controller).
20+
> - Cert-manager was upgraded to v1.17.1. This comes with some potentially [breaking changes](https://github.com/cert-manager/cert-manager/releases/tag/v1.16.0) for [Venafi Issuer](https://cert-manager.io/docs/configuration/venafi/)
21+
> - RBAC to modify the configmaps `fluentd-extra-config` and `fluentd-extra-plugins`, and to delete any fluentd pod in the `fluentd` namespace has been removed.<br>Reach out to a platform administrator if any additional config or plugins are needed!
422
523
## Changes by kind
624

25+
### Feature(s)
26+
27+
- [#2414](https://github.com/elastisys/compliantkubernetes-apps/pull/2414) - apps-sc: add policy to reject local storage emptydir [@viktor-f](https://github.com/viktor-f)
28+
- [#2429](https://github.com/elastisys/compliantkubernetes-apps/pull/2429) - apps: gatekeeper policy to reject pods without controller [@viktor-f](https://github.com/viktor-f)
29+
- [#2451](https://github.com/elastisys/compliantkubernetes-apps/pull/2451) - Add cert-manager mixin dashboard [@anders-elastisys](https://github.com/anders-elastisys)
30+
- [ac05981e](https://github.com/elastisys/compliantkubernetes-apps/pull/2465/commits/ac05981ed5305a12a2f54fd5594d44c0727c5287) - gpu-operator: Reconfigre gpu driver version for surpporting ubuntu 24.04 [@HaoruiPeng](https://github.com/HaoruiPeng)
31+
### Improvement(s)
32+
33+
- [#2418](https://github.com/elastisys/compliantkubernetes-apps/pull/2418) - Update helm/trivy-operator to 0.26.0 and trivy-operator to 0.24.0 [@OlleLarsson](https://github.com/OlleLarsson)
34+
- [#2435](https://github.com/elastisys/compliantkubernetes-apps/pull/2435) - Upgrade cert-manager helm chart to v1.17.1 [@anders-elastisys](https://github.com/anders-elastisys)
35+
- [#2438](https://github.com/elastisys/compliantkubernetes-apps/pull/2438) - apps sc: made alert runbooks configurable [@davidumea](https://github.com/davidumea)
36+
- [#2440](https://github.com/elastisys/compliantkubernetes-apps/pull/2440) - Allow subnet in update-ips [@simonklb](https://github.com/simonklb)
37+
- [#2448](https://github.com/elastisys/compliantkubernetes-apps/pull/2448) - Make update-ips apply exit 0 even if it has applied diffs [@simonklb](https://github.com/simonklb)
38+
- [#2450](https://github.com/elastisys/compliantkubernetes-apps/pull/2450) - add ipv6 support to update-ips [@vomba](https://github.com/vomba)
39+
- [#2454](https://github.com/elastisys/compliantkubernetes-apps/pull/2454) - Upgrade Thanos chart to v15.13.1 [@anders-elastisys](https://github.com/anders-elastisys)
40+
- [c71b5612](https://github.com/elastisys/compliantkubernetes-apps/pull/2465/commits/c71b5612d6750f2c9d039cb11d9f839ebddb243f) - release: Add falco overrides for the falco GPU alerts [@lucianvlad](https://github.com/lucianvlad)
41+
### Deprecation(s)
42+
43+
- [#2457](https://github.com/elastisys/compliantkubernetes-apps/pull/2457) - Remove rbac for additional fluentd config and plugins configmaps [@OlleLarsson](https://github.com/OlleLarsson)
44+
745
### Other(s)
846

9-
- [#2464](https://github.com/elastisys/compliantkubernetes-apps/pull/2464) - other: Port patch changelogs v0.42.2, v0.43.1, v0.44.1 [@lunkan93](https://github.com/lunkan93)
47+
- [#2411](https://github.com/elastisys/compliantkubernetes-apps/pull/2411) - bug: apps: change query expression for LessKubeletsThanNodes alerts [@HaoruiPeng](https://github.com/HaoruiPeng)
48+
- [#2442](https://github.com/elastisys/compliantkubernetes-apps/pull/2442) - bug: apps sc: Fixed s3 size alert to not double count postgres buckets [@Xartos](https://github.com/Xartos)
49+
- [#2443](https://github.com/elastisys/compliantkubernetes-apps/pull/2443) - clean-up: Only uninstall with local environment if local cluster exist [@simonklb](https://github.com/simonklb)
50+
- [#2447](https://github.com/elastisys/compliantkubernetes-apps/pull/2447) - other: Port 0.44.0 [@lunkan93](https://github.com/lunkan93)
51+
- [#2452](https://github.com/elastisys/compliantkubernetes-apps/pull/2452) - clean-up: Remove (un)encrypted kubeconfig log message [@simonklb](https://github.com/simonklb)
52+
- [#2456](https://github.com/elastisys/compliantkubernetes-apps/pull/2456) - documentation: Fix LICENSE [@cristiklein](https://github.com/cristiklein)
53+
- [8618c11a](https://github.com/elastisys/compliantkubernetes-apps/pull/2465/commits/8618c11a1c5172f8ac7738f4c604e156225915c2) - release: Update grafana and opensearch dashboards [@HaoruiPeng](https://github.com/HaoruiPeng)
54+
- [70645a16](70645a16fd5a3b17bb4d1e3be825faa19720d109) - bug: Fix templating issue for external-dns affinity and resources [@HaoruiPeng](https://github.com/HaoruiPeng)
55+
- [3f83e724](https://github.com/elastisys/compliantkubernetes-apps/pull/2465/commits/3f83e724a343548c0924a7c94cb396523027cd33) - Add changelog for release v0.45.0 [@HaoruiPeng](https://github.com/HaoruiPeng)
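
The Platform Administrator notice above says that `ck8s update-ips` now allows an entire subnet rather than individual node IPs. The sketch below is not taken from `update-ips`; it only illustrates what a subnet match means, using hypothetical addresses and hypothetical helper names (`ip_to_int`, `ip_in_cidr`). It handles IPv4 only.

```shell
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 lies inside CIDR block $2, otherwise 1.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Hypothetical addresses, for illustration only.
ip_in_cidr 10.0.0.7 10.0.0.0/24 && echo "10.0.0.7 is inside 10.0.0.0/24"
ip_in_cidr 10.0.1.7 10.0.0.0/24 || echo "10.0.1.7 is outside 10.0.0.0/24"
```

With a subnet entry, any address in the range matches, including nodes that join later, whereas a per-node list has to be refreshed whenever a node IP changes.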

config/common-config.yaml

Lines changed: 35 additions & 0 deletions
@@ -1233,3 +1233,38 @@ externalDns:
12331233
# recordType: A
12341234
# targets:
12351235
# - sc control-plane nodes
1236+
gpu:
1237+
enabled: false
1238+
operator:
1239+
resources: {}
1240+
tolerations: []
1241+
affinity: {}
1242+
driver:
1243+
# use driver version 570.124.06 for supporting cluster running on Ubuntu 24.04.
1244+
version: "570.124.06"
1245+
env:
1246+
- name: NVIDIA_VISIBLE_DEVICES
1247+
value: all
1248+
nodeFeatureDiscovery:
1249+
worker:
1250+
resources: {}
1251+
tolerations:
1252+
- key: elastisys.io/node-type
1253+
operator: Equal
1254+
value: gpu
1255+
effect: NoSchedule
1256+
affinity: {}
1257+
controlPlane:
1258+
resources: {}
1259+
tolerations:
1260+
- effect: NoSchedule
1261+
key: node-role.kubernetes.io/control-plane
1262+
operator: Equal
1263+
value: ""
1264+
affinity: {}
1265+
daemonsets:
1266+
tolerations:
1267+
- key: elastisys.io/node-type
1268+
operator: Equal
1269+
value: gpu
1270+
effect: NoSchedule

config/wc-config.yaml

Lines changed: 0 additions & 37 deletions
@@ -348,43 +348,6 @@ gatekeeper:
348348
# - names:
349349
# - sealedsecrets.bitnami.com
350350
# group: "bitnami.com"
351-
352351
extraServiceAccounts: []
353352
# - namespace: "gatekeeper-system"
354353
# name: "gatekeeper-admin-upgrade-crds"
355-
356-
gpu:
357-
enabled: false
358-
operator:
359-
resources: {}
360-
tolerations: []
361-
affinity: {}
362-
driver:
363-
# use driver version 570.124.06 for supporting cluster running on Ubuntu 24.04.
364-
version: "570.124.06"
365-
env:
366-
- name: NVIDIA_VISIBLE_DEVICES
367-
value: all
368-
nodeFeatureDiscovery:
369-
worker:
370-
resources: {}
371-
tolerations:
372-
- key: elastisys.io/node-type
373-
operator: Equal
374-
value: gpu
375-
effect: NoSchedule
376-
affinity: {}
377-
controlPlane:
378-
resources: {}
379-
tolerations:
380-
- effect: NoSchedule
381-
key: node-role.kubernetes.io/control-plane
382-
operator: Equal
383-
value: ""
384-
affinity: {}
385-
daemonsets:
386-
tolerations:
387-
- key: elastisys.io/node-type
388-
operator: Equal
389-
value: gpu
390-
effect: NoSchedule

helmfile.d/charts/grafana-dashboards/files/welcome.md

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@
44

55
Here you can find the most relevant features and changes for the last couple of releases of Welkin
66

7+
- Upgrade trivy-operator to v0.26.0 and application to v0.24.0. **[v0.45]**
8+
- Upgraded cert-manager chart to v1.17.1. **[v0.45]**
79
- Upgraded Thanos chart to v15.13.1. **[v0.45]**
810
- Added NVIDIA GPU driver support for Ubuntu 24.04. **[v0.45]**
911
- Added NVIDIA GPU operator to Welkin. **[v0.44]**

helmfile.d/charts/opensearch/configurer/files/dashboards-resources/welcome.md

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@
44

55
Here you can find the most relevant features and changes for the last couple of releases of Welkin
66

7+
- Upgrade trivy-operator to v0.26.0 and application to v0.24.0. **[v0.45]**
8+
- Upgraded cert-manager chart to v1.17.1. **[v0.45]**
79
- Upgraded Thanos chart to v15.13.1. **[v0.45]**
810
- Added NVIDIA GPU driver support for Ubuntu 24.04. **[v0.45]**
911
- Added NVIDIA GPU operator to Welkin. **[v0.44]**

helmfile.d/values/falco/falco-common.yaml.gotmpl

Lines changed: 32 additions & 6 deletions
@@ -136,11 +136,17 @@ customRules:
136136
# This will be added in a later falco rules version as well
137137
# The fix was added upstream here (with a new condition): https://github.com/falcosecurity/rules/pull/177
138138
- macro: allowed_clear_log_files
139-
condition: (
140-
proc.name=containerd and (
141-
fd.name startswith "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/" or
142-
fd.name startswith "/var/lib/containerd/tmpmounts/"
143-
))
139+
condition: or (
140+
proc.name = containerd and (
141+
fd.name startswith "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/" or
142+
fd.name startswith "/var/lib/containerd/tmpmounts/"
143+
)
144+
) or (
145+
proc.name = nvidia-installe and
146+
fd.name startswith "/var/log/nvidia-installer.log"
147+
)
148+
override:
149+
condition: append
144150

145151
# Adding a repository to this list will add an exception to the rules:
146152
# Run shell untrusted
@@ -207,12 +213,32 @@ customRules:
207213
condition: ( container.image.repository in (trusted_image_repositories) )
208214

209215
{{- if and .Values.calicoAccountant.enabled (eq .Values.calicoAccountant.backend "nftables") }}
210-
211216
# Drop and execute new binary in container
212217
- list: known_drop_and_execute_containers
213218
items:
214219
- ghcr.io/elastisys/calico-accountant
220+
override:
221+
items: append
222+
{{- end }}
223+
224+
{{- if .Values.gpu.enabled }}
225+
# Drop and execute new binary in container
226+
- list: known_drop_and_execute_containers
227+
items:
228+
- nvcr.io/nvidia/cloud-native/gpu-operator-validator
229+
- nvcr.io/nvidia/driver
230+
- nvcr.io/nvidia/k8s/dcgm-exporter
231+
override:
232+
items: append
233+
234+
# Linux Kernel Module Injection Detected
235+
- list: allowed_container_images_loading_kernel_module
236+
items:
237+
- nvcr.io/nvidia/driver
238+
override:
239+
items: append
215240
{{- end }}
241+
216242
{{- end }}
217243

218244
{{- if .Values.falco.rulesFiles.incubating.enabled }}
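
A note on `proc.name = nvidia-installe` in the `allowed_clear_log_files` macro above: the value is one character shorter than `nvidia-installer`, which is probably deliberate rather than a typo. Linux stores a task's command name (`comm`) in at most 15 characters, and this sketch assumes that Falco's `proc.name` reflects that truncated name. `truncate_comm` is a hypothetical helper that only mimics the truncation.

```shell
#!/usr/bin/env bash

# Linux keeps a per-task command name ("comm") of at most 15 characters
# (TASK_COMM_LEN is 16, including the terminating NUL byte).
# truncate_comm is a hypothetical helper that mimics that limit.
truncate_comm() {
  printf '%s\n' "${1:0:15}"
}

truncate_comm nvidia-installer   # 16 characters in, 15 characters out
```

Under that assumption, a rule that compares against the full 16-character name would never match, so the truncated form is the one to use in the condition.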

migration/v0.45/README.md

Lines changed: 9 additions & 3 deletions
@@ -134,19 +134,25 @@ As with all scripts in this repository `CK8S_CONFIG_PATH` is expected to be set.
134134
export CK8S_CLUSTER=<wc|sc|both>
135135
```
136136

137-
2. Update gatekeeper CRDs:
137+
1. Upgrade trivy-operator
138+
139+
```bash
140+
./migration/v0.45/apply/10-trivy-operator.sh execute
141+
142+
1. Update gatekeeper CRDs:
138143
139144
```bash
140145
./migration/v0.45/apply/20-gatekeeper-crds.sh execute
141146
```
142147
143-
3. If you didn't have gpu-operator installed previously, but had `gpu-operator` namespace created, label the namespace:
148+
1. If you didn't have gpu-operator installed previously, but had `gpu-operator` namespace created, label the namespace:
149+
144150
```bash
145151
# only for `gpu-operator` installation
146152
./migration/v0.45/apply/40-migrate-gpu-operator-ns.sh execute
147153
```
148154
149-
4. Upgrade applications:
155+
1. Upgrade applications:
150156
151157
```bash
152158
./bin/ck8s apply {sc|wc}
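
The steps visible in this hunk can be sketched as a single dry-run wrapper. This is only an illustration: `run_step`, `migrate_v045`, `DRY_RUN`, and `MIGRATE_GPU_NS` are hypothetical names that do not exist in the repository, the script paths are the ones shown above, and earlier steps of the README (outside this hunk) are not covered. As the README notes, `CK8S_CONFIG_PATH` is expected to be set before anything is actually executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print each step; execute it only when DRY_RUN=false.
# DRY_RUN and run_step are hypothetical names used for this sketch.
DRY_RUN="${DRY_RUN:-true}"

run_step() {
  echo "+ $*"
  if [[ "${DRY_RUN}" == "false" ]]; then
    "$@"
  fi
}

# $1 is the cluster to apply to: sc or wc.
migrate_v045() {
  run_step ./migration/v0.45/apply/10-trivy-operator.sh execute
  run_step ./migration/v0.45/apply/20-gatekeeper-crds.sh execute
  # Only for a pre-existing gpu-operator namespace without gpu-operator installed.
  if [[ "${MIGRATE_GPU_NS:-false}" == "true" ]]; then
    run_step ./migration/v0.45/apply/40-migrate-gpu-operator-ns.sh execute
  fi
  run_step ./bin/ck8s apply "${1}"
}

migrate_v045 "${1:-wc}"
```

By default the sketch only prints the commands, which makes it easy to review the order before running anything against a cluster.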

migration/v0.45/apply/40-migrate-gpu-operator-ns.sh

Lines changed: 0 additions & 38 deletions
@@ -5,44 +5,6 @@ ROOT="$(readlink -f "$(dirname "${0}")/../../../")"
55
# shellcheck source=scripts/migration/lib.sh
66
source "${ROOT}/scripts/migration/lib.sh"
77

8-
# functions currently available in the library:
9-
# - logging:
10-
# - log_info(_no_newline) <message>
11-
# - log_warn(_no_newline) <message>
12-
# - log_error(_no_newline) <message>
13-
# - log_fatal <message> # this will call "exit 1"
14-
#
15-
# - kubectl
16-
# # Use kubectl with kubeconfig set
17-
# - kubectl_do <sc|wc> <kubectl args...>
18-
# # Perform kubectl delete, will not cause errors if the resource is missing
19-
# - kubectl_delete <sc|wc> <resource> <namespace> <name>
20-
#
21-
# - helm
22-
# # Use helm with kubeconfig set
23-
# - helm_do <sc|wc> <helm args...>
24-
# # Checks if a release is installed
25-
# - helm_installed <sc|wc> <namespace> <release>
26-
# # Uninstalls a release if it is installed
27-
# - helm_uninstall <sc|wc> <namespace> <release>
28-
#
29-
# - helmfile
30-
# # Use helmfile with kubeconfig set
31-
# - helmfile_do <sc|wc> <helmfile args...>
32-
# # For selector args all will be prefixed with "-l"
33-
# # List releases matching the selector
34-
# - helmfile_list <sc|wc> <selectors...>
35-
# # Apply releases matching the selector
36-
# - helmfile_apply <sc|wc> <selectors...>
37-
# # Check for changes on releases matching the selector
38-
# - helmfile_change <sc|wc> <selectors...>
39-
# # Destroy releases matching the selector
40-
# - helmfile_destroy <sc|wc> <selectors...>
41-
# # Replaces the releases matching the selector, performing destroy and apply on each release individually
42-
# - helmfile_replace <sc|wc> <selectors...>
43-
# # Upgrades the releases matching the selector, performing automatic rollback on failure set "CK8S_ROLLBACK=false" to disable
44-
# - helmfile_upgrade <sc|wc> <selectors...>
45-
468
run() {
479
case "${1:-}" in
4810
execute)
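
The hunk ends inside `run()`, so the body of the `execute` branch is not shown here. The sketch below illustrates only the dispatch pattern, under stated assumptions: `log_info` is stubbed locally (the real script gets it from `scripts/migration/lib.sh`), the log message is made up, and the fallback branch is an assumption rather than the repository's code.

```shell
#!/usr/bin/env bash

# Stand-in for the logging helper that the real script sources from lib.sh.
log_info() { echo "[info] $*"; }

run() {
  case "${1:-}" in
  execute)
    log_info "executing migration step"
    # A real step would call library helpers such as kubectl_do here.
    ;;
  *)
    # Assumed fallback for this sketch: reject unknown actions.
    echo "usage: $0 execute" >&2
    return 1
    ;;
  esac
}

run execute
```

The `${1:-}` default keeps the `case` statement safe when the script is called without arguments under `set -u`.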
