Support controlling default routes via ansible-init/metadata #602

sjpb · 2025-03-05T15:53:41Z

No description provided.

sjpb · 2025-03-06T09:25:52Z

Image build: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/13681461973

sjpb · 2025-03-06T13:32:13Z

Tested that pods work ok:

# daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: test-node-daemonset
  labels:
    app: test-node
spec:
  selector:
    matchLabels:
      app: test-node
  template:
    metadata:
      labels:
        app: test-node
    spec:
      containers:
      - name: busybox
        image: busybox:latest
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "100m"
            memory: "64Mi"
          limits:
            cpu: "200m"
            memory: "128Mi"

Then, what didn't work when I tried this before:

kubectl apply -f daemonset.yaml
kubectl get pods -o wide
kubectl exec -it $POD_NAME -- sh

sjpb · 2025-03-06T13:32:57Z

However monitoring is failing with:

TASK [kube_prometheus_stack : Install kube-prometheus-stack on target Kubernetes cluster] ***************************************************************************************************************************************************
Thursday 06 March 2025  12:51:45 +0000 (0:00:06.238)       0:14:59.197 ******** 
fatal: [RL9-control]: FAILED! => {
    "changed": false,
    "command": "/bin/helm --version=59.1.0 --repo=https://prometheus-community.github.io/helm-charts upgrade -i --reset-values --wait --timeout 5m -f=/tmp/tmp1q6qtlkc.yml kube-prometheus-stack kube-prometheus-stack"
}

STDOUT:

Release "kube-prometheus-stack" does not exist. Installing it now.

STDERR:

Error: timed out waiting for the condition

MSG:

Failure when executing Helm command. Exited 1.
stdout: Release "kube-prometheus-stack" does not exist. Installing it now.

stderr: Error: timed out waiting for the condition

sjpb · 2025-03-06T13:33:40Z

Trying manually:

[root@RL9-control ~]# /bin/helm --version=59.1.0 --repo=https://prometheus-community.github.io/helm-charts upgrade -i --reset-values --wait --timeout 5m -f=/tmp/tmp1q6qtlkc.yml kube-prometheus-stack kube-prometheus-stack
Release "kube-prometheus-stack" does not exist. Installing it now.
Error: rendered manifests contain a resource that already exists. Unable to continue with install: ClusterRole "kube-prometheus-stack-grafana-clusterrole" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "default": current value is "monitoring-system"

This works, so going through squid is OK:

[root@RL9-control ~]# curl -L https://prometheus-community.github.io/helm-charts
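
Note on the manual Helm run above: the ownership error reports that the existing ClusterRole is annotated with release namespace "monitoring-system", while the manual command ran without a namespace and so targeted "default". That suggests the manual run differs from the original install only in namespace. A minimal sketch, assuming the error text has the shape quoted above, of extracting the owning namespace so it can be passed back via Helm's `--namespace` flag. The function name `owning_namespace` and the shortened error string are hypothetical, for illustration only:

```shell
#!/bin/sh
# Hypothetical helper: read Helm error text on stdin and print the namespace
# that the existing resource's release-namespace annotation points at.
owning_namespace() {
  sed -n 's/.*current value is "\([^"]*\)".*/\1/p'
}

# Shortened copy of the ownership error quoted in the comment above.
err='annotation validation error: key "meta.helm.sh/release-namespace" must equal "default": current value is "monitoring-system"'

ns=$(printf '%s\n' "$err" | owning_namespace)

# Only print a suggestion; nothing is run against a cluster here.
echo "re-run the manual helm command with: --namespace $ns"
```

This does not explain the original timeout in the Ansible task; it only removes the namespace mismatch from the manual reproduction.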

sjpb · 2025-03-14T17:43:20Z

Replaced by #617. Using k3s with client cluster network configurations requires more work to address Grafana trying to contact grafana.com to download the OpenSearch plugin.

bertiethorpe and others added 8 commits March 5, 2025 15:19

FIX: Tofu attempts to apply security groups when port_security_enable…

0dcf774

…d is false (#601) * fix security_group_id logic * toggle secgroups without touching port security * document no_security_groups flag

Add file deletion to cleanup play (#600)

0e2ec52

* add file deletion to cleanup play * bump CI image * add back in deleted OOD file and fix paths in /etc * bump CI image

Merge branch 'main' into feature/k3s-monitoring

3219ff4

define desired behaviour

651c722

add gateway ansible-init role

a83eb18

move compute_init playbook

3a4be9a

remove debug values

bd87bf7

support network filters in ansible-init

6eac0e2

sjpb force-pushed the feat/cloudinit-gateways-v3 branch from f785a44 to 6eac0e2 Compare March 5, 2025 17:05

sjpb added 6 commits March 6, 2025 09:27

bump CI image

7f8fe79

support gateway_ip in TF

9fbc90e

fix gateway tag

366232d

get dummy gateway and adding gateway working

a49efd8

fail fast if ansible-init failed

2d715e1

fix chrony for nodes w/o network access (yet)

d577d4b

sjpb added 2 commits March 6, 2025 17:14

configure proxies for k3s too

3b9c6dc

add notes on name resolution to network docs

3e1a047

sjpb mentioned this pull request Mar 7, 2025

Support defining default routes (including dummy ones) for compute nodes via cloud-init #539

Closed

sjpb closed this Mar 14, 2025
