
Conversation

@jk464 (Contributor) commented Feb 13, 2024

Instead of running st2tests as a pod, this runs it as a job.

To ensure the job only runs once, and fails if a test fails, I've added:

      restartPolicy: Never
  backoffLimit: 0

This means the job will never back off or retry.
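For reference, here is a minimal sketch of where those two fields sit in a Job manifest (the name, image, and command below are placeholders, not the chart's actual template): backoffLimit: 0 lives on the Job spec and stops Kubernetes from creating replacement pods after a failure, while restartPolicy: Never lives on the pod template spec and stops the kubelet from restarting the failed container in place.

    # Minimal sketch; placeholder names, not the chart's template.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: example-st2-tests
    spec:
      backoffLimit: 0             # Job level: no replacement pods on failure
      template:
        spec:
          restartPolicy: Never    # pod level: no in-place container restarts
          containers:
            - name: st2-tests
              image: example/st2tests:latest              # placeholder image
              command: ["/bin/sh", "-c", "st2tests.sh"]   # placeholder command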

@pull-request-size bot added the size/L label (PR that changes 100-499 lines; requires some effort to review) Feb 13, 2024

@cognifloyd (Member) left a comment

It's hard to see what you've added in the GitHub interface, so I'm highlighting the new sections in my review to help myself see them.

It looks like you want to use the helm tests as a canary script that you run after install or upgrade. Is that right? Are you already using something like this? How well does it work for you?

Comment on lines +23 to +26
imagePullPolicy: {{ $.Values.image.pullPolicy }}
{{- with .Values.securityContext }}
securityContext: {{- toYaml . | nindent 12 }}
{{- end }}

This is new.

Comment on lines +37 to +40
imagePullPolicy: {{ $.Values.image.pullPolicy }}
{{- with .Values.securityContext }}
securityContext: {{- toYaml . | nindent 12 }}
{{- end }}

This is new.

Comment on lines +59 to +61
{{- with .Values.securityContext }}
securityContext: {{- toYaml . | nindent 12 }}
{{- end }}

This is new. (imagePullPolicy was already present)
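As an aside for anyone reading along: this with / toYaml / nindent pattern copies the securityContext map from values into the rendered manifest verbatim, re-indented to the stated column, and emits nothing at all when the value is unset. A rough illustration, assuming a hypothetical values snippet (these are not the chart's defaults):

    # Hypothetical values (not chart defaults):
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true

    # Roughly what the template then renders inside the container spec
    # (children re-indented by nindent 12):
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true

When .Values.securityContext is empty, the whole with block is skipped, so no empty securityContext: key ends up in the manifest.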

apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-job-st2-tests"

Name changed from {{ .Release.Name }}-st2tests

Comment on lines +14 to +16
annotations:
  "helm.sh/hook": test-success
  "helm.sh/hook-delete-policy": hook-succeeded

I don't think the pod needs these helm annotations, just the top-level Job. Is that right?
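For what it's worth, Helm only reads helm.sh/* hook annotations from the metadata of the top-level rendered resource; anything under the Job's pod template metadata is passed straight to Kubernetes and ignored by Helm's hook machinery. So a layout like this sketch should be sufficient (resource name is illustrative):

    # Sketch: hook annotations on the Job only.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: example-job-st2-tests
      annotations:
        "helm.sh/hook": test-success
        "helm.sh/hook-delete-policy": hook-succeeded
    spec:
      template:
        metadata: {}          # no helm.sh/* annotations needed here
        spec:
          restartPolicy: Never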

        configMap:
          name: {{ .Release.Name }}-job-st2-tests
      restartPolicy: Never
  backoffLimit: 0

The backoffLimit is new.

  kind: ConfigMap
  metadata:
-   name: {{ .Release.Name }}-st2tests
+   name: {{ .Release.Name }}-job-st2-tests

This might need the helm annotations if you are planning to run this on a real st2 cluster instead of only running it in CI.

I'm not sure st2tests.sh is designed to run on a real instance of st2; there might be some unintended side effects.

"helm.sh/hook-delete-policy": hook-succeeded
spec:
initContainers:
{{- include "stackstorm-ha.init-containers-wait-for-auth" . | nindent 6 }}

This is a new templated init container you've added to ensure the tests wait until st2 is up and running.

- 'sh'
- '-c'
- >
until curl -skSL --fail -w '\n' -X POST -u {{ .Values.st2.username }}:{{ .Values.st2.password }} "https://{{ required ".Values.ingress.fqdn is required if .Values.ingress.class is non-empty" .Values.ingress.fqdn | printf (ternary "canary-%s" "%s" .Values.phaseCanary)}}/auth/tokens"; do

I don't see these vars in our values file:

  • ingress.fqdn
  • ingress.class
  • phaseCanary

stackstorm-k8s/values.yaml

Lines 298 to 321 in 17e5fca

##
## StackStorm HA Ingress
##
ingress:
  # As recommended, ingress is disabled by default.
  enabled: false
  # Annotations are used to configure the ingress controller
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  # Map hosts to paths
  hosts: []
    # - host: hostname.domain.tld
    #   # Map paths to services
    #   paths:
    #     - path: /
    #       serviceName: service
    #       servicePort: port
  # Secure the Ingress by specifying a secret that contains a TLS private key and certificate
  tls: []
    # - secretName: chart-example-tls
    #   hosts:
    #     - chart-example.test
  # ingressClassName: nginx-ingress

I also do not use the ingress, so these would never be defined for me.

The user/pass is also only available in my cluster(s) during initial bootstrap. Once I switch to LDAP, I exclusively use API tokens within helm.

Do you have any ideas on how to make this more generic?
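Not an answer, just one possible direction for making it more generic (a sketch, not the chart's existing wait-for-auth helper): probe the in-cluster st2auth service directly instead of going through the ingress, and pull the credentials from a secret via env vars. The service name, port, scheme, and secret below are assumptions about this deployment, not values the chart is known to expose.

    # Sketch only; service name/port, scheme, and secret name are assumptions.
    initContainers:
      - name: wait-for-st2auth
        image: curlimages/curl:8.7.1           # any image with curl would do
        env:
          - name: ST2_USERNAME
            valueFrom:
              secretKeyRef:
                name: example-st2-auth-secret  # placeholder secret
                key: username
          - name: ST2_PASSWORD
            valueFrom:
              secretKeyRef:
                name: example-st2-auth-secret
                key: password
        command:
          - 'sh'
          - '-c'
          - >
            until curl -skSL --fail -X POST -u "${ST2_USERNAME}:${ST2_PASSWORD}"
            "http://{{ .Release.Name }}-st2auth:9100/tokens";
            do echo waiting for st2auth...; sleep 3; done

If user/pass is not available (for example after switching to LDAP), the same idea could probe the st2api service with an St2-Api-Key header instead of requesting a token.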

@cognifloyd added this to the v1.2.0 milestone Apr 13, 2024
@jk464 (Contributor, Author) commented May 8, 2024

I think we can just drop this PR. I misunderstood how these tests work: they're for use with the helm test command.

Internally we use helm template and then kubectl apply to deploy helm charts to our k8s cluster. This seemingly had the side effect of deploying the tests pod as part of the deploy, which AFAIK is a mistake on our part and a misuse of this pod.

Please let me know if I've yet again misunderstood how this is meant to work...

@jk464 closed this May 8, 2024
@cognifloyd (Member) commented

> I think we can just drop this PR. I misunderstood how these tests work: they're for use with the helm test command.

Yeah. helm test is the primary use case for this. Having some kind of canary job might be interesting if you have an idea of what that job should do.

So far, I've added one st2canary job to help catch common issues around packs volumes:

{{- if $.Values.st2.packs.volumes.enabled }}
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ $.Release.Name }}-job-ensure-packs-volumes-are-writable
  labels: {{- include "stackstorm-ha.labels" (list $ "st2canary") | nindent 4 }}
  annotations:
    helm.sh/hook: pre-install, pre-upgrade, pre-rollback
    helm.sh/hook-weight: "-5" # fairly high priority
    helm.sh/hook-delete-policy: hook-succeeded
    {{- if $.Values.jobs.annotations }}
    {{- toYaml $.Values.jobs.annotations | nindent 4 }}
    {{- end }}
spec:
  template:
    metadata:
      name: job-st2canary-for-writable-packs-volumes
      labels: {{- include "stackstorm-ha.labels" (list $ "st2canary") | nindent 8 }}
      annotations:
        {{- if $.Values.jobs.annotations }}
        {{- toYaml $.Values.jobs.annotations | nindent 8 }}
        {{- end }}
    spec:
      imagePullSecrets:
        {{- if $.Values.image.pullSecret }}
        - name: {{ $.Values.image.pullSecret }}
        {{- end }}
      initContainers: []
      containers:
      - name: st2canary-for-writable-packs-volumes
        image: '{{ template "stackstorm-ha.imageRepository" $ }}/st2actionrunner:{{ tpl $.Values.image.tag $ }}'
        imagePullPolicy: {{ $.Values.image.pullPolicy }}
        {{- with $.Values.securityContext }}
        securityContext: {{- toYaml . | nindent 10 }}
        {{- end }}
        # TODO: maybe use kubectl to assert the volumes have RWX mode
        #   If volume is a persistentVolumeClaim, then:
        #     the PVC must only have ReadWriteMany in spec.accessModes
        #   If volume is something else, then validating through metadata is iffy.
        #     azureFile, cephfs, csi, glusterfs, nfs, pvc, quobyte, need at least:
        #       readOnly: false
        #     ephemeral volumes could also work, ... but that config is even deeper.
        command:
          - 'bash'
          # -e => exit on failure
          # -E => trap ERR is inherited in subfunctions
          - '-eEc'
          - |
            cat << 'INTRO'
            Testing write permissions for packs volumes.
            If this passes, the pod will automatically be deleted.
            If this fails, inspect the pod for errors in kubernetes,
            and then delete this st2canary pod manually.
            INTRO
            function __handle_error__ {
              cat <<- ' FAIL'
            ERROR: One or more volumes in st2.packs.volumes (from helm values) does not meet
            StackStorm's shared volumes requirements!
            see: https://github.com/StackStorm/stackstorm-k8s#method-2-shared-volumes
            HINT: The volumes defined in st2.packs.volumes must use ReadWriteMany (RWX) access mode
            so StackStorm can dynamically install packs from any of the st2actionrunner pods
            and have those file changes available in all of the other StackStorm pods.
            see: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
            FAIL
            }
            trap __handle_error__ ERR
            for volume in packs virtualenvs {{ if $.Values.st2.packs.volumes.configs }}configs{{ end }}; do
              echo Testing write permissions on ${volume} volume...
              touch /opt/stackstorm/${volume}/.write-test
              rm /opt/stackstorm/${volume}/.write-test
              echo
            done
            echo DONE
        volumeMounts:
        {{- include "stackstorm-ha.packs-volume-mounts" $ | nindent 8 }}
        {{/* do not include the pack-configs-volume-mount helper here */}}
        - name: st2-pack-configs-vol
          mountPath: /opt/stackstorm/configs/
          readOnly: false
        # TODO: Find out default resource limits for this specific job (#5)
        #resources:
      volumes:
        {{- include "stackstorm-ha.packs-volumes" $ | nindent 8 }}
        {{- if $.Values.st2.packs.volumes.configs }}
        {{/* do not include the pack-configs-volume helper here */}}
        - name: st2-pack-configs-vol
          {{- toYaml $.Values.st2.packs.volumes.configs | nindent 10 }}
        {{- end }}
        # st2canary job does not support extra_volumes. Let us know if you need this.
      restartPolicy: Never
      {{- if $.Values.dnsPolicy }}
      dnsPolicy: {{ $.Values.dnsPolicy }}
      {{- end }}
      {{- with $.Values.dnsConfig }}
      dnsConfig: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with $.Values.podSecurityContext }}
      securityContext: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with $.Values.jobs.nodeSelector }}
      nodeSelector: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with $.Values.jobs.affinity }}
      affinity: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with $.Values.jobs.tolerations }}
      tolerations: {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}

Any ideas on potential st2canary jobs?

> Internally we use helm template and then kubectl apply to deploy helm charts to our k8s cluster. This seemingly had the side effect of deploying the tests pod as part of the deploy, which AFAIK is a mistake on our part and a misuse of this pod.

Ah yes. That's a common gotcha (at least where I work). I've been encouraging my coworkers to use helm install and helm upgrade because doing helm template bypasses a lot of the safety checks helm does during install/upgrade. To me, helm seems a bit more graceful than kubectl apply.

That said, I wonder if there is a way to exclude the helm test hooks when running helm template? Or maybe we can skip rendering the tests stuff in helm template with a new value (or maybe helm exposes some metadata we can use here)? Please submit a PR if there's something the chart can do to avoid the helm template gotcha.
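On the new-value idea, one possible shape for that guard (just a sketch; jobs.skipHelmTests is a made-up key, not something the chart defines) would be to wrap the test manifests so helm template users can opt out while helm test keeps working by default:

    # Hypothetical guard; the value name is made up for illustration.
    {{- if not .Values.jobs.skipHelmTests }}
    apiVersion: v1
    kind: Pod
    metadata:
      name: "{{ .Release.Name }}-st2tests"
      annotations:
        "helm.sh/hook": test-success
    # ... the rest of the existing test manifest ...
    {{- end }}

If I remember right, helm template also accepts a --no-hooks flag that skips hook manifests entirely, which may already cover the CI case without any chart change.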

> Please let me know if I've yet again misunderstood how this is meant to work...

😄 You've got it now. The idea of canary jobs is very intriguing though...
