
Conversation

@jk464 (Contributor) commented Feb 13, 2024

Instead of running st2tests as a pod, this runs it as a job.

To ensure the job only runs once, and fails if a test fails, I've added:

      restartPolicy: Never
  backoffLimit: 0

This means the job will never back off or retry.
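For reference, here is a minimal sketch of where those two fields sit in a Job manifest (the name, image, and command below are placeholders, not the chart's actual template): backoffLimit: 0 lives on the Job spec and stops Kubernetes from creating replacement pods after a failure, while restartPolicy: Never lives on the pod template spec and stops the kubelet from restarting the failed container in place.

    # Minimal sketch; placeholder names, not the chart's template.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: example-st2-tests
    spec:
      backoffLimit: 0             # Job level: no replacement pods on failure
      template:
        spec:
          restartPolicy: Never    # pod level: no in-place container restarts
          containers:
            - name: st2-tests
              image: example/st2tests:latest              # placeholder image
              command: ["/bin/sh", "-c", "st2tests.sh"]   # placeholder command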

@pull-request-size bot added the size/L label (PR that changes 100-499 lines; requires some effort to review) Feb 13, 2024

@cognifloyd (Member) left a comment

It's hard to see what you've added in the GitHub interface, so I'm highlighting the new sections in my review to help myself see them.

It looks like you want to use the helm tests as a canary script that you run after install or upgrade. Is that right? Are you already using something like this? How well does it work for you?

Comment on lines +23 to +26
imagePullPolicy: {{ $.Values.image.pullPolicy }}
{{- with .Values.securityContext }}
securityContext: {{- toYaml . | nindent 12 }}
{{- end }}

This is new.

Comment on lines +37 to +40
imagePullPolicy: {{ $.Values.image.pullPolicy }}
{{- with .Values.securityContext }}
securityContext: {{- toYaml . | nindent 12 }}
{{- end }}

This is new.

Comment on lines +59 to +61
{{- with .Values.securityContext }}
securityContext: {{- toYaml . | nindent 12 }}
{{- end }}

This is new. (imagePullPolicy was already present)
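As an aside for anyone reading along: this with / toYaml / nindent pattern copies the securityContext map from values into the rendered manifest verbatim, re-indented to the stated column, and emits nothing at all when the value is unset. A rough illustration, assuming a hypothetical values snippet (these are not the chart's defaults):

    # Hypothetical values (not chart defaults):
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true

    # Roughly what the template then renders inside the container spec
    # (children re-indented by nindent 12):
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true

When .Values.securityContext is empty, the whole with block is skipped, so no empty securityContext: key ends up in the manifest.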

apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-job-st2-tests"

Name changed from {{ .Release.Name }}-st2tests

Comment on lines +14 to +16
annotations:
  "helm.sh/hook": test-success
  "helm.sh/hook-delete-policy": hook-succeeded

I don't think the pod needs these helm annotations, just the top-level Job. Is that right?
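For what it's worth, Helm only reads helm.sh/* hook annotations from the metadata of the top-level rendered resource; anything under the Job's pod template metadata is passed straight to Kubernetes and ignored by Helm's hook machinery. So a layout like this sketch should be sufficient (resource name is illustrative):

    # Sketch: hook annotations on the Job only.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: example-job-st2-tests
      annotations:
        "helm.sh/hook": test-success
        "helm.sh/hook-delete-policy": hook-succeeded
    spec:
      template:
        metadata: {}          # no helm.sh/* annotations needed here
        spec:
          restartPolicy: Never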

        configMap:
          name: {{ .Release.Name }}-job-st2-tests
      restartPolicy: Never
  backoffLimit: 0

The backoffLimit is new.

  kind: ConfigMap
  metadata:
-   name: {{ .Release.Name }}-st2tests
+   name: {{ .Release.Name }}-job-st2-tests

This might need the helm annotations if you are planning to run this on a real st2 cluster instead of only running it in CI.

I'm not sure st2tests.sh is designed to run on a real instance of st2; there might be some unintended side effects.

"helm.sh/hook-delete-policy": hook-succeeded
spec:
initContainers:
{{- include "stackstorm-ha.init-containers-wait-for-auth" . | nindent 6 }}

This is a new templated init container you've added to ensure the tests wait until st2 is up and running.

- 'sh'
- '-c'
- >
until curl -skSL --fail -w '\n' -X POST -u {{ .Values.st2.username }}:{{ .Values.st2.password }} "https://{{ required ".Values.ingress.fqdn is required if .Values.ingress.class is non-empty" .Values.ingress.fqdn | printf (ternary "canary-%s" "%s" .Values.phaseCanary)}}/auth/tokens"; do

I don't see these vars in our values file:

  • ingress.fqdn
  • ingress.class
  • phaseCanary

stackstorm-k8s/values.yaml

Lines 298 to 321 in 17e5fca

##
## StackStorm HA Ingress
##
ingress:
  # As recommended, ingress is disabled by default.
  enabled: false
  # Annotations are used to configure the ingress controller
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  # Map hosts to paths
  hosts: []
    # - host: hostname.domain.tld
    #   # Map paths to services
    #   paths:
    #     - path: /
    #       serviceName: service
    #       servicePort: port
  # Secure the Ingress by specifying a secret that contains a TLS private key and certificate
  tls: []
    # - secretName: chart-example-tls
    #   hosts:
    #     - chart-example.test
  # ingressClassName: nginx-ingress

I also do not use the ingress, so these would never be defined for me.

The user/pass is also only available in my cluster(s) during initial bootstrap. Once I switch to LDAP, I exclusively use API tokens within helm.

Do you have any ideas on how to make this more generic?
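Not an answer, just one possible direction for making it more generic (a sketch, not the chart's existing wait-for-auth helper): probe the in-cluster st2auth service directly instead of going through the ingress, and pull the credentials from a secret via env vars. The service name, port, scheme, and secret below are assumptions about this deployment, not values the chart is known to expose.

    # Sketch only; service name/port, scheme, and secret name are assumptions.
    initContainers:
      - name: wait-for-st2auth
        image: curlimages/curl:8.7.1           # any image with curl would do
        env:
          - name: ST2_USERNAME
            valueFrom:
              secretKeyRef:
                name: example-st2-auth-secret  # placeholder secret
                key: username
          - name: ST2_PASSWORD
            valueFrom:
              secretKeyRef:
                name: example-st2-auth-secret
                key: password
        command:
          - 'sh'
          - '-c'
          - >
            until curl -skSL --fail -X POST -u "${ST2_USERNAME}:${ST2_PASSWORD}"
            "http://{{ .Release.Name }}-st2auth:9100/tokens";
            do echo waiting for st2auth...; sleep 3; done

If user/pass is not available (for example after switching to LDAP), the same idea could probe the st2api service with an St2-Api-Key header instead of requesting a token.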

@cognifloyd added this to the v1.2.0 milestone Apr 13, 2024
@jk464 (Contributor, Author) commented May 8, 2024

I think we can just drop this PR. I misunderstood how these tests work: they're for use with the helm test command.

Internally we use helm template and then kubectl apply to deploy helm charts to our k8s cluster. This seemingly had the side effect of deploying the tests pod as part of the deploy, which AFAIK is a mistake on our part and a misuse of this pod.

Please let me know if I've yet again misunderstood how this is meant to work...

@jk464 closed this May 8, 2024
@cognifloyd (Member) commented

> I think we can just drop this PR. I misunderstood how these tests work: they're for use with the helm test command.

Yeah. helm test is the primary use case for this. Having some kind of canary job might be interesting if you have an idea of what that job should do.

So far, I've added one st2canary job to help catch common issues around packs volumes:

{{- if $.Values.st2.packs.volumes.enabled }}
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ $.Release.Name }}-job-ensure-packs-volumes-are-writable
  labels: {{- include "stackstorm-ha.labels" (list $ "st2canary") | nindent 4 }}
  annotations:
    helm.sh/hook: pre-install, pre-upgrade, pre-rollback
    helm.sh/hook-weight: "-5" # fairly high priority
    helm.sh/hook-delete-policy: hook-succeeded
    {{- if $.Values.jobs.annotations }}
    {{- toYaml $.Values.jobs.annotations | nindent 4 }}
    {{- end }}
spec:
  template:
    metadata:
      name: job-st2canary-for-writable-packs-volumes
      labels: {{- include "stackstorm-ha.labels" (list $ "st2canary") | nindent 8 }}
      annotations:
        {{- if $.Values.jobs.annotations }}
        {{- toYaml $.Values.jobs.annotations | nindent 8 }}
        {{- end }}
    spec:
      imagePullSecrets:
        {{- if $.Values.image.pullSecret }}
        - name: {{ $.Values.image.pullSecret }}
        {{- end }}
      initContainers: []
      containers:
      - name: st2canary-for-writable-packs-volumes
        image: '{{ template "stackstorm-ha.imageRepository" $ }}/st2actionrunner:{{ tpl $.Values.image.tag $ }}'
        imagePullPolicy: {{ $.Values.image.pullPolicy }}
        {{- with $.Values.securityContext }}
        securityContext: {{- toYaml . | nindent 10 }}
        {{- end }}
        # TODO: maybe use kubectl to assert the volumes have RWX mode
        #   If volume is a persistentVolumeClaim, then:
        #     the PVC must only have ReadWriteMany in spec.accessModes
        #   If volume is something else, then validating through metadata is iffy.
        #     azureFile, cephfs, csi, glusterfs, nfs, pvc, quobyte, need at least:
        #       readOnly: false
        #     ephemeral volumes could also work, ... but that config is even deeper.
        command:
          - 'bash'
          # -e => exit on failure
          # -E => trap ERR is inherited in subfunctions
          - '-eEc'
          - |
            cat << 'INTRO'
            Testing write permissions for packs volumes.
            If this passes, the pod will automatically be deleted.
            If this fails, inspect the pod for errors in kubernetes,
            and then delete this st2canary pod manually.
            INTRO
            function __handle_error__ {
              cat <<- ' FAIL'
            ERROR: One or more volumes in st2.packs.volumes (from helm values) does not meet
            StackStorm's shared volumes requirements!
            see: https://github.com/StackStorm/stackstorm-k8s#method-2-shared-volumes
            HINT: The volumes defined in st2.packs.volumes must use ReadWriteMany (RWX) access mode
            so StackStorm can dynamically install packs from any of the st2actionrunner pods
            and have those file changes available in all of the other StackStorm pods.
            see: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
            FAIL
            }
            trap __handle_error__ ERR
            for volume in packs virtualenvs {{ if $.Values.st2.packs.volumes.configs }}configs{{ end }}; do
              echo Testing write permissions on ${volume} volume...
              touch /opt/stackstorm/${volume}/.write-test
              rm /opt/stackstorm/${volume}/.write-test
              echo
            done
            echo DONE
        volumeMounts:
        {{- include "stackstorm-ha.packs-volume-mounts" $ | nindent 8 }}
        {{/* do not include the pack-configs-volume-mount helper here */}}
        - name: st2-pack-configs-vol
          mountPath: /opt/stackstorm/configs/
          readOnly: false
        # TODO: Find out default resource limits for this specific job (#5)
        #resources:
      volumes:
        {{- include "stackstorm-ha.packs-volumes" $ | nindent 8 }}
        {{- if $.Values.st2.packs.volumes.configs }}
        {{/* do not include the pack-configs-volume helper here */}}
        - name: st2-pack-configs-vol
          {{- toYaml $.Values.st2.packs.volumes.configs | nindent 10 }}
        {{- end }}
        # st2canary job does not support extra_volumes. Let us know if you need this.
      restartPolicy: Never
      {{- if $.Values.dnsPolicy }}
      dnsPolicy: {{ $.Values.dnsPolicy }}
      {{- end }}
      {{- with $.Values.dnsConfig }}
      dnsConfig: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with $.Values.podSecurityContext }}
      securityContext: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with $.Values.jobs.nodeSelector }}
      nodeSelector: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with $.Values.jobs.affinity }}
      affinity: {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with $.Values.jobs.tolerations }}
      tolerations: {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}

Any ideas on potential st2canary jobs?

> Internally we use helm template and then kubectl apply to deploy helm charts to our k8s cluster. This seemingly had the side effect of deploying the tests pod as part of the deploy, which AFAIK is a mistake on our part and a misuse of this pod.

Ah yes. That's a common gotcha (at least where I work). I've been encouraging my coworkers to use helm install and helm upgrade because doing helm template bypasses a lot of the safety checks helm does during install/upgrade. To me, helm seems a bit more graceful than kubectl apply.

That said, I wonder if there is a way to exclude the helm test hooks when running helm template? Or maybe we can skip rendering the tests stuff in helm template with a new value (or maybe helm exposes some metadata we can use here)? Please submit a PR if there's something the chart can do to avoid the helm template gotcha.
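On the new-value idea, one possible shape for that guard (just a sketch; jobs.skipHelmTests is a made-up key, not something the chart defines) would be to wrap the test manifests so helm template users can opt out while helm test keeps working by default:

    # Hypothetical guard; the value name is made up for illustration.
    {{- if not .Values.jobs.skipHelmTests }}
    apiVersion: v1
    kind: Pod
    metadata:
      name: "{{ .Release.Name }}-st2tests"
      annotations:
        "helm.sh/hook": test-success
    # ... the rest of the existing test manifest ...
    {{- end }}

If I remember right, helm template also accepts a --no-hooks flag that skips hook manifests entirely, which may already cover the CI case without any chart change.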

> Please let me know if I've yet again misunderstood how this is meant to work...

😄 You've got it now. The idea of canary jobs is very intriguing though...
