
Commit 70ee03f

nabadger and nickmintel authored
INFRA:45230: Cronjob fixes for alerting and additional support for backoffLimit (#377)
* INFRA-45230: Support cronjob/job backoffLimit and reduce cronjob/job ttlSecondsAfterFinished setting so alert fires
* Fixes for backoffLimit: 0
* Bump chart version

---------

Co-authored-by: Nick <nbadger@mintel.com>
1 parent 048b9ec commit 70ee03f


10 files changed (+722, -16 lines)


charts/standard-application-stack/CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+
+## [v11.1.0] - 2026-01-26
+### Changed
+- Update the default value of `ttlSecondsAfterFinished` to 5 minutes. This *must* be greater then the related `KubeJobFailed` alert check.
+- Added support for custom `backoffLimit` configuration in CronJobs and Jobs. The Kubernetes default (6) is used if not specified. Can be configured via `cronjobs.defaults.backoffLimit`, `cronjobs.jobs[].backoffLimit`, `jobDefaults.backoffLimit`, or per-job in `jobs[].backoffLimit`.
+
 ## [v11.0.2] - 2026-01-06
 ### Changed
 - Updated helm and helm-docs versions so the unittest plugin would work again.
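Both the CHANGELOG entry and the updated README rows require `ttlSecondsAfterFinished` to exceed the `for` duration of the `KubeJobFailed` alert, presumably so that a failed Job still exists (and is still reported as failed) for the whole pending period of the alert. Note that the new default of 600 seconds is 10 minutes, not the 5 minutes the entry states. The sketch below checks that relationship. The alert's real `for` value is not part of this commit, so `"5m"` is a hypothetical placeholder; the helper names are illustrative, and the parser only handles the `s`, `m`, `h` and `d` units of Prometheus duration syntax.

```python
import re

# Seconds per unit for the subset of Prometheus duration units handled here.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_seconds(duration: str) -> int:
    """Convert a duration such as "5m" or "1h30m" to seconds."""
    parts = re.findall(r"(\d+)([smhd])", duration)
    # Reject anything not made up entirely of <number><unit> pairs (e.g. "500ms").
    if not parts or "".join(number + unit for number, unit in parts) != duration:
        raise ValueError(f"unsupported duration: {duration!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def ttl_outlives_alert(ttl_seconds: int, alert_for: str) -> bool:
    """True if a finished Job is kept for longer than the alert's `for` window."""
    return ttl_seconds > duration_seconds(alert_for)


ALERT_FOR = "5m"  # hypothetical KubeJobFailed `for` value, not taken from this commit

print(600 / 60)                            # 10.0 (minutes for the new default)
print(ttl_outlives_alert(60, ALERT_FOR))   # False: the old default of 60 s
print(ttl_outlives_alert(600, ALERT_FOR))  # True: the new default of 600 s
```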

charts/standard-application-stack/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 11.0.2
+version: 11.1.0
 
 dependencies:
 - name: redis
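Chart.yaml and the README badge move from 11.0.2 to 11.1.0. Under Semantic Versioning 2.0.0, adding backward-compatible functionality (here, the new `backoffLimit` values) increments MINOR and resets PATCH to 0. A minimal sketch of that rule; `bump_minor` is a hypothetical helper that only accepts plain `MAJOR.MINOR.PATCH` strings without pre-release or build metadata.

```python
def bump_minor(version: str) -> str:
    """Return the next SemVer minor release: MAJOR.(MINOR + 1).0."""
    # Unpacking into three names rejects anything that is not exactly MAJOR.MINOR.PATCH.
    major, minor, _patch = map(int, version.split("."))
    return f"{major}.{minor + 1}.0"


print(bump_minor("11.0.2"))  # 11.1.0
```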

charts/standard-application-stack/README.md

Lines changed: 7 additions & 5 deletions
@@ -1,6 +1,6 @@
 # standard-application-stack
 
-![Version: 11.0.2](https://img.shields.io/badge/Version-11.0.2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square)
+![Version: 11.1.0](https://img.shields.io/badge/Version-11.1.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square)
 
 A generic chart to support most common application requirements
 
@@ -65,14 +65,15 @@ A generic chart to support most common application requirements
 | celeryBeat.resources.requests | object | `{}` | The requested resources for the container |
 | command | list | `["/app/docker-entrypoint.sh"]` | Optional command to the container |
 | configMaps | list | `[]` | A list of configuration maps for this application |
-| cronjobs | object | `{"defaults":{"concurrencyPolicy":"Forbid","enableDoNotDisrupt":true,"restartPolicy":"Never","suspend":false,"timezone":null,"ttlSecondsAfterFinished":60},"jobs":[]}` | Define and Configure CronJob's Defaults to same image as main deployment but with defined arguments |
-| cronjobs.defaults | object | `{"concurrencyPolicy":"Forbid","enableDoNotDisrupt":true,"restartPolicy":"Never","suspend":false,"timezone":null,"ttlSecondsAfterFinished":60}` | Defaults for all CronJob's |
+| cronjobs | object | `{"defaults":{"backoffLimit":null,"concurrencyPolicy":"Forbid","enableDoNotDisrupt":true,"restartPolicy":"Never","suspend":false,"timezone":null,"ttlSecondsAfterFinished":600},"jobs":[]}` | Define and Configure CronJob's Defaults to same image as main deployment but with defined arguments |
+| cronjobs.defaults | object | `{"backoffLimit":null,"concurrencyPolicy":"Forbid","enableDoNotDisrupt":true,"restartPolicy":"Never","suspend":false,"timezone":null,"ttlSecondsAfterFinished":600}` | Defaults for all CronJob's |
+| cronjobs.defaults.backoffLimit | string | `nil` | Specifies the number of retries before marking a job as failed. ref: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/job-v1/#JobSpec Only set if you want to override the Kubernetes default (6). If not set, Kubernetes default applies. |
 | cronjobs.defaults.concurrencyPolicy | string | `"Forbid"` | Tells controller how to handle concurrent executions of a CronJob ref: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/cron-job-v1/#CronJobSpec |
 | cronjobs.defaults.enableDoNotDisrupt | bool | `true` | Whether to set the `karpenter.sh/do-not-disrupt`annotation on the CronJob |
 | cronjobs.defaults.restartPolicy | string | `"Never"` | Configure CronJob pod restart Policy ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy |
 | cronjobs.defaults.suspend | bool | `false` | Tells controller to suspend future executions ref: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/cron-job-v1/#CronJobSpec |
 | cronjobs.defaults.timezone | string | `nil` | CronJob schedule will run relative to this timezone. ref: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones |
-| cronjobs.defaults.ttlSecondsAfterFinished | int | `60` | If this field is set, ttlSecondsAfterFinished after the Job finishes, it is eligible to be automatically deleted. ref: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#cronjob-v1beta1-batch |
+| cronjobs.defaults.ttlSecondsAfterFinished | int | `600` | If this field is set, ttlSecondsAfterFinished after the Job finishes, it is eligible to be automatically deleted. ref: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#cronjob-v1beta1-batch This needs to be greater than the associated KubeJobFailed 'for' value |
 | cronjobs.jobs | list | `[]` | List of Cronjob configurations to be defined |
 | cronjobsOnly | bool | `false` | Only show Cronjobs and relevant resources (i.e. if set to `true`, hide the main deployment resource) |
 | dynamodb.enabled | bool | `false` | |
@@ -166,6 +167,7 @@ A generic chart to support most common application requirements
 | jobDefaults.argo.hookDeletePolicy | string | `nil` | When to delete the job resources in an automated fashion ref: https://argo-cd.readthedocs.io/en/stable/user-guide/resource_hooks/#hook-deletion-policies. |
 | jobDefaults.argo.syncWave | string | `nil` | Sync Wave in which ArgoCD should apply the manifest. ref: https://argo-cd.readthedocs.io/en/stable/user-guide/sync-waves/. |
 | jobDefaults.args | string | `nil` | The command arguments for the main Job container. |
+| jobDefaults.backoffLimit | string | `nil` | Specifies the number of retries before marking a job as failed. ref: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/job-v1/#JobSpec Only set if you want to override the Kubernetes default (6). If not set, Kubernetes default applies. |
 | jobDefaults.command | string | `nil` | The command the main Job container will run. |
 | jobDefaults.enableDoNotDisrupt | bool | `true` | Whether to set the `karpenter.sh/do-not-disrupt`annotation on the Job |
 | jobDefaults.env | list | `[]` | Any env entries you want to add. See includeBaseEnv to add all from main container. ref: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ |
@@ -180,7 +182,7 @@ A generic chart to support most common application requirements
 | jobDefaults.podSecurityContext | object | `{}` | Add podSecurityContext config to the Job. |
 | jobDefaults.resources | object | `{}` | REQUIRED FOR ALL JOBS. Resource requests/limits. |
 | jobDefaults.restartPolicy | string | `"Never"` | Whether the pod should be restarted on failure ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy) |
-| jobDefaults.ttlSecondsAfterFinished | int | `60` | If this field is set, ttlSecondsAfterFinished after the Job finishes, it is eligible to be automatically deleted. ref: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#cronjob-v1beta1-batch |
+| jobDefaults.ttlSecondsAfterFinished | int | `600` | If this field is set, ttlSecondsAfterFinished after the Job finishes, it is eligible to be automatically deleted. ref: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#cronjob-v1beta1-batch This needs to be greater than the associated KubeJobFailed 'for' value |
 | jobs | list | `[]` | Define and configure jobs Add a map for each job in this list. Refer to `$.Values.jobDefaults` for a list of supported values (and the defaults that will be applied to all jobs below). |
 | jobsOnly | bool | `false` | Only show Jobs and relevant resources (i.e. if set to `true`, hide the main deployment resource) |
 | kibana.elasticsearchHosts | string | `""` | |
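The new `backoffLimit` rows set how many retries happen before a Job is marked failed, which also bounds how soon `KubeJobFailed` can start evaluating. The Kubernetes Job documentation describes an exponential back-off between retries (10s, 20s, 40s, ..., capped at six minutes). The sketch below estimates a lower bound on the time from the first pod failure to the Job being marked failed, assuming `restartPolicy: Never`, pods that fail immediately, one back-off delay before each retry, and `backoffLimit + 1` pods in total. Real controller timing may differ, and the function names are illustrative only.

```python
def backoff_delays(backoff_limit: int, base: int = 10, cap: int = 360) -> list[int]:
    """Assumed delay in seconds before each retry: doubling from `base`, capped at `cap`."""
    return [min(base * 2**retry, cap) for retry in range(backoff_limit)]


def min_seconds_to_failure(backoff_limit: int) -> int:
    """Lower bound on time from the first pod failure until the Job is marked failed.

    Assumes every pod fails immediately, so only the back-off delays count.
    """
    return sum(backoff_delays(backoff_limit))


for limit in (0, 1, 6):
    print(f"backoffLimit={limit}: up to {limit + 1} pods, at least {min_seconds_to_failure(limit)}s")
# backoffLimit=6 gives 10 + 20 + 40 + 80 + 160 + 320 = 630 s of back-off.
```

Under these assumptions, keeping the Kubernetes default of 6 delays the failure by at least 630 seconds (10.5 minutes), whereas `backoffLimit: 0` marks the Job failed on the first pod failure.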

charts/standard-application-stack/templates/cronjobs.yaml

Lines changed: 7 additions & 0 deletions
@@ -26,6 +26,13 @@ spec:
 {{- if (gt $ttl 0) }}
 ttlSecondsAfterFinished: {{ $ttl}}
 {{- end }}
+{{- $backoffLimit := .backoffLimit }}
+{{- if eq $backoffLimit nil }}
+{{- $backoffLimit = $.Values.cronjobs.defaults.backoffLimit }}
+{{- end }}
+{{- if ne $backoffLimit nil }}
+backoffLimit: {{ $backoffLimit | int }}
+{{- end }}
 template:
 metadata:
 labels: {{ include "mintel_common.labels" $data | nindent 12 }}
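The new cronjobs.yaml block resolves `backoffLimit` in three steps: the per-CronJob value, then `cronjobs.defaults.backoffLimit`, and if both are nil the field is left out so the Kubernetes default (6) applies. Because the checks compare against `nil`, an explicit `backoffLimit: 0` is kept. The Python below is a sketch that mirrors that logic, with plain dicts standing in for Helm values; `resolve_backoff_limit` and `effective_backoff_limit` are hypothetical names, not part of the chart.

```python
from typing import Any, Optional

# Kubernetes applies this when spec.backoffLimit is omitted from a Job.
KUBERNETES_DEFAULT_BACKOFF_LIMIT = 6


def resolve_backoff_limit(job: dict[str, Any], defaults: dict[str, Any]) -> Optional[int]:
    """Mirror the template: per-job value, else the defaults, else None (field omitted)."""
    value = job.get("backoffLimit")
    if value is None:  # {{- if eq $backoffLimit nil }}
        value = defaults.get("backoffLimit")
    if value is None:  # {{- if ne $backoffLimit nil }} is false, so nothing is rendered
        return None
    return int(value)  # {{ $backoffLimit | int }}


def effective_backoff_limit(job: dict[str, Any], defaults: dict[str, Any]) -> int:
    """The limit the Job ends up with once Kubernetes fills in its own default."""
    resolved = resolve_backoff_limit(job, defaults)
    # Compare with None explicitly so that an explicit 0 is kept.
    return KUBERNETES_DEFAULT_BACKOFF_LIMIT if resolved is None else resolved


print(resolve_backoff_limit({"backoffLimit": 0}, {"backoffLimit": 3}))  # 0
print(resolve_backoff_limit({}, {"backoffLimit": 3}))                   # 3
print(resolve_backoff_limit({}, {"backoffLimit": None}))                # None
print(effective_backoff_limit({}, {"backoffLimit": None}))              # 6
```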

charts/standard-application-stack/templates/jobs.yaml

Lines changed: 8 additions & 1 deletion
@@ -17,10 +17,17 @@ metadata:
 {{- end }}
 namespace: {{ $.Release.Namespace }}
 spec:
-{{- $ttl := (.ttlSecondsAfterFinished | default 60) | int }}
+{{- $ttl := (.ttlSecondsAfterFinished | default 600) | int }}
 {{- if (gt $ttl 0) }}
 ttlSecondsAfterFinished: {{ $ttl}}
 {{- end }}
+{{- $backoffLimit := .backoffLimit }}
+{{- if eq $backoffLimit nil }}
+{{- $backoffLimit = $.Values.jobDefaults.backoffLimit }}
+{{- end }}
+{{- if ne $backoffLimit nil }}
+backoffLimit: {{ $backoffLimit | int }}
+{{- end }}
 template:
 {{- if (.enableDoNotDisrupt) }}
 metadata:
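jobs.yaml applies the same `backoffLimit` fallback using `jobDefaults.backoffLimit`, but its TTL line still uses Sprig's `default`, which substitutes the fallback whenever the input is empty, and Sprig treats the number 0 as empty. As a result, `ttlSecondsAfterFinished: 0` on a Job renders as 600, and the `gt $ttl 0` guard only drops the field for negative values. This is presumably why the `backoffLimit` block compares against `nil` instead (the commit message lists "Fixes for backoffLimit: 0"). The sketch below models the TTL lines in Python; `sprig_empty` approximates Sprig's `empty` for scalar and collection values only, and all names are hypothetical.

```python
from typing import Any, Optional


def sprig_empty(value: Any) -> bool:
    """Approximate Sprig's `empty`: nil, false, 0, "" and empty lists/maps count as empty."""
    if value is None:
        return True
    if isinstance(value, (bool, int, float, str, list, dict)):
        return not value
    return False


def sprig_default(fallback: Any, value: Any) -> Any:
    """Mirror `value | default fallback` from the template."""
    return fallback if sprig_empty(value) else value


def render_job_ttl(job: dict[str, Any]) -> Optional[int]:
    """Mirror the jobs.yaml TTL lines; None means the field is not rendered."""
    ttl = int(sprig_default(600, job.get("ttlSecondsAfterFinished")))
    return ttl if ttl > 0 else None


print(render_job_ttl({}))                               # 600
print(render_job_ttl({"ttlSecondsAfterFinished": 0}))   # 600, because 0 counts as empty
print(render_job_ttl({"ttlSecondsAfterFinished": 30}))  # 30
print(render_job_ttl({"ttlSecondsAfterFinished": -1}))  # None
```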

0 commit comments
