fix(helm) Working helm chart with many enhancements#1184
fix(helm) Working helm chart with many enhancements#1184ikogan wants to merge 12 commits intoCodeWithCJ:mainfrom
Conversation
Add a non-root frontend derivative image, update Helm frontend defaults, and wire the image publish workflows to build the canonical frontend image first and the non-root variant second while preserving Docker Hub and GHCR publishing behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
deployment.yaml: - Add hardcoded wait-for-db init container using pg_isready to prevent the server from CrashLoopBackOffing while the database is still initializing. Uses .Values.postgresql.image.repository/.tag directly to avoid the sparkyfitness.image helper double-prepending global.imageRegistry on already-fully-qualified image paths. Uses sparkyfitness.databaseHost/Port helpers for bundled/external DB compat. Runs as UID 999 (postgres user) with full restricted PSA compliance. - Add extraVolumes, extraVolumeMounts, and initContainers support to the server Deployment template. These allow users to customize the container to for example, inject custom CAs. values.yaml: - Add server.extraVolumes, server.extraVolumeMounts, and server.initContainers stubs to document and enable the now-rendered features above. - Rename server.appDatabase.username from "sparkyapp" to "sparky" to match the role name the application actually creates on first startup. - Rename postgresql.auth.username from "sparky" to "sparky_admin" to separate the DB owner role (used for migrations/schema management) from the limited-privilege runtime role, matching the upstream external-database documentation. - Add postgresql.config block with sensible query logging defaults. - Add postgresql.initdb.scripts block with 02-sparky-admin-setup.sh: reassigns database ownership to the DB owner after the postgres entrypoint creates it as superuser, grants CREATEROLE so the app can create the sparky runtime role on first startup, and installs required PostgreSQL extensions (uuid-ossp, pgcrypto, pg_stat_statements). dbMigrations.ts: - Fix SQL injection risk in CREATE ROLE DDL: escape single quotes in the app user password using standard PostgreSQL string literal doubling before interpolation. DDL statements do not support parameterized placeholders so this is the correct safe approach.
Replace hardcoded codewithcj/ prefix with ${{ secrets.DOCKER_USERNAME || 'codewithcj' }}/
in all four docker/metadata-action image references (frontend, frontend-nonroot,
server, garmin). Falls back to codewithcj when the secret is unset so upstream
workflows are unaffected; forks can override by setting DOCKER_USERNAME.
… script logging and reliability - database-backup cronjob: use postgres superuser (postgres/postgres-password) instead of app DB owner (sparky_admin/user-password); pg_dumpall requires superuser to dump all databases and roles - database-backup cronjob: run script under /bin/bash (shebang was bypassed by /bin/sh, causing pipefail and process substitution syntax errors) - database-backup script: switch to bash + set -euo pipefail so pipeline failures (e.g. pg_dumpall error) actually fail the job - database-backup script: add timestamped logging throughout (start, archive path, completion with size, slot links, retention cleanup counts) - database-backup script: add --verbose to pg_dumpall filtered to per-database progress lines and errors/warnings only (suppresses ACL detail) - helm-release workflow: run helm dependency update before helm package so the postgresql subchart tarball is present in the packaged chart
There was a problem hiding this comment.
Code Review
This pull request introduces a dedicated non-root frontend image and integrates the helmforge/postgresql subchart for bundled database support. It also adds a PVC-backed database backup CronJob with configurable retention and updates the Helm chart to route /api and /uploads traffic directly to the server. Technical feedback identifies a potential Helm rendering error in the wait-for-db init container, missing environment variables in the PostgreSQL initialization script, and filesystem write failures in the frontend entrypoint when running with a read-only root filesystem. Additionally, making the security context of the init container configurable was recommended to ensure compatibility across different environments.
| {{- end }} | ||
| initContainers: | ||
| - name: wait-for-db | ||
| image: {{ .Values.postgresql.image.repository }}:{{ .Values.postgresql.image.tag }} |
There was a problem hiding this comment.
The wait-for-db init container references .Values.postgresql.image, but this key is missing from the parent values.yaml and will be undefined if the postgresql subchart is disabled (e.g., when using an external database). This will cause a Helm rendering error. Consider using the databaseBackup.image values which are defined in the parent chart and use the existing image helper for consistency.
image: {{ include "sparkyfitness.image" (dict "image" .Values.databaseBackup.image "global" .Values.global "appVersion" .Chart.AppVersion) }}| -c "ALTER DATABASE \"${APP_DATABASE}\" OWNER TO \"${APP_USERNAME}\"" | ||
| # Grant CREATEROLE so the app can create the sparky role on first startup. | ||
| psql --username "${POSTGRES_USER}" --dbname postgres \ | ||
| -c "ALTER USER \"${APP_USERNAME}\" CREATEROLE" | ||
| # Install required extensions and grant function execution (must run as superuser). | ||
| psql --username "${POSTGRES_USER}" --dbname "${APP_DATABASE}" \ | ||
| -c 'CREATE EXTENSION IF NOT EXISTS "uuid-ossp"' \ | ||
| -c 'CREATE EXTENSION IF NOT EXISTS "pgcrypto"' \ | ||
| -c 'CREATE EXTENSION IF NOT EXISTS "pg_stat_statements"' \ | ||
| -c "GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO \"${APP_USERNAME}\" WITH GRANT OPTION" |
There was a problem hiding this comment.
The initdb script references environment variables APP_DATABASE and APP_USERNAME which are not standard in the official PostgreSQL image and are not explicitly set in the container environment by this chart. This will cause the script to fail at runtime. Since this is a literal string in values.yaml, you should either hardcode the values to match your defaults or ensure these variables are passed to the PostgreSQL container via extraEnv.
There was a problem hiding this comment.
The upstream chart does indeed use APP_USERNAME and APP_DATABASE: https://github.com/helmforgedev/charts/blob/main/charts/postgresql/templates/statefulset-primary.yaml#L51-L55.
| mkdir -p /var/run/nginx \ | ||
| /var/cache/nginx/client-body \ | ||
| /var/cache/nginx/proxy \ | ||
| /var/cache/nginx/fastcgi \ | ||
| /var/cache/nginx/uwsgi \ | ||
| /var/cache/nginx/scgi \ | ||
| /etc/nginx/conf.d | ||
|
|
||
| cp /etc/nginx/templates/default.conf.template /etc/nginx/conf.d/default.conf |
There was a problem hiding this comment.
This script attempts to mkdir and cp files at runtime. If the container is run with readOnlyRootFilesystem: true (as set in values.yaml), these operations will fail unless the target paths (like /etc/nginx/conf.d and /var/cache/nginx) are backed by emptyDir volumes. The current Helm templates do not appear to include these volume mounts for the frontend component, which will prevent the non-root image from starting in restricted environments.
There was a problem hiding this comment.
The fronted container does indeed use emptyDir volumes for these:
volumeMounts:
- name: tmp
mountPath: /tmp
- name: nginx-cache
mountPath: /var/cache/nginx
- name: nginx-run
mountPath: /var/run
- name: nginx-conf
mountPath: /etc/nginx/conf.d
volumes:
- name: tmp
emptyDir: {}
- name: nginx-cache
emptyDir: {}
- name: nginx-run
emptyDir: {}
- name: nginx-conf
emptyDir: {}Moreover, it is run with readOnlyRootFilesystem: true by default.
| runAsUser: 999 | ||
| runAsGroup: 999 |
There was a problem hiding this comment.
The wait-for-db init container has hardcoded runAsUser and runAsGroup values. This may conflict with pod-level security policies or fail if a different image is used. It is recommended to make these values configurable or align them with the postgresql component's security context to ensure compatibility across different Kubernetes distributions.
… use configurable security context
- wrap wait-for-db in {{- if .Values.postgresql.enabled }} so Helm does not
error when postgresql.image is undefined (external DB mode)
- derive runAsUser/runAsGroup from .Values.postgresql.podSecurityContext
rather than hardcoding 999, for compatibility across different images
and cluster security policies
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a dedicated non-root frontend image and restores the bundled PostgreSQL dependency using helmforge/postgresql (version 18.3). It implements split routing in the Helm chart where Ingress and HTTPRoute handle /api and /uploads traffic directly to the server, and adds a new PVC-backed database backup system with retention policies. Feedback includes using the sparkyfitness.image helper for the wait-for-db init container to support private registries and noting the dependency on GNU date syntax in the backup script to prevent failures if the image is swapped to an Alpine-based variant.
| initContainers: | ||
| {{- if .Values.postgresql.enabled }} | ||
| - name: wait-for-db | ||
| image: {{ .Values.postgresql.image.repository }}:{{ .Values.postgresql.image.tag }} |
There was a problem hiding this comment.
The wait-for-db init container image is constructed manually, which bypasses the global.imageRegistry setting. It is better to use the sparkyfitness.image helper to ensure consistency across the chart, especially for air-gapped or enterprise environments using a private registry.
image: {{ include "sparkyfitness.image" (dict "image" .Values.postgresql.image "global" .Values.global "appVersion" .Chart.AppVersion) | quote }}There was a problem hiding this comment.
Added documentation to explain the rationale: we want this to match the postgres image used by HelmForge, not the image the helper generates.
| date -u -d "-${i} day" +%F | ||
| ;; | ||
| weekly) | ||
| date -u -d "-${i} week" +%G-W%V | ||
| ;; | ||
| monthly) | ||
| date -u -d "$(date -u +%Y-%m-01) -${i} month" +%Y-%m |
There was a problem hiding this comment.
The retention logic relies on GNU date extensions (e.g., -d "-1 day"). While the default postgres:18.3-trixie image is Debian-based and includes GNU coreutils, this script will fail if a user overrides the image with an Alpine-based variant (which uses BusyBox date). Consider adding a note in the documentation or values.yaml that the backup image must support GNU date syntax.
…geRegistry scope - replace GNU date relative-string arithmetic in build_keep_file with epoch-based arithmetic (date -u +%s + shell math) for daily/weekly and awk integer arithmetic for monthly; works on both GNU coreutils and Alpine/BusyBox without any dependency on 'date -d "-N day"' - document in global.imageRegistry and in the wait-for-db init container comment that global.imageRegistry does not apply to the HelmForge postgresql subchart or wait-for-db; both use postgresql.image.repository directly to stay consistent with HelmForge's own image reference style
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a dedicated non-root frontend image running on port 8080 and restores the bundled PostgreSQL dependency using version 18.3. It implements split routing in the Helm chart, directing API and upload traffic directly to the server while the frontend serves static assets. A new PVC-backed database backup CronJob with daily, weekly, and monthly retention logic has also been added. Feedback includes removing redundant proxy settings in the Nginx configuration, addressing portability issues in the backup script's date calculations, and ensuring consistency in image registry handling. Additionally, the backup CronJob requires a writable /tmp volume to support a read-only root filesystem.
| proxy_connect_timeout 60s; | ||
| proxy_send_timeout 300s; | ||
| proxy_read_timeout 300s; |
| while [ "${i}" -lt "${count}" ]; do | ||
| case "${bucket}" in | ||
| daily) | ||
| date -u -d "@$(( now_epoch - i * 86400 ))" +%F |
There was a problem hiding this comment.
The script uses date -d, which is a GNU-specific extension. While the default image (postgres:18.3-trixie) supports it, the script will fail if a user overrides the image to an Alpine-based one (which uses BusyBox date). Consider documenting this dependency or using a more portable approach for date calculations.
| {{- toYaml .Values.databaseBackup.podSecurityContext | nindent 12 }} | ||
| containers: | ||
| - name: pg-dumpall | ||
| image: {{ include "sparkyfitness.image" (dict "image" .Values.databaseBackup.image "global" .Values.global "appVersion" .Chart.AppVersion) | quote }} |
There was a problem hiding this comment.
The image reference here uses the sparkyfitness.image helper, which prepends global.imageRegistry. However, the wait-for-db init container and the bundled PostgreSQL subchart avoid this helper to prevent double-prefixing when the registry is already included in the repository field (as instructed in values.yaml). For consistency and to support private registries correctly, this should match the logic used in the server's wait-for-db init container.
image: "{{ .Values.databaseBackup.image.repository }}:{{ .Values.databaseBackup.image.tag | default .Chart.AppVersion }}"| volumeMounts: | ||
| - name: scripts | ||
| mountPath: /scripts | ||
| - name: backup-storage | ||
| mountPath: {{ .Values.databaseBackup.persistence.mountPath }} | ||
| volumes: | ||
| - name: scripts | ||
| configMap: | ||
| name: {{ include "sparkyfitness.fullname" . }}-database-backup-scripts | ||
| defaultMode: 0755 | ||
| - name: backup-storage | ||
| persistentVolumeClaim: | ||
| claimName: {{ include "sparkyfitness.databaseBackupPvcName" . }} |
There was a problem hiding this comment.
To support readOnlyRootFilesystem: true, an emptyDir volume should be added for /tmp and mounted into the container. The backup script relies on mktemp, which requires a writable temporary directory.
volumeMounts:
- name: scripts
mountPath: /scripts
- name: backup-storage
mountPath: {{ .Values.databaseBackup.persistence.mountPath }}
- name: tmp
mountPath: /tmp
volumes:
- name: scripts
configMap:
name: {{ include "sparkyfitness.fullname" . }}-database-backup-scripts
defaultMode: 0755
- name: backup-storage
persistentVolumeClaim:
claimName: {{ include "sparkyfitness.databaseBackupPvcName" . }}
- name: tmp
emptyDir: {}| runAsUser: 999 | ||
| runAsGroup: 999 | ||
| allowPrivilegeEscalation: false | ||
| readOnlyRootFilesystem: false |
There was a problem hiding this comment.
- Remove redundant proxy_connect/send/read_timeout from nginx.nonroot.conf.template (no proxy_pass in this config, so these had no effect) - Fail with clear message if databaseBackup.enabled=true and postgresql.enabled=false (PVC backups require the bundled subchart; external DB is unsupported) - Replace GNU-only date -d '@epoch' with awk strftime in backup build_keep_file for Alpine/BusyBox compatibility; add export TZ=UTC for portable UTC output - Restore podSecurityContext/containerSecurityContext under databaseBackup in values with readOnlyRootFilesystem: true as default; add /tmp emptyDir volume+mount - Conditional backup image: defaults to postgresql.image.repository:tag (no extra pull, always in sync); when image.repository is set, inherits postgresql.image.tag if image.tag is empty, and applies global.imageRegistry via sparkyfitness.image
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request transitions the frontend to a non-root image, restores the bundled PostgreSQL dependency using the helmforge/postgresql chart, and introduces a PVC-backed database backup system. It also moves API and upload routing from the frontend proxy to the Ingress and HTTPRoute resources. The review identifies a critical issue in the backup script where the use of the GNU-specific awk strftime function will cause failures on Debian-based images; using the date command for timestamp formatting is suggested as a fix.
| while [ "${i}" -lt "${count}" ]; do | ||
| case "${bucket}" in | ||
| daily) | ||
| awk -v e="$(( now_epoch - i * 86400 ))" 'BEGIN { print strftime("%Y-%m-%d", e) }' |
There was a problem hiding this comment.
The strftime function in awk is a GNU extension and is not available in the default mawk interpreter used on Debian-based images (like the postgres:18.3-trixie image used here). This will cause the backup script to fail when calculating retention dates. Since the image is Debian-based, you can use the date command with the @ prefix to format epoch timestamps directly.
date -u -d "@$(( now_epoch - i * 86400 ))" +%Y-%m-%d| awk -v e="$(( now_epoch - i * 86400 ))" 'BEGIN { print strftime("%Y-%m-%d", e) }' | ||
| ;; | ||
| weekly) | ||
| awk -v e="$(( now_epoch - i * 7 * 86400 ))" 'BEGIN { print strftime("%G-W%V", e) }' |
…ates
awk strftime is not available in BusyBox awk (Alpine). Replace the
daily and weekly retention date calculations with portable date(1)
invocations that work across both Debian (GNU coreutils) and Alpine
(BusyBox) postgres images.
- Daily: awk strftime → date -u -d "@$(( now_epoch - i*86400 ))" +%Y-%m-%d
- Weekly: awk strftime → date -u -d "@$(( now_epoch - i*7*86400 ))" +%G-W%V
- Monthly: unchanged — uses pure awk arithmetic, no strftime needed
- Removed: export TZ=UTC workaround (no longer needed)
Tested against both official postgres base images:
postgres:18.3-trixie (Debian)
daily: 2026-04-21 PASS
weekly: 2026-W16 PASS
monthly: 2026-02 PASS
postgres:18.3-alpine
daily: 2026-04-21 PASS
weekly: 2026-W16 PASS
monthly: 2026-02 PASS
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a non-root frontend image and migrates the bundled database dependency to PostgreSQL 18.3 using the helmforge subchart. It also implements a new PVC-backed database backup system with daily, weekly, and monthly retention policies, and shifts API and upload routing from the frontend proxy to the Ingress and Gateway API levels. Feedback focuses on improving the robustness of the backup CronJob templates and ensuring the portability of the backup scripts across different base images.
| image: "{{ .Values.postgresql.image.repository }}:{{ .Values.postgresql.image.tag }}" | ||
| {{- end }} |
There was a problem hiding this comment.
The template relies on .Values.postgresql.image.repository and .Values.postgresql.image.tag being defined. However, these are not explicitly defaulted in the parent chart's values.yaml. If the subchart does not provide these exact keys or if they are missing, the Helm template will fail to render. It is safer to provide defaults or use the default function.
image: "{{ .Values.postgresql.image.repository | default "postgres" }}:{{ .Values.postgresql.image.tag | default "latest" }}"There was a problem hiding this comment.
We don't really know what the upstream chart will want, so we're going to just assume that chart doesn't break.
| --host="${DB_HOST}" \ | ||
| --port="${DB_PORT}" \ | ||
| --username="${DB_USER}" \ | ||
| 2> >(grep -E "^pg_dumpall: (dumping database|running pg_dump)|[Ee]rror|[Ww]arning" >&2) \ |
There was a problem hiding this comment.
The use of process substitution >(...) is a Bash-specific feature. While the shebang correctly specifies #!/bin/bash, ensure that the container image used for the backup job (which defaults to the postgresql image) actually contains bash. Official PostgreSQL images based on Debian do include it, but Alpine-based ones might not.
| date -u -d "@$(( now_epoch - i * 86400 ))" +%Y-%m-%d | ||
| ;; | ||
| weekly) | ||
| date -u -d "@$(( now_epoch - i * 7 * 86400 ))" +%G-W%V |
There was a problem hiding this comment.
The date -d flag with the @epoch syntax is a GNU extension. If the backup container image is changed to one using BusyBox (like Alpine), this script will fail as BusyBox date does not support this syntax. Consider using a more portable way to calculate dates if Alpine support is desired in the future.
There was a problem hiding this comment.
This has been tested on both alpine and debian. Since the alpine image does include bash, we're using that and not busybox.
|
Now that #1178 is merged, @CodeWithCJ, what should we do with this one? I can rebase and remove the PostgreSQL we're managing in 1178, I can rebase and remove the helm chart dependency, using the templates generated by 1178 and adding the backups and other bits, or something else. |
I'm not that experienced with helm. If you believe that using a subchart is better than a template we could just switch to your Implementation |
|
I'll check further on this. I also don't know much on helm chart. Best would be to tag the original creator for this feature. |
|
@haferbeck, I believe you submitted the helm chart first, would you have a sec to weigh in on this? The templates that @Sim-sat submitted are pretty solid but, very understandably, are missing some of the advanced features of the helmforge chart that would be hard to maintain on our own. We're also not preconfiguring liveness and readiness probes or database initialization stuff. |
Yeah charts are better. I only knew about the bitnami one that is not available anymore. I thought that most people would use cloud native pg for a real installation so the database would just be for a quick demo. What about making the current Dockerfile configurable for non root instead of using a new one? Default can be root, so it doesn't brake anything? |
I think that's fine, I'd just need to update the entrypoint to be conditional. The new image exposes port 8080 and switches to user 101:101 but that can just be done at runtime if necessary. So, I'll go ahead and:
And yes, I think many will use cloudnative-pg but there's a good amount of people who prefer the database just be "included", like me for the moment. |
Description
What problem does this PR solve?
The bundled Helm chart does not currently work. It claims to include a PostgreSQL database when in fact, it does not.
This PR fixes the helm chart by depending on the HelmForge PostgreSQL chart and adding additional functionality
to enable actual usage. See below for "new features".
How did you implement the solution?
Several changes needed to be made to make this work:
Add a new subchart, depending on HelmForge PostgreSQL chart. Databases can be complex and managing them with our own templates can introduce future issues and stability challenges. HelmForge supports both standalone and replicated instances, s3 backup, and handles updating PostgreSQL as necessary.
Since HelmForge only supports backup to s3, I added an optional
CronJobto backup to a local PVC so users can have a relatively simple solution to fully backup their database. I'm not sure if the included backup from SparkyFitness would cover everything a complete db backup would. If so, we can remove this and rely on those, but apg_dumpis the most reliable way to ensure the database is backed up.In order to prevent the server from crashing repeatedly on startup while the embedded database starts, I added an
initContainerto wait for the db to become available first so user's don't think something is broke and so that alarms aren't triggered.My clusters do not allow containers to start as root, complying with the the
restrictedPod Security Standard. This means the frontend image cannot be used directly. This PR adds a new nonroot image. I was worried that replacing the existing one might break other users, especially Docker Compose folks so adding a new image seemed best. This image is based on the normal frontend image, only making minor modifications to enable starting as a non-root user.When running in Kubernetes, it is not necessary for the frontend to do any proxying to the server and can prevent some usage patterns. This changes the
Ingressso that the proxying to the server happens at the ingress controller, rather than the frontend container.While working on this PR, CoPilot detected a potential SQL injection that could occur when the password being set in the environment contains quotes; this was patched.
The documentation for SparkyFitness expects the database to be initialized in a few ways before the app starts up to do schema migrations and this includes an initdb script that will do that.
My cluster user a custom root CA for it's certificates; this PR adds support for custom init containers in the server to allow injecting custom root certificates, for example:
Linked Issue: #1080
Unfortunately it looks like #1178 is attempting to solve this too and on the same day so I guess we'll have to see who wins.
How to Test
This was tested on an RKE2 cluster with fairly restricted PodSecurityAdmission standards and Ceph (Rook) backed storage. It should also work in kind or virtually any cluster that supports PVCs.
PR Type
Checklist
All PRs:
Backend changes (
SparkyFitnessServer/):rls_policies.sqlfor any new user-specific tables.