Trouble running migrations and stable Postgres with Helm chart using external and internal Postgres options #37
Open
Description
I am trying to deploy Hatchet using the official Helm charts (hatchet-stack) on Kubernetes with both external Postgres and the built-in internal Postgres options, but I am encountering persistent issues with migrations and database connectivity.
My setup and observations
- I'm running Kubernetes on EKS in the `platform` namespace.
- I tried configuring external Postgres on AWS RDS with the correct `DATABASE_URL`, but I see connection refused errors and DNS resolution (hostname lookup) failures.
- Migration and API pods fail to start with the error `ERROR: type "v1_readable_status_olap" does not exist (SQLSTATE 42704)`. This means the custom database types were never created because the schema migrations did not complete.
- When switching to internal Postgres (`postgresql.enabled: true`), the Postgres pod is repeatedly OOMKilled and restarted because the default memory limits are too low.
- I attempted to override memory and CPU resources via `values.yaml`, but the changes do not seem to apply; the pod resources do not update and still show the minimal defaults (such as a 192Mi memory limit).
- I learned that the Bitnami Postgres Helm chart expects resource overrides under the `postgresql.primary.resources` key, but changes there still do not apply.
- The Postgres pod's readiness and liveness probes fail frequently while the pod is restarting or stuck in recovery.
- Running migrations manually with the admin CLI container `hatchet-admin` fails with connection refused errors (it tries to connect to localhost, 127.0.0.1:5431).
- The migration job environment lacks a `DATABASE_URL` pointing to the internal Postgres service.
- Manual attempts to run migration pods with the correct `DATABASE_URL` fail while the Postgres pod is unstable.
- All attempts to deploy with internal Postgres lead to database pod instability and migration failure.
- External Postgres connectivity is affected by DNS/network issues inside the cluster when routing to RDS.
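A minimal sketch of the resource override in question, under stated assumptions. Helm only passes values to a subchart when they are nested under the dependency's name or alias as declared in the parent chart's `Chart.yaml`; the top-level key `postgres` below is a guess based on the service name `hatchet-stack-postgres` and has not been confirmed against the hatchet-stack chart. The 192Mi limit appears to match the `nano` value of `resourcesPreset` in recent Bitnami charts, which is why the sketch also sets the preset to `none`.

```yaml
# Sketch only: the top-level key is an assumption. It must match the name or
# alias that the parent chart gives the Bitnami PostgreSQL dependency.
postgres:
  enabled: true
  primary:
    # Recent Bitnami charts apply a small preset when no explicit resources
    # are given; explicit resources below are meant to replace it.
    resourcesPreset: "none"
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "4"
        memory: 4Gi
```

Running `helm get values <release> -n platform --all` after the upgrade shows whether the override reached the release at all, which separates a wrong key from a chart that ignores the setting.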
What I tried and diagnostics
- Verified the service name for internal Postgres: `hatchet-stack-postgres.platform.svc.cluster.local`
- Checked the Postgres pod logs, which show startup and recovery, but the pod is eventually OOMKilled due to insufficient memory.
- Added resource requests and limits of up to 4Gi memory and 4 CPUs in `values.yaml`.
- Ran a Helm upgrade with the updated config and deleted the Postgres pod to force a restart, but the pod resource limits reset to the default minimal values.
- Tried placing resource overrides under multiple keys (`postgres.resources`, `postgresql.resources`, `postgresql.primary.resources`) with no effect.
- Confirmed that the pod resource requests shown by `kubectl describe pod` are always the defaults.
- Migration and seed jobs fail to create the required DB objects, causing the API and engine pods to crash on startup with missing DB types.
- Confirmed that `sharedConfig` in the Helm values matches the official minimal self-hosted config.
- Tried manual migration commands with the CLI container, but they fail because the Postgres pod is unavailable.
- Encountered DNS resolution errors when targeting external Postgres from inside pods.
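To separate DNS failures from Postgres failures, a throwaway pod in the same namespace can run both checks. The manifest below is a sketch, not part of the chart: the pod name is hypothetical, port 5432 is assumed to be the service port, and `postgres:16` is used only because it ships `getent` and `pg_isready`. For the RDS case, `PGHOST` would be replaced with the RDS endpoint.

```yaml
# Sketch only: one-off connectivity check, not part of the hatchet-stack chart.
apiVersion: v1
kind: Pod
metadata:
  name: pg-connectivity-check   # hypothetical name
  namespace: platform
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: postgres:16        # any image with getent and pg_isready works
      env:
        - name: PGHOST
          value: hatchet-stack-postgres.platform.svc.cluster.local
        - name: PGPORT
          value: "5432"         # assumed service port
      command: ["sh", "-c"]
      args:
        - |
          # Step 1: does the hostname resolve from inside the cluster?
          getent hosts "$PGHOST" || echo "DNS lookup failed for $PGHOST"
          # Step 2: is a Postgres server accepting connections there?
          pg_isready -h "$PGHOST" -p "$PGPORT"
```

After `kubectl apply -f` on this manifest, `kubectl logs pg-connectivity-check -n platform` shows which step failed.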
Request
I suspect either a bug or missing documentation in the Helm chart that causes:
- Resource limit overrides for the Postgres subchart to be ignored or improperly applied.
- Migration jobs not running correctly due to missing `DATABASE_URL` injection or incorrect ordering relative to Postgres pod readiness.
- Difficulty using external Postgres with correct DNS and networking in EKS environments.
Please help me by:
- Clarifying how to properly override resource limits for Postgres to avoid OOMKilled pod restarts.
- Confirming the correct environment variable keys and values for the migration jobs and API to connect to internal/external Postgres.
- Providing an example working `values.yaml` for both external and internal Postgres setups, including migration job config.
- Suggesting debugging steps for migration job failures and connection issues.
- Highlighting any known limitations or DNS fixes when using RDS with this Helm chart.
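For concreteness, the external setup being asked about might look roughly like the sketch below. Both `sharedConfig.env` and the top-level `postgres` key are assumptions about the hatchet-stack chart rather than confirmed keys, and everything in angle brackets is a placeholder. The URL itself follows the standard PostgreSQL connection URI form; `sslmode=require` is included because RDS instances are often configured to enforce TLS.

```yaml
# Sketch only: sharedConfig.env and the "postgres" key are assumptions about
# the hatchet-stack chart; verify both against the chart's own values.yaml.
sharedConfig:
  env:
    # Standard PostgreSQL connection URI. Values in <...> are placeholders.
    # Special characters in the password must be percent-encoded.
    DATABASE_URL: "postgresql://<user>:<password>@<rds-endpoint>:5432/<database>?sslmode=require"

# Disable the bundled Postgres when pointing at RDS.
postgres:
  enabled: false
```

For the internal setup, the host portion would instead be the in-cluster service name, `hatchet-stack-postgres.platform.svc.cluster.local`.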