Migrations fail and Postgres is unstable with the Helm chart, using either the external or the internal Postgres option



I am trying to deploy Hatchet using the official Helm charts (`hatchet-stack`) on Kubernetes with both external Postgres and the built-in internal Postgres options, but I am encountering persistent issues with migrations and database connectivity.

## My setup and observations

- I’m running Kubernetes on EKS in the `platform` namespace.
- I first configured external Postgres on AWS RDS with what I believe is the correct `DATABASE_URL`, but pods report connection refused or DNS resolution (hostname lookup) errors.
- The migration and API pods fail at startup with the error `ERROR: type "v1_readable_status_olap" does not exist (SQLSTATE 42704)`. I take this to mean the custom database types were never created because the schema migrations did not complete.
- When I switch to internal Postgres (`postgresql.enabled: true`), the Postgres pod is repeatedly OOMKilled and restarted, apparently because the default memory limit is too low.
- I attempted to override memory and CPU resources via `values.yaml`, but the changes do not seem to apply; the pod spec does not update and still shows the minimal defaults (for example, a 192Mi memory limit).
- I learned that the Bitnami Postgres chart expects resource overrides under the `postgresql.primary.resources` key, but setting them there still has no effect.
- The Postgres pod's readiness and liveness probes fail frequently while the pod is restarting or stuck in recovery.
- Running migrations manually with the admin CLI container `hatchet-admin` fails with connection refused, because it tries to connect to `127.0.0.1:5431`.
- The migration job environment lacks the proper `DATABASE_URL` pointing to the internal Postgres service.
- Manual attempts to run migration pods with the correct `DATABASE_URL` fail while the Postgres pod is unstable.
- All attempts to deploy with internal Postgres lead to database pod instability and migration failure.
- External Postgres connectivity is affected by DNS/network issues inside the cluster routing to RDS.
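
For reference, this is roughly the `DATABASE_URL` I have been building for the manual migration attempts against the internal service. It is a minimal sketch: the user, password, database name, port 5432 and `sslmode=disable` are placeholder assumptions on my side, not values I have confirmed as the chart's defaults. Only the host is the service name I verified in the cluster.

```shell
# Placeholder credentials (assumptions); replace with the real ones.
DB_USER="hatchet"
DB_PASSWORD="changeme"
DB_NAME="hatchet"

# In-cluster service name of the internal Postgres in the `platform` namespace.
DB_HOST="hatchet-stack-postgres.platform.svc.cluster.local"
# Standard Postgres port; assumed for the in-cluster service.
DB_PORT="5432"

# Compose a libpq-style connection URL and export it for child processes.
DATABASE_URL="postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}?sslmode=disable"
export DATABASE_URL
echo "$DATABASE_URL"
```

A password containing characters such as `@` or `/` would need to be URL-encoded before being placed in the URL.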

## What I tried and diagnostics

- Verified the service name for internal Postgres: `hatchet-stack-postgres.platform.svc.cluster.local`
- Checked the Postgres pod logs: they show startup and recovery, after which the pod is OOMKilled.
- Added resource requests and limits up to 4Gi memory and 4 CPUs in `values.yaml`.
- Ran Helm upgrade with the updated config and deleted the Postgres pod to force restart, but pod resource limits reset to default minimal values.
- Tried placing resource overrides under multiple keys: `postgres.resources`, `postgresql.resources`, `postgresql.primary.resources` with no effect.
- Confirmed with `kubectl describe pod` that the pod's resource requests and limits always show the defaults.
- Migration and seed jobs fail to create the required DB objects, causing the API and engine pods to crash on startup with missing DB types.
- Confirmed that `sharedConfig` in my Helm values matches the official minimal self-hosted config.
- Tried manual migration commands with the CLI container, but they fail because the Postgres pod is unavailable.
- Encountered DNS resolution errors when targeting external Postgres from inside pods.
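
To make the resource check above reproducible, here is a small sketch of the two helpers I use to compare what Helm recorded for the release against what the pod actually received. The function names are my own, and the label selector in the usage line is an assumption (the real labels can be listed with `kubectl get pods --show-labels`).

```shell
# Print name and container resources for every pod matching a label selector.
# $1 = namespace, $2 = label selector
show_pod_resources() {
  kubectl get pods -n "$1" -l "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources}{"\n"}{end}'
}

# Print the user-supplied values Helm stored for a release.
# $1 = namespace, $2 = release name
show_release_values() {
  helm get values "$2" -n "$1"
}
```

Usage, with a hypothetical selector: `show_release_values platform hatchet-stack` followed by `show_pod_resources platform app.kubernetes.io/name=postgresql`. If the override appears in the first output but not in the second, the key is probably not the one the subchart reads.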

## Request

I suspect either a bug or missing documentation in the Helm chart that causes:

- Resource limit overrides for the Postgres subchart to be ignored or improperly applied.
- Migration jobs not running correctly, either because `DATABASE_URL` is not injected properly or because they are not ordered after Postgres pod readiness.
- Difficulty using external Postgres with correct DNS and networking in EKS environments.

Please help me by:

- Clarifying how to properly override resource limits for Postgres to avoid OOMKilled pod restarts.
- Confirming correct environment variable keys and values for migration jobs and API to connect to internal/external Postgres.
- Providing example working `values.yaml` for both external and internal Postgres setups including migration job config.
- Suggesting debugging steps for migration job failures and connection issues.
- Highlighting any known limitations or DNS fixes when using RDS with this Helm chart.
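
In case it helps with the DNS question, this is a sketch of the check I can run from inside the cluster and post output from. It starts a throwaway pod from the public `postgres:16` image, resolves the database hostname with `getent`, and then probes the port with `pg_isready`. The function name and the pod name are my own choices, and the default port 5432 is an assumption.

```shell
# Resolve a database host and probe it from a temporary pod in the cluster.
# $1 = namespace, $2 = database hostname, $3 = port (optional, defaults to 5432)
check_db_from_cluster() {
  kubectl run "pg-check-$$" -n "$1" --rm -i --restart=Never \
    --image=postgres:16 --command -- \
    sh -c "getent hosts '$2' && pg_isready -h '$2' -p '${3:-5432}'"
}
```

Usage: `check_db_from_cluster platform hatchet-stack-postgres.platform.svc.cluster.local` for the internal service, or the same call with the RDS endpoint as the second argument. A failure at the `getent` step would point to DNS, while a failure at `pg_isready` would point to routing, security groups, or the database itself.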


