Trouble running migrations and stable Postgres with Helm chart using external and internal Postgres options #37

@iamsrirams

I am trying to deploy Hatchet using the official Helm charts (hatchet-stack) on Kubernetes with both external Postgres and the built-in internal Postgres options, but I am encountering persistent issues with migrations and database connectivity.

My setup and observations

  • I’m running Kubernetes on EKS in the platform namespace.
  • I configured external Postgres on AWS RDS with what I believe is the correct DATABASE_URL, but connections fail with either connection refused or DNS resolution (hostname lookup) errors.
  • Migration and API pods fail on startup with the error ERROR: type "v1_readable_status_olap" does not exist (SQLSTATE 42704). This suggests the custom database types were never created because the schema migrations did not complete.
  • When switching to internal Postgres (postgresql.enabled: true), the Postgres pod is repeatedly OOMKilled and restarted, apparently because the default memory limit is too low.
  • I attempted to override memory and CPU resources via values.yaml, but the changes do not seem to apply; the pod spec does not update and still shows the minimal default limits (for example, a 192Mi memory limit).
  • I learned that the Bitnami Postgres Helm chart expects resource overrides under the postgresql.primary.resources key, but setting them there still has no effect.
  • The Postgres pod's readiness and liveness probes fail frequently while the pod is restarting or stuck in a recovery state.
  • Running migrations manually with the admin CLI container hatchet-admin fails with connection refused errors, because it tries to connect to localhost (127.0.0.1:5431).
  • The migration job environment lacks the proper DATABASE_URL pointing to the internal Postgres service.
  • Manual attempts to run migration pods with the correct DATABASE_URL fail while the Postgres pod is unstable.
  • All attempts to deploy with internal Postgres lead to database pod instability and migration failure.
  • External Postgres connectivity is affected by DNS or network routing issues between the cluster and RDS.
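
To separate the two failure modes above (hostname lookup failure versus connection refused), something like the following could be run from a pod in the same namespace. It is only a minimal sketch using the Python standard library; the function names are my own and not part of Hatchet or the Helm chart, and the default port 5432 is an assumption for URLs that omit a port.

```python
import socket
from urllib.parse import urlsplit


def parse_db_endpoint(database_url, default_port=5432):
    """Return (host, port) taken from a postgres-style DATABASE_URL."""
    parts = urlsplit(database_url)
    if not parts.hostname:
        raise ValueError("DATABASE_URL has no host component")
    # Fall back to the assumed default when the URL does not name a port.
    return parts.hostname, parts.port or default_port


def classify_connectivity(host, port, timeout=3.0):
    """Return 'dns_failure', 'refused', 'timeout', 'unreachable', or 'ok'."""
    try:
        # Step 1: name resolution only, no connection attempt yet.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Step 2: plain TCP connect; this does not speak the Postgres protocol.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError:
        return "unreachable"


if __name__ == "__main__":
    import os

    host, port = parse_db_endpoint(os.environ["DATABASE_URL"])
    print(host, port, classify_connectivity(host, port))
```

A "refused" result for 127.0.0.1:5431 would just mean nothing is listening on that address inside the container, which fits the admin CLI falling back to a local default rather than using the Postgres service. A "timeout" rather than a refusal would usually point at packets being dropped somewhere on the path, for example by a firewall or security group rule.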

What I tried and diagnostics

  • Verified service names for internal Postgres: hatchet-stack-postgres.platform.svc.cluster.local
  • Checked the Postgres pod logs, which show startup and recovery before the pod is eventually OOMKilled for insufficient memory.
  • Added resource requests and limits up to 4Gi memory and 4 CPUs in values.yaml.
  • Ran helm upgrade with the updated config and deleted the Postgres pod to force a restart, but the pod's resource limits reverted to the minimal defaults.
  • Tried placing resource overrides under multiple keys (postgres.resources, postgresql.resources, postgresql.primary.resources), all with no effect.
  • Confirmed with kubectl describe pod that the pod's resource requests always show the defaults.
  • Migration and seed jobs fail to create the required DB objects, causing the API and engine pods to crash on startup with missing DB types.
  • Confirmed sharedConfig in Helm values matches official minimal self-hosted config.
  • Tried manual migration commands with the CLI container, but they fail because the Postgres pod is unavailable.
  • Encountered DNS resolution errors when targeting external Postgres from inside pods.
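
One check that might narrow down the resource override problem: a recreated pod gets its resources from its controller's pod template, so deleting the pod changes nothing if helm upgrade never altered that template. The sketch below compares the limits in the JSON output of kubectl get ... -o json against what was set in values.yaml. It assumes the Postgres pod is managed by a workload object such as a StatefulSet; the helper names and the container name used in the example are hypothetical.

```python
import json
import sys


def container_resources(obj):
    """Map container name -> resources for a Pod or a workload object."""
    spec = obj.get("spec", {})
    # Workloads (StatefulSet, Deployment) nest the pod spec under
    # spec.template.spec; a Pod carries its containers directly under spec.
    pod_spec = spec.get("template", {}).get("spec", spec)
    return {c["name"]: c.get("resources", {}) for c in pod_spec.get("containers", [])}


def limit_mismatches(obj, expected_limits):
    """Return {container: {resource: (expected, actual)}} where limits differ.

    Quantities are compared as plain strings, so "4Gi" and "4096Mi" would be
    reported as a mismatch even though they are the same amount.
    """
    out = {}
    for name, res in container_resources(obj).items():
        limits = res.get("limits", {})
        diff = {
            key: (want, limits.get(key))
            for key, want in expected_limits.items()
            if limits.get(key) != want
        }
        if diff:
            out[name] = diff
    return out


if __name__ == "__main__":
    # Reads the JSON object from stdin; the expected limits here mirror the
    # 4Gi / 4 CPU values I set in values.yaml.
    print(limit_mismatches(json.load(sys.stdin), {"memory": "4Gi", "cpu": "4"}))
```

Piping the controller object into this script (and then the pod object) would show whether the override is missing from the rendered template itself or only from the running pod. If the template still shows 192Mi, the values key is most likely not reaching the subchart at all.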

Request

I suspect either a bug or missing documentation in the Helm chart that causes:

  • Resource limit overrides for the Postgres subchart to be ignored or improperly applied.
  • Migration jobs to fail, either because DATABASE_URL is not injected correctly or because the jobs are not ordered after Postgres pod readiness.
  • External Postgres to be difficult to set up with correct DNS and networking in EKS environments.

Please help me by:

  • Clarifying how to properly override resource limits for Postgres to avoid OOMKilled pod restarts.
  • Confirming correct environment variable keys and values for migration jobs and API to connect to internal/external Postgres.
  • Providing a working example values.yaml for both the external and internal Postgres setups, including the migration job config.
  • Suggesting debugging steps for migration job failures and connection issues.
  • Highlighting any known limitations or DNS fixes when using RDS with this Helm chart.
