Conversation

lforst commented Dec 23, 2025

Adds an option to delay shutdown when receiving SIGTERM. This is useful in certain cloud environments (AWS ECS, Kubernetes) where the load balancer needs time to deregister the service before the process terminates, reducing the risk of stray connections being rejected by PostgREST while load-balancing updates propagate.

Config: server-shutdown-wait-period (in seconds, default is 0)
Env: PGRST_SERVER_SHUTDOWN_WAIT_PERIOD
Only affects SIGTERM; SIGINT still terminates immediately.
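
For example, set it in the config file or via the environment (a sketch; the 30-second value is illustrative):

```
# postgrest.conf
server-shutdown-wait-period = 30   # seconds to wait after SIGTERM before shutting down

# or equivalently via environment variable:
#   PGRST_SERVER_SHUTDOWN_WAIT_PERIOD=30
```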

There seems to be some prior art around this:

steve-chavez (Member) commented Dec 23, 2025

@lforst Looks reasonable to add 👍. This article helped me to understand the problem.

To fix the CI failures, you can run `nix-shell --run postgrest-lint` and `nix-shell --run postgrest-style` from inside the PostgREST directory. For the commit-style failure, make sure all commits have a prefix (in this case, doing a squash would solve it).

@steve-chavez (Member)

We would also need to add the config to the docs here:

server-port
-----------

=============== =================================
**Type**        Int
**Default**     3000
**Reloadable**  N
**Environment** PGRST_SERVER_PORT
**In-Database** `n/a`
=============== =================================

The TCP port to bind the web server. Use ``0`` to automatically assign a port.

.. _server-trace-header:

server-trace-header
-------------------

=============== =================================
**Type**        String
**Default**     `n/a`
**Reloadable**  Y
**Environment** PGRST_SERVER_TRACE_HEADER
**In-Database** pgrst.server_trace_header
=============== =================================

The header name used to trace HTTP requests. See :ref:`trace_header`.
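
A docs entry for the new option could mirror the entries above. A sketch only; the **Reloadable** and **In-Database** values are my assumptions, and the final wording is up to the PR:

```rst
.. _server-shutdown-wait-period:

server-shutdown-wait-period
---------------------------

=============== ====================================
**Type**        Int
**Default**     0
**Reloadable**  N
**Environment** PGRST_SERVER_SHUTDOWN_WAIT_PERIOD
**In-Database** `n/a`
=============== ====================================

Number of seconds to wait after receiving SIGTERM before starting shutdown.
Only affects SIGTERM; SIGINT still terminates immediately.
```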

@wolfgangwalther (Member)

This article helped me to understand the problem.

My understanding of k8s services, ingresses and nginx-ingress is different and I believe the article is wrong about it. The nginx configuration does not contain the actual different endpoints, it uses the DNS name of the service, which is immediately switched over by kubernetes.

I don't buy the argument yet for why this is the right place to fix something that seems to need a fix at a different level.

Also note that the article itself says:

It's worth noting that we first hit this problem over 3 years ago now, so my understanding and information may be a little out of date. In particular, this section in the documentation implies that this should no longer be a problem! [...]

On a fundamental level, any tool for rolling deployment should be able to verify that the new pod is up and running, that the new pod is successfully routed to and only then start shutting down the old pod. Fixing this in the app is really the wrong place.

lforst (Author) commented Dec 27, 2025

On a fundamental level, any tool for rolling deployment should be able to verify that the new pod is up and running, that the new pod is successfully routed to and only then start shutting down the old pod. Fixing this in the app is really the wrong place.

@wolfgangwalther For what it's worth, I wholeheartedly agree with you. In our setup, we are using AWS ECS and ALB. We explored two potential solutions to the problem at hand: we either add an upstream option (which would be this PR), or we add a wacky script/Docker command override to our setup that completely traps the SIGTERM or at least delays it. AWS doesn't provide a better way of configuring the signal. Both solutions are far from optimal. It almost seems like AWS assumes that applications should continue functioning like normal after receiving a SIGTERM (which makes absolutely no sense to me whatsoever).

In a perfect world, I don't think this option would be necessary. In the face of reality, this option is likely very useful for anybody using PostgREST at large scale with AWS ECS. I am fine with any outcome regarding this PR.

@lforst force-pushed the lforst-sigterm-delay branch from a489a1d to 74d0d43 on December 29, 2025 at 02:38
@steve-chavez (Member)

In the face of reality, this option is likely very useful for anybody using PostgREST at large scale with AWS ECS.

@wolfgangwalther Do you think we should merge given that this is a common use case for AWS ECS?

Also I'm wondering if this would make more sense if we introduce a "Deployment" section to the docs (we've been asked about this before) and add AWS ECS there and showcase this feature.

@wolfgangwalther (Member)

There seems to be some prior art around this:

Looking at the three provided links, it's clear that this would only solve half the problem: even if we delay termination, we will still, I believe, immediately exit, cancelling any ongoing requests (not 100% sure whether that's actually true). We don't have a concept of a graceful shutdown. But to achieve zero-downtime rolling deployments, we also can't drop these requests.

I think the proper thing to do is:

  • Make PostgREST handle different signals differently, aka implement "hard shutdown" and "graceful shutdown".
  • Fix Kubernetes to allow configuration of a delay between "removal of endpoint from services" and "shutting down the pod". I'd be surprised if there wasn't already an upstream issue for it.
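
For context on the Kubernetes side of the second point: the usual workaround today is a ``preStop`` hook that sleeps before SIGTERM is delivered, giving endpoint removal time to propagate. A sketch of a Pod-spec fragment (names and values are illustrative, not from this thread):

```yaml
# Delay SIGTERM so the endpoint can be removed from the Service first.
spec:
  terminationGracePeriodSeconds: 40   # must exceed the preStop sleep
  containers:
    - name: postgrest
      image: postgrest/postgrest
      lifecycle:
        preStop:
          exec:
            # Kubernetes runs this command and waits for it to finish
            # before sending SIGTERM to the container.
            command: ["sh", "-c", "sleep 30"]
```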

@steve-chavez (Member)

I believe, immediately exit - and cancel any ongoing requests (not 100% sure whether that's actually true?)

Currently when we receive a SIGTERM we don't cancel ongoing requests, we only stop accepting new ones.

Instead of delaying SIGTERM for an unknown X time, perhaps we should have a new SIGTERM mode that keeps accepting new requests and quits once Y seconds have passed without any new requests being received? That sounds like it could be the default mode too.
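
That idle-based drain could look roughly like this. A Python sketch of the logic only; `IdleShutdown` and its method names are hypothetical, not PostgREST internals:

```python
import threading
import time


class IdleShutdown:
    """Sketch of the proposed SIGTERM mode: after SIGTERM arrives, keep
    accepting requests and exit only once no request has been seen for
    `idle_timeout` seconds."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.draining = False            # set when SIGTERM arrives
        self.last_request = time.monotonic()
        self._lock = threading.Lock()

    def on_sigterm(self, *_args):
        # Don't stop serving; just start the idle countdown.
        self.draining = True

    def on_request(self):
        # Every incoming request resets the idle countdown.
        with self._lock:
            self.last_request = time.monotonic()

    def should_exit(self, now=None):
        # Exit only when draining AND idle for at least idle_timeout seconds.
        if not self.draining:
            return False
        now = time.monotonic() if now is None else now
        with self._lock:
            return now - self.last_request >= self.idle_timeout
```

The open question with either approach is picking Y: too short and a brief lull in traffic kills in-flight routing updates; too long and deployments stall.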
