-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
add: add option to delay shutdown initiation after receiving a SIGTERM
#4580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@lforst Looks reasonable to add 👍. This article helped me to understand the problem. For correcting the CI failures, you can run |
|
We would also need to add the config to the docs here: postgrest/docs/references/configuration.rst Lines 847 to 873 in aa4199f
|
My understanding of k8s services, ingresses and nginx-ingress is different and I believe the article is wrong about it. The nginx configuration does not contain the actual different endpoints, it uses the DNS name of the service, which is immediately switched over by kubernetes. I don't buy the argument, yet, why this is the right way to fix something that seems like it needs a fix on a different level. Also note that the article itself says:
On a fundamental level, any tool for rolling deployment should be able to verify that the new pod is up and running, that the new pod is successfully routed to and only then start shutting down the old pod. Fixing this in the app is really the wrong place. |
@wolfgangwalther For what it's worth, I wholeheartedly agree with you. In our setup, we are using AWS ECS and ALB. We explored two potential solutions to the problem at hand: We either add an upstream option (which would be this PR), or we add a whacky script / docker command override to our setup that completely traps the In a perfect world, I don't think this option would be necessary. In the face of reality, this option is likely very useful for anybody using PostgREST at large scale with AWS ECS. I am fine with any outcome regarding this PR. |
a489a1d to
74d0d43
Compare
@wolfgangwalther Do you think we should merge given that this is a common use case for AWS ECS? Also I'm wondering if this would make more sense if we introduce a "Deployment" section to the docs (we've been asked about this before) and add AWS ECS there and showcase this feature. |
Looking at the 3 provided links, it's clear that this would only solve half the problem: Even if we delay termination, we will still, I believe, immediately exit - and cancel any ongoing requests (not 100% sure whether that's actually true?). We don't have a concept of a graceful shutdown. But to achieve zero-downtime rolling deployments, we also can't drop these requests. I think the proper thing to do is:
|
Currently when we receive a SIGTERM we don't cancel ongoing requests, we only stop accepting new ones. Instead of delaying SIGTERM for an unknown X time, perhaps we should have a new SIGTERM mode that keeps accepting new requests and quits once a Y time has passed where no new requests are received? That sounds like it could be the default mode too. |
Adds an option to delay shutdown when receiving SIGTERM. This is useful in certain cloud environments (AWS ECS, Kubernetes) where the load balancer needs time to deregister the service before the process terminates - reducing the risk of stray connections being rejected by
PostgRESTwhile load balancing updates propagate.Config:
server-shutdown-wait-period(in seconds, default is 0)Env:
PGRST_SERVER_SHUTDOWN_WAIT_PERIODOnly affects
SIGTERM.SIGINTwill still terminate immediatelyThere seems to be some prior art around this:
--wait-before-secondsoption: https://linkerd.io/2-edge/tasks/graceful-shutdown/