Add graceful shutdown support (SIGTERM handling, readiness drop, gRPC GracefulStop) to avoid 500s during upgrades

### Summary
We recently switched from Recreate to RollingUpdate (with a ConfigMap checksum annotation), and this successfully removed downtime for new connections. However, existing in‑flight requests still produce a small number of 500 errors during pod termination when running under Kubernetes with an Istio sidecar.

The reason is that the ratelimit binary does not implement graceful shutdown:

- It does not handle SIGTERM
- It does not fail its readiness check when shutting down
- The gRPC server is stopped immediately instead of via GracefulStop()
- In‑flight requests are not drained
- Redis and internal workers stop abruptly

Even with Istio’s terminationDrainDuration and a preStop hook, the process exits before in‑flight requests complete, which still causes 500s.

### Proposal
I would like to contribute upstream support for proper graceful shutdown:

- Catch SIGTERM
- Drop readiness immediately
- Use grpc.Server.GracefulStop()
- Allow in‑flight requests to finish (or time out after a deadline set via an env var)
- Close Redis and workers after draining

Are maintainers open to this contribution?

Thanks!


Add graceful shutdown support (SIGTERM handling, readiness drop, gRPC GracefulStop) to avoid 500s during upgrades #1068
