Add graceful shutdown support (SIGTERM handling, readiness drop, gRPC GracefulStop) to avoid 500s during upgrades #1068

@peresureda

### Summary
We recently switched from Recreate to RollingUpdate (with a ConfigMap checksum annotation), and this successfully removed downtime for new connections. However, existing in‑flight requests still produce a small number of 500 errors during pod termination when running under Kubernetes with an Istio sidecar.

The reason is that the ratelimit binary does not implement graceful shutdown:

  • It does not handle SIGTERM
  • It does not fail its readiness check when shutting down
  • The gRPC server is stopped immediately instead of via GracefulStop()
  • In‑flight requests are not drained
  • Redis connections and internal workers stop abruptly

Even with Istio’s terminationDrainDuration and a preStop hook, the process exits before in‑flight requests complete, which causes the 500s.

### Proposal
I would like to contribute upstream support for proper graceful shutdown:

  • Catch SIGTERM
  • Drop readiness immediately
  • Use grpc.Server.GracefulStop()
  • Allow in‑flight requests to finish (or time out after a deadline set via an env var)
  • Close Redis and workers after draining

Are maintainers open to this contribution?

Thanks!
