Redis chart: Redis preStop can block pod termination indefinitely if Sentinel terminates early (regression after #35364) #36422

@kapiaszczyk

Name and Version

bitnami/redis 23.2.12

What steps will reproduce the bug?

  1. Deploy the Bitnami Redis Helm chart with Sentinel enabled in a Kubernetes cluster in a master/replica mode (example: 3-node StatefulSet).
  2. Configure a long termination grace period (e.g. terminationGracePeriodSeconds: 900) so the pod has enough time to execute the preStop logic.
  3. Trigger pod termination, e.g. kubectl rollout restart statefulset <redis-sts-name>
  4. Observe that the Sentinel container terminates quickly, while the pod hosting the Redis master stays stuck in Terminating because its Redis container does not exit.

Are you using any custom parameters or values?

Yes, the following values are relevant:

replica:
  terminationGracePeriodSeconds: 900

What is the expected behavior?

When Kubernetes terminates the pod, Redis should terminate gracefully after executing its preStop hook.

If Sentinel terminates earlier than Redis (which can happen depending on container termination order), Redis should still be able to finish preStop and exit, rather than waiting indefinitely.

Redis should not block termination solely because the Sentinel process is no longer reachable.

What do you see instead?

The Redis master pod can remain stuck in Terminating until terminationGracePeriodSeconds expires.

The Redis preStop hook keeps retrying indefinitely because Sentinel is no longer running and get-master-addr-by-name mymaster returns no output.

The recently introduced check:

if [[ -z "$REDIS_MASTER_HOST" ]]; then
    echo "WARNING: REDIS_MASTER_HOST is empty, assuming failover not finished"
    return 1
fi

treats empty output as “failover not finished”, which causes the retry loop to never complete once Sentinel exits.
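One possible way to keep the loop from running forever is to bound the retries with a deadline shorter than terminationGracePeriodSeconds. The following is a minimal sketch, not the chart's actual code; the helper name `retry_with_deadline` and its arguments are hypothetical:

```shell
# Hypothetical helper (not part of the Bitnami chart): run a command repeatedly
# until it succeeds or a deadline passes, instead of retrying forever.
# Usage: retry_with_deadline <timeout_seconds> <interval_seconds> <command...>
retry_with_deadline() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline
    deadline=$(( $(date +%s) + timeout ))
    while ! "$@"; do
        # Stop retrying once the deadline has passed so the hook can exit.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "WARNING: giving up after ${timeout}s: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

For example, `retry_with_deadline 30 1 some_failover_check` would poll a (hypothetical) `some_failover_check` function for at most about 30 seconds and then let the hook finish either way.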

While blocked, Redis pauses writes:

CLIENT PAUSE 892000 WRITE

As a result:

  • Redis does not receive SIGTERM until forced termination occurs
  • Pod stays stuck in Terminating
  • Writes are paused for the duration of the grace period
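If the hook gives up waiting, the write pause could also be lifted explicitly rather than left to expire. A minimal sketch, assuming Redis 6.2 or later (where `CLIENT UNPAUSE` is available); the `REDIS_CLI` variable and the `with_write_pause` helper are hypothetical names, not part of the chart:

```shell
# Assumption: REDIS_CLI holds the command (plus any auth/TLS flags) for the local Redis.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Hypothetical helper: pause writes, run a command, then always resume writes.
# Usage: with_write_pause <pause_ms> <command...>
with_write_pause() {
    local pause_ms="$1" status=0
    shift
    $REDIS_CLI CLIENT PAUSE "$pause_ms" WRITE
    "$@" || status=$?
    # Lift the pause even if the wrapped command failed or timed out.
    $REDIS_CLI CLIENT UNPAUSE
    return "$status"
}
```

Whether unpausing is desirable on a master that is about to be stopped depends on the deployment; the sketch only shows that the pause does not have to last for the whole grace period.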

Additional information

This behavior appears to have been introduced by #35364 and affects all chart versions released since.

Before that change, the preStop hook could exit even if Sentinel stopped responding, allowing Redis to terminate.

After that change, Redis termination depends on Sentinel remaining alive throughout the preStop hook, which creates a race condition during pod shutdown (Sentinel may terminate before Redis preStop completes).
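The race could be narrowed by treating "Sentinel cannot be reached" differently from "Sentinel answered but has no master yet". A minimal sketch under stated assumptions: the function name `sentinel_master_state` is hypothetical, and the query command passed to it (for example a redis-cli call to `SENTINEL get-master-addr-by-name mymaster`) is assumed to exit non-zero when Sentinel is unreachable and to print the master host on its first output line:

```shell
# Hypothetical sketch: classify the result of a Sentinel master query.
# Usage: sentinel_master_state <query command...>
# Prints one of: unreachable | pending | master:<host>
sentinel_master_state() {
    local out
    if ! out=$("$@" 2>/dev/null); then
        echo "unreachable"   # Sentinel did not answer: do not keep waiting for it
    elif [ -z "$out" ]; then
        echo "pending"       # Sentinel answered but reported no master address
    else
        # The first output line is assumed to be the master host.
        echo "master:$(printf '%s\n' "$out" | head -n 1)"
    fi
}
```

With this split, a preStop script could keep retrying on `pending` but exit promptly on `unreachable`, which matches the expected behavior described above.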

Sentinel failover may also not complete gracefully during shutdown

In addition, due to the logic in the preStop script, Sentinel itself can be terminated while it is actively performing a failover (for example, after it has selected and promoted a new master but before it has finished the “reconfigure replicas” and “failover end” stages). If Sentinel receives SIGTERM during this process, the failover may remain incomplete, and the new master configuration may not be fully propagated to or acknowledged by the other Sentinels. This can result in an inconsistent cluster state: for example, the promoted node reverting to the replica role, or other Sentinels continuing to believe the old master is still active.
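A Sentinel-side hook could check whether a failover is still running before allowing the container to stop. A minimal sketch, assuming the `SENTINEL MASTER mymaster` reply (as printed by redis-cli, one field name or value per line) contains a `flags` field that includes `failover_in_progress` while a failover is underway; the function name is hypothetical and the field format should be verified against the Sentinel version in use:

```shell
# Hypothetical sketch: read "SENTINEL MASTER <name>" output on stdin and
# succeed (exit 0) only if the flags value mentions failover_in_progress.
# Assumption: field names and values alternate, one per line.
failover_in_progress() {
    awk '
        prev == "flags" { found = ($0 ~ /failover_in_progress/) }
        { prev = $0 }
        END { exit (found ? 0 : 1) }
    '
}
```

A caller would pipe the Sentinel reply into the function, for example `redis-cli -p 26379 SENTINEL MASTER mymaster | failover_in_progress`, and delay shutdown (with an upper bound) while it returns success.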

Labels

redis, solved, stale (15 days without activity), tech-issues (The user has a technical issue about an application), triage (Triage is needed)
