
Pass in custom logic to Dragonfly health check #5881

@xuekat

Reopening #5863 (I don't seem to have permission to reopen the issue).

Reply to @romange:

@xuekat what behavior do you need during snapshot loading? The datastore won't accept any reads during that time, so how does a health check solve the issue of downtime?

The right approach (and this is what we do in our cloud service) is to use replication for version updates, to have zero-downtime updates.

We're hoping for the health check to pass only once the Dragonfly pod has finished loading the dataset into memory. As the logic currently stands, a Dragonfly pod can pass the health check while it is still loading the dataset and is therefore not yet available. The operator then kills the old pod while the new pod still cannot handle requests.

For now we are working around this by setting the number of replicas very high, so that at least one pod is still alive during the rollout. This is suboptimal because it increases our infra costs.
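To illustrate the kind of custom logic we have in mind, here is a minimal sketch of a readiness check that reports ready only once the server answers a `PING` normally. It assumes that Dragonfly, like Redis, replies with a `-LOADING` error while the dataset is being loaded; we have not verified that against every Dragonfly version. The function names, host, port, and timeout are all hypothetical, and this is not how the operator's health check is currently implemented.

```python
import socket


def is_ready(reply: bytes) -> bool:
    """Classify the first reply to PING.

    Only a +PONG reply counts as ready. Any error reply, including
    -LOADING (assumed to be sent while the dataset is loading), an
    auth error, or an empty reply is treated as not ready.
    """
    return reply.strip().upper().startswith(b"+PONG")


def probe(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING over a plain TCP connection and classify the reply.

    Host, port, and timeout are illustrative defaults, not operator settings.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(256)
    except OSError:
        # Connection refused, timeout, or reset: the pod is not ready.
        return False
    return is_ready(reply)


def main() -> int:
    """Return 0 when ready and 1 otherwise, matching exec-probe conventions."""
    return 0 if probe() else 1
```

An exec-style readiness probe could run this as a script and exit with the value returned by `main()`, so that the rollout waits on the new pod until loading has finished. If the instance requires a password, the probe would also have to authenticate first, which this sketch does not do.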
