Implement health checks for the ColdFront workers #134

@knikolla

Description

Currently, once or twice a year, the single qcluster worker disconnects from Redis and cannot recover on its own. Scheduled jobs then wait in the queue indefinitely until the worker pod is restarted.

A health check is needed that monitors the worker pod and deletes it when it is stuck.
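
One possible shape for such a check, as a rough sketch rather than a settled design: schedule a small recurring job that updates a heartbeat file, and have a liveness probe fail when that file goes stale. Because the job has to travel through the queue, a worker that has lost its Redis connection stops updating the file. The file path, the environment variable names, the 600-second threshold, and the function names below are all assumptions made for illustration; none of them come from the existing deployment.

```python
import os
import time

# Hypothetical defaults; adjust to the real deployment.
HEARTBEAT_FILE = os.environ.get("WORKER_HEARTBEAT_FILE", "/tmp/qcluster-heartbeat")
MAX_AGE_SECONDS = int(os.environ.get("WORKER_HEARTBEAT_MAX_AGE", "600"))


def touch_heartbeat(path=HEARTBEAT_FILE):
    """Body of the recurring job: create the file if needed and bump its mtime."""
    with open(path, "a"):
        os.utime(path, None)


def heartbeat_is_fresh(path=HEARTBEAT_FILE, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the heartbeat file was updated within max_age seconds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file counts as unhealthy.
        return False
    return age <= max_age


def main():
    """Exit code for an exec-style probe: 0 when healthy, 1 when stale."""
    return 0 if heartbeat_is_fresh() else 1
```

An exec liveness probe could run `sys.exit(main())` inside the worker container. When an exec probe keeps failing, Kubernetes restarts the container rather than deleting the pod, which should have the same effect as the manual restart described above. A freshly started pod has no heartbeat file yet, so the probe would need an initial delay longer than the job interval, or the worker would need to call `touch_heartbeat()` once at startup.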
