[BUG] The heartbeat does not actually check whether a worker is truly alive when renewing its lease. #1089

@lgYanami

Describe the bug
The heartbeat renews a worker's lease without checking whether the worker is actually alive.

If a worker crashes without signaling the heartbeat through the "finished" channel, the heartbeat keeps renewing the lease for the crashed worker. The lease only expires once the worker's deadline is reached, which triggers the retry logic, and it is replaced only when a new lease takes over. Until the deadline, the zombie worker continues to occupy a concurrency slot.
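To make the failure mode concrete, here is a minimal sketch in Python of one possible fix: renew the lease only while the worker itself has reported in recently. It is not the project's actual code. The names `Lease`, `Heartbeat`, `worker_ping`, `liveness_timeout`, and `simulate` are all hypothetical, and time is passed in explicitly so the simulation stays deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    # Hypothetical lease record; the real project's fields may differ.
    worker_id: str
    expires_at: float


class Heartbeat:
    """Renews a lease, optionally only while the worker proves it is alive."""

    def __init__(self, lease: Lease, ttl: float, liveness_timeout: float,
                 require_liveness: bool) -> None:
        self.lease = lease
        self.ttl = ttl
        self.liveness_timeout = liveness_timeout
        self.require_liveness = require_liveness
        self.last_seen: Optional[float] = None  # time of the last worker ping

    def worker_ping(self, now: float) -> None:
        # Called by the worker itself, so a crashed worker stops calling it.
        self.last_seen = now

    def tick(self, now: float) -> bool:
        """Try to renew the lease; return True if it was renewed."""
        if self.require_liveness:
            alive = (self.last_seen is not None
                     and now - self.last_seen <= self.liveness_timeout)
            if not alive:
                return False  # let the lease run out and free the slot
        self.lease.expires_at = now + self.ttl
        return True


def simulate(crash_at: float, deadline: int, require_liveness: bool) -> float:
    """Tick once per second until the deadline; return the final lease expiry."""
    lease = Lease("worker-1", expires_at=3.0)
    hb = Heartbeat(lease, ttl=3.0, liveness_timeout=1.5,
                   require_liveness=require_liveness)
    for t in range(deadline):
        now = float(t)
        if now < crash_at:
            hb.worker_ping(now)  # the worker is still running
        hb.tick(now)
    return lease.expires_at


if __name__ == "__main__":
    # Worker crashes at t=3 without sending "finished"; deadline is t=60.
    print(simulate(3.0, 60, require_liveness=False))  # 62.0: held until the deadline
    print(simulate(3.0, 60, require_liveness=True))   # 6.0: released shortly after the crash
```

With the liveness check, the slot is freed a few seconds after the crash rather than at the deadline. The trade-off is that `liveness_timeout` has to be longer than the worker's longest expected gap between pings, otherwise a slow but healthy worker would lose its lease.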

Metadata

Labels

bug: Something isn't working
