Recently we have replaced our distributed locks and global timers to use etcd's concurrency API to guarantee active-active HA.
However, there are still edge cases that require global coordination of all manager processes, such as rate-limited container registry access (e.g., Docker Hub with an anonymous user). Since many manager processes receive API requests in a load-balanced fashion, it is difficult to share the rate-limit state between different manager processes. This is why lablup/backend.ai-manager#501 is on hold.
Let's localize such globally coordinated state to a single manager process, the leader. To keep high availability, we should periodically check the liveness of the leader and re-elect it when it goes down; fortunately, etcd provides the facilities to implement this.
- manager: Implement leader election of manager processes with periodic leader status checks (see the election sketch after this list).
- manager: Rewrite the global timer to run only on the leader manager process. (When a new leader is elected, the new one should start the global timers and the old one, if still alive, should stop them.)
- manager: Add a generic "leader task" message queue based on Redis Streams to reroute API requests that are accepted by arbitrary manager processes but must be processed exclusively by the leader (see the queue sketch after this list).
- manager: Rewrite the image rescan task (backend.ai-manager#501, "fix: improve rescan image task to prevent too many requests") to keep a local `aiolimiter` state that implements its own rate limiting toward the container registries, and use the leader task queue to trigger the rescan task (see the rate-limiting sketch after this list).
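A minimal sketch of lease-based leader election, assuming the synchronous `python-etcd3` client; the key name, TTL, and node ID are illustrative, and the actual manager would use its own async etcd layer (or the etcd concurrency/election API directly). The comment inside the leader loop marks where the leader-bound global timers would be started and stopped.

```python
# Leader-election sketch (assumptions: python-etcd3 client; key name, TTL,
# and node ID are illustrative, not the real manager configuration).
import time
import uuid

import etcd3

LEADER_KEY = "/sorna/manager/leader"   # hypothetical key
LEASE_TTL = 10                         # seconds; illustrative
NODE_ID = str(uuid.uuid4())


def try_acquire_leadership(client: etcd3.Etcd3Client):
    """Try to become the leader by creating the leader key under a lease."""
    lease = client.lease(LEASE_TTL)
    acquired, _ = client.transaction(
        # Succeeds only if the leader key does not exist yet.
        compare=[client.transactions.version(LEADER_KEY) == 0],
        success=[client.transactions.put(LEADER_KEY, NODE_ID, lease=lease)],
        failure=[],
    )
    if not acquired:
        lease.revoke()
        return None
    return lease


def election_loop():
    client = etcd3.client()
    while True:
        lease = try_acquire_leadership(client)
        if lease is None:
            # Someone else is the leader; retry after a short wait.
            time.sleep(LEASE_TTL / 2)
            continue
        # We are the leader: keep refreshing the lease (periodic liveness check).
        # The leader-only global timers would be started here and stopped when
        # leadership is lost.
        try:
            while True:
                time.sleep(LEASE_TTL / 3)
                lease.refresh()
        except Exception:
            # Lost connectivity or the lease expired: if this process dies instead,
            # the lease simply expires and another manager wins the next election.
            continue
```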
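A sketch of the "leader task" queue over Redis Streams, assuming redis-py's asyncio client; the stream name, consumer group, and payload format are illustrative and not the actual manager wire format.

```python
# Leader-task queue sketch (assumptions: redis-py asyncio client; stream/group
# names and the JSON payload layout are illustrative).
import json

import redis.asyncio as aioredis
from redis.exceptions import ResponseError

STREAM = "manager.leader-tasks"   # hypothetical stream name
GROUP = "leader"                  # consumed only by the current leader


async def enqueue_leader_task(r: aioredis.Redis, task_name: str, args: dict) -> None:
    """Called by any manager process that accepted an API request
    which must be handled exclusively by the leader."""
    await r.xadd(STREAM, {"task": task_name, "args": json.dumps(args)})


async def leader_task_consumer(r: aioredis.Redis, node_id: str) -> None:
    """Runs only on the current leader; drains the leader task stream."""
    try:
        await r.xgroup_create(STREAM, GROUP, id="$", mkstream=True)
    except ResponseError:
        pass  # the consumer group already exists
    while True:
        entries = await r.xreadgroup(GROUP, node_id, {STREAM: ">"}, count=10, block=5000)
        for _stream, messages in entries:
            for msg_id, fields in messages:
                task = fields[b"task"].decode()
                args = json.loads(fields[b"args"])
                # Dispatch to the actual handler, e.g. the image rescan task.
                print(f"leader task: {task} {args}")
                await r.xack(STREAM, GROUP, msg_id)
```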
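A sketch of local, per-registry rate limiting with `aiolimiter` for the rescan task; the rate numbers and the `fetch_manifest` helper are hypothetical. Because the rescan runs only on the leader, a process-local limiter is sufficient and no cross-manager state sharing is needed.

```python
# Per-registry rate-limiting sketch (assumptions: rate numbers and the
# fetch_manifest helper are illustrative, not the real registry client).
import asyncio

from aiolimiter import AsyncLimiter

# One limiter per registry, local to the leader process.
_registry_limiters: dict[str, AsyncLimiter] = {
    # e.g. roughly 100 requests per 6 hours for anonymous Docker Hub access (illustrative)
    "docker.io": AsyncLimiter(max_rate=100, time_period=6 * 3600),
}


async def fetch_manifest(registry: str, image_ref: str) -> None:
    """Hypothetical helper used by the rescan task to hit a registry API."""
    limiter = _registry_limiters.setdefault(registry, AsyncLimiter(60, 60))
    async with limiter:
        # ... perform the actual HTTP request to the registry here ...
        await asyncio.sleep(0)  # placeholder for the real request
```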
JIRA Issue: BA-269