Closed
Labels
A-Jobs: Related to asynchronous jobs
Description
This issue was originally created by @sandhose at matrix-org/matrix-authentication-service#2785.
We're currently using apalis for async jobs, which… is probably not mature enough.
A few issues with it:
- the crates are a bit messy
- they make breaking changes all the time; see "Upgrade apalis" (#2784)
- the database structure is… weird
- the jobs don't retry correctly
- the "workers" crash when they lose the connection and don't restart
- sharing state in the job is not type-safe
Requirements:
- have jobs in the database
- use triggers to NOTIFY workers for new jobs (optional?)
- lock rows with PG's `FOR UPDATE SKIP LOCKED`
- pass the connection which locked the task row to the handler
- have a way to do cron-like schedules for maintenance tasks
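
A rough sketch of how the first four requirements could fit together, written as SQL embedded in Rust constants. Everything named here is an assumption made for illustration: the `queue_jobs` table, its columns, the `queue_available` channel and the `worker_transaction` helper are hypothetical and are not the schema or API of this project. The point is only the shape: a trigger calls `pg_notify` on insert, and a worker claims one row with `FOR UPDATE SKIP LOCKED` inside a transaction, so the connection holding the row lock is the one that would be handed to the job handler.

```rust
/// Hypothetical job table (assumption, not the project's schema).
pub const SCHEMA_SQL: &str = r#"
CREATE TABLE queue_jobs (
    id           BIGSERIAL PRIMARY KEY,
    payload      JSONB NOT NULL,
    status       TEXT NOT NULL DEFAULT 'available',
    scheduled_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"#;

/// Trigger that wakes up listening workers when a job is inserted.
/// The channel name `queue_available` is an assumption.
pub const NOTIFY_TRIGGER_SQL: &str = r#"
CREATE FUNCTION queue_jobs_notify() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('queue_available', NEW.id::text);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER queue_jobs_notify_insert
AFTER INSERT ON queue_jobs
FOR EACH ROW EXECUTE FUNCTION queue_jobs_notify();
"#;

/// Claim one due job. Rows already locked by another worker are skipped
/// instead of blocking, so concurrent workers never pick the same job.
pub const LOCK_NEXT_JOB_SQL: &str = r#"
SELECT id, payload
FROM queue_jobs
WHERE status = 'available' AND scheduled_at <= now()
ORDER BY scheduled_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
"#;

/// Mark the claimed job as done; `$1` is the id returned by the SELECT.
pub const MARK_DONE_SQL: &str =
    "UPDATE queue_jobs SET status = 'completed' WHERE id = $1;";

/// Statements a worker would run, in order, on a single connection.
/// The handler would run between the SELECT and the UPDATE, using the
/// same transaction, so the row lock is held until COMMIT.
pub fn worker_transaction() -> [&'static str; 4] {
    ["BEGIN;", LOCK_NEXT_JOB_SQL, MARK_DONE_SQL, "COMMIT;"]
}
```

With this shape, a worker that loses its connection mid-job has its transaction rolled back by Postgres, which releases the row lock and leaves the job `available` for another worker. Retry counting and cron-like schedules are left out of this sketch.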
Related:
- Losing the connection to Postgres stops the background jobs (matrix-org/matrix-authentication-service#2625)
- Ensure background jobs are reliably retrying (matrix-org/matrix-authentication-service#1490)
### Pull requests
- [ ] #3307
- [ ] https://github.com/element-hq/matrix-authentication-service/pull/3367
- [ ] https://github.com/element-hq/matrix-authentication-service/pull/3455
- [ ] https://github.com/element-hq/matrix-authentication-service/pull/3558
- [ ] https://github.com/element-hq/matrix-authentication-service/pull/3678
### Tasks
- [x] Perform leader election for maintenance tasks
- [x] Shut down 'lost' workers
- [x] Insert jobs in the database
- [x] Consume jobs from the database
- [x] Retry failed jobs
- [x] Gracefully shut down workers
- [x] Use a single connection to LISTEN and interact with jobs
- [x] Schedule jobs in the future
- [ ] Re-schedule jobs from lost workers
- [x] Cron-like schedules
- [x] Metrics
- [x] Tracing with span links