Conversation
The current implementation uses a size-1 semaphore to ensure a single worker reads from q.items.Front() at a given time. The problem is that after the worker realizes it has to wait until `plannedToStartWorkAt`, it sleeps and forgets to release the semaphore. Other workers keep waiting until the one holding the semaphore wakes up, runs its task and successfully releases. The q.items list acts as a FIFO queue and the keeps the invariant that elements are ordered by how soon they are to be processed, given the plannedToStartWorkAt value. While a worker sleeps holding the semaphore, elements that should be scheduled immediately cannot be processed even though they might have nothing else to do. When we consider that the default max backoff is 1000s, we can have a worker holding the semaphore for as much as 16 minutes. This patch removes the semaphore and replaces it with a sync.Cond. We'll c.Wait in two situations: there's nothing to do (no elements to process), so we wait until we get a Signal from insert, or we got a element but have to wait for the plannedToStartWorkAt timer. To avoid a timer mutex race, we prefer Signals when we know an item is ready (insert) or the timer fired. Broadcast is used for cancellations and shutdowns. Signed-off-by: Juliana Oliveira <juliana@fly.io>
Member
|
Nice find! |
Signed-off-by: Juliana Oliveira <juliana@fly.io>
Signed-off-by: Juliana Oliveira <juliana@fly.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The current implementation uses a size-1 semaphore to ensure a single
worker reads from q.items.Front() at a given time. The problem is that
after the worker realizes it has to wait until
plannedToStartWorkAt,it continues to hold the semaphore while sleeping. Other workers block
until the sleeping worker wakes up, processes its task, and releases.
The q.items list acts as a FIFO queue and the keeps the invariant that
elements are ordered by how soon they are to be processed, given the
plannedToStartWorkAt value. While a worker sleeps holding the semaphore,
elements that should be scheduled immediately cannot be processed even
though they might have nothing else to do. When we consider that the
default max backoff is 1000s, we can have a worker holding the semaphore
for as much as 16 minutes.
This patch removes the semaphore and replaces it with a sync.Cond. We'll
c.Wait in two situations: there's nothing to do (no elements to process),
so we wait until we get a Signal from insert, or we got a element but
have to wait for the plannedToStartWorkAt timer. To avoid a timer mutex
race, we prefer Signals when we know an item is ready (insert) or the
timer fired. Broadcast is used for cancellations and shutdowns.
Signed-off-by: Juliana Oliveira juliana@fly.io