Fix Race Condition in Recursive Job Scheduling (Fixes #294)#327
Draft
Fix Race Condition in Recursive Job Scheduling (Fixes #294)#327
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The Issue (from #294)
When a job with a fixed ID is running and tries to schedule itself again using
enqueue_in:scheduler.enqueue_incallsscheduler._create_job._create_jobcreates a newJobinstance with statusSCHEDULED.STARTED) or just finished (transitioning toFINISHED),_create_jobwould overwrite the job status in Redis toSCHEDULED.FINISHEDupon completion, orFAILED.scheduled_jobsZSET but has a status ofFINISHEDorFAILED, causing the scheduler to ignore or misinterpret it in future runs, effectively "losing" the recurring job.Changes
rq_scheduler/scheduler.py:_create_job: Added a check to see if a job with the same ID already exists. If the existing job is inSTARTEDorQUEUEDstate, we preserve its status and do NOT commit the newSCHEDULEDstatus to Redis. This prevents the scheduler from interfering with the lifecycle of the currently running instance.enqueue_job: Added logic to check if a job is currentlySTARTED. If so, we delay the actual enqueueing (by updating the score in the ZSET) to avoid a race condition where we might enqueue a job that is about to finish.Verification
reproduce_issue.pywas created to simulate the recursive scheduling scenario. Before the fix, the job would stop repeating after a few iterations. After the fix, it runs indefinitely as expected.tests/test_scheduler.pywas updated with regression tests (TestSchedulerRaceCondition) to ensureenqueue_indoes not overwriteSTARTEDorQUEUEDstatus.tests/test_scheduler.pyandtests/test_callbacks.pywere updated to fix regressions related to test assumptions about private attributes and timezone handling.