-
Notifications
You must be signed in to change notification settings - Fork 153
Retry GithubSyncJob if new commit sha is missing in Github response on push from target branch #1417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…n push from target branch
| MAX_RETRY_ATTEMPTS = 5 | ||
| RETRY_DELAY = 5.seconds | ||
| queue_as :default | ||
| on_duplicate :drop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the on_duplicate :drop work? Will we be able to enqueue the job before the current job is succeeded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it applies to perform:
| around_perform { |job, block| job.acquire_lock(&block) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be alright since we enqueue the job with a delay and the lock operates on the perform, we don't expect 2 jobs to run at once in this situation.
tjwp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The backoff is not exponential, but I think the retries up to a total of 75 seconds after the initial attempt is a reasonable approach.
| MAX_RETRY_ATTEMPTS = 5 | ||
| RETRY_DELAY = 5.seconds | ||
| queue_as :default | ||
| on_duplicate :drop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it applies to perform:
| around_perform { |job, block| job.acquire_lock(&block) } |
What you trying to accomplish?
Part of https://github.com/Shopify/deploys/issues/2000
In this PR, we address a critical race condition where commits ready to deploy are missing from the list. This occurrs when GitHub sends push webhooks to Shipit, but GithubSyncJob queries the GitHub API which hasn't yet updated with the new commits. The job will find 0 new commits and complete successfully, leaving deployable commits missing from the system.
We solve this by passing the expected HEAD SHA from the webhook payload through to the sync job, allowing it to detect when the API hasn't caught up yet and retry with exponential backoff.
This ensures that commits are properly synchronized even when GitHub's API experiences eventual consistency delays.