@Shinomix
Contributor

What are you trying to accomplish?

Part of https://github.com/Shopify/deploys/issues/2000

In this PR, we address a critical race condition where commits that are ready to deploy are missing from the list. This occurs when GitHub sends a push webhook to Shipit, but GithubSyncJob queries the GitHub API before it reflects the new commits. The job finds 0 new commits and completes successfully, leaving deployable commits missing from the system.

We solve this by passing the expected HEAD SHA from the webhook payload through to the sync job, allowing it to detect when the API hasn't caught up yet and retry with exponential backoff.

This ensures that commits are properly synchronized even when GitHub's API experiences eventual consistency delays.
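
As a rough illustration of the idea described above, here is a minimal plain-Ruby sketch, not the actual GithubSyncJob code. The method name `sync_commits`, the `fetch_shas` callable standing in for the GitHub API, the return values, and the delay formula are all assumptions made for the example; only `MAX_RETRY_ATTEMPTS` and `RETRY_DELAY` come from the PR.

```ruby
# Hypothetical sketch of the retry decision; not Shipit's implementation.
MAX_RETRY_ATTEMPTS = 5
RETRY_DELAY = 5 # seconds (the PR uses 5.seconds from ActiveSupport)

# fetch_shas: callable standing in for the GitHub API; returns the SHAs it currently reports.
# Returns [:synced, shas], [:retry, delay_in_seconds], or [:gave_up, shas].
def sync_commits(fetch_shas, expected_head_sha: nil, retry_count: 0)
  shas = fetch_shas.call

  # No expectation was passed, or the API already reports the expected HEAD: sync normally.
  return [:synced, shas] if expected_head_sha.nil? || shas.include?(expected_head_sha)

  # The API has not caught up with the webhook; stop once the attempts are exhausted.
  return [:gave_up, shas] if retry_count >= MAX_RETRY_ATTEMPTS

  # Otherwise ask the caller to re-enqueue after a delay (assumed formula: base delay times attempt number).
  [:retry, RETRY_DELAY * (retry_count + 1)]
end

stale_api = -> { ["aaa111"] }           # API has not seen the pushed commit yet
fresh_api = -> { ["aaa111", "bbb222"] } # API has caught up

p sync_commits(stale_api, expected_head_sha: "bbb222")
p sync_commits(fresh_api, expected_head_sha: "bbb222")
```

In a real job the `:retry` branch would presumably re-enqueue the job with the same expected SHA and an incremented retry count, rather than return a value.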

@Shinomix Shinomix requested review from a team, hubb, tjwp and yao0204 September 17, 2025 16:05
MAX_RETRY_ATTEMPTS = 5
RETRY_DELAY = 5.seconds
queue_as :default
on_duplicate :drop

How does `on_duplicate :drop` work? Will we be able to enqueue the retry before the current job has finished?

It looks like it applies to perform:

around_perform { |job, block| job.acquire_lock(&block) }

Contributor Author

Yes, it should be alright: we enqueue the job with a delay and the lock operates on perform, so we don't expect two jobs to run at once in this situation.
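
To make the point in this thread concrete, here is a small sketch of a lock taken around perform that drops duplicates. It is an in-memory illustration under stated assumptions, not Shipit's locking code: the class `DropOnDuplicate`, its `perform` method, and the lock key are hypothetical, and a real deployment would need a lock store shared across workers.

```ruby
require "set"

# Hypothetical in-memory stand-in for a lock that wraps perform.
class DropOnDuplicate
  def initialize
    @held = Set.new
    @mutex = Mutex.new
  end

  # Runs the block only if no other perform currently holds the lock for `key`;
  # otherwise the duplicate is dropped without running.
  def perform(key)
    acquired = @mutex.synchronize { @held.add?(key) } # nil when the key is already held
    return :dropped unless acquired

    begin
      yield
      :performed
    ensure
      # Release the lock even if the block raises.
      @mutex.synchronize { @held.delete(key) }
    end
  end
end

runner = DropOnDuplicate.new
inner = nil
# A second perform that starts while the first is still running is dropped.
outer = runner.perform("sync:stack-1") { inner = runner.perform("sync:stack-1") { :never_runs } }
# A perform that starts after the first has finished runs normally.
later = runner.perform("sync:stack-1") { :runs }
p [outer, inner, later]
```

Because the lock is only held while the block runs, enqueueing a delayed retry from inside perform is not affected; only an overlapping perform would be dropped.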

@tjwp tjwp left a comment

The backoff is not exponential, but I think retrying for up to a total of 75 seconds after the initial attempt is a reasonable approach.
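
For reference, the 75-second figure is consistent with a linear schedule built from `MAX_RETRY_ATTEMPTS = 5` and `RETRY_DELAY = 5.seconds`, where each delay is the base delay times the attempt number. That formula is inferred from the numbers in this comment rather than taken from the diff, and the helper names below are hypothetical; the sketch just compares the totals.

```ruby
# Delay schedules in seconds for a given number of attempts and base delay.
def linear_delays(max_attempts, base_delay)
  (1..max_attempts).map { |attempt| base_delay * attempt }
end

def exponential_delays(max_attempts, base_delay)
  (0...max_attempts).map { |n| base_delay * 2**n }
end

linear = linear_delays(5, 5)
exponential = exponential_delays(5, 5)

puts "linear:      #{linear.inspect} total #{linear.sum}s"           # total 75s
puts "exponential: #{exponential.inspect} total #{exponential.sum}s" # total 155s
```

With the same constants, a doubling schedule would spread the retries over 155 seconds instead of 75.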

@Shinomix Shinomix merged commit c26e675 into main Sep 18, 2025
21 of 22 checks passed
@Shinomix Shinomix deleted the retry-github-sync-job branch September 18, 2025 13:46