Skip to content

Conversation

lutien
Copy link
Collaborator

@lutien lutien commented Jul 26, 2023

Closes #1904.
From what I've understood, for backouts we already get corresponding wpt commits for a given backout. In the part of code you point out, we get here the commits, and it works in the way.
But I found an issue that when we try to filter out the commits which were backed out, we expect that this commits will be a subset of upstream sync, which is not the case if backout has commits for multiple syncs. That would explain why in this case the bot just applied the backout commit as a normal commit, where it actually should have removed the first commit and marked PR as incomplete. I've observed the same behavior in the test.
I think it might also explain and fix the second case, which we've seen last week, but I didn't manage to write a test which would fail the same way, so can not say for sure.

lutien added 2 commits July 26, 2023 10:08
…ead subset.

Since backout can map to multiple bugs, it might not be a subset of upstream PR commits.
Mercurial fails to back out if there're uncommitted changes, that's why we need to shelve changes in between.
@lutien lutien changed the title Fix backout Fix handling of backouts affecting multiple bugs Jul 26, 2023
@lutien lutien marked this pull request as ready for review July 26, 2023 10:30
@lutien lutien requested a review from jgraham July 26, 2023 10:30
@jgraham
Copy link
Member

jgraham commented Jul 26, 2023

So, I think this is correct, but I'm not sure it's sufficient to fix the entire problem.

https://github.com/mozilla/wpt-sync/blob/master/sync/upstream.py#L722 is the main entry point for updating for pushes to gecko repos e.g. autoland.

We call remove_complete_backouts first, but all that does is removes commits that are pushed and backed out in the same set of commits, as processed by the sync bot. But that's actually a rather uncommon case, the more normal thing is that the commit and backout aren't processed at the same time. In that case remove_complete_backouts doesn't do the right thing (because we don't want to process every gecko commit going back indefinitely). Instead if there's a backout in the commit range we're processing, we end up in updates_for_backout and I think the bug in that is the assumption that if it finds a sync that was backed out by a specific backout commit then it must be done with that commit. It doesn't allow for the possibility that a single backout commit can undo multiple syncs.

@lutien
Copy link
Collaborator Author

lutien commented Jul 26, 2023

The thing is that remove_complete_backouts runs also in filter_commits, which runs when we access commits. As far as I understood, how it works in the test, which I added: remove_complete_backouts runs with only backout commit, then it goes into updates_for_backout creates correct updates for both syncs, where it adds backed out commit to both of them, then we go here, when we access commits of sync we have an actual commit and backout and if we filter correctly, when we get here, PR gets invalidated.
Now coming back to the issue with updates_for_backout, I guess it all works fine in my test because backout is applied to both syncs in one go, so we don't get the case when the backout commit is already applied to the sync. But true, maybe it can happen, and then we don't need to apply it to this sync again, but we don't want to just exit without checking other syncs, so I've added the commit to continue the loop in this case. Let me know what you think.

backed_out_commit_shas = set()
return {}, {}
# This commit was already processed for this sync
continue
Copy link
Member

@jgraham jgraham Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be wrong, because we'll skip removing the commit from the set of remaining backed_out_commit_shas, which means that we'll end up trying to create a new sync for it below.

We could just remove it from the SHA1s here, but there's still a theoretical (if somewhat unlikely) problem. Let's say that we have a backout B that backs out commits C1 and C2 from different bugs. Now let's say that C1 and C2 already have syncs S1 and S2. But, for whatever reason S1 has been merged upstream, but S2 hasn't. So for S2 I think the current logic is fine; when we update the sync we remove the backed out commits from the upstreamed set and only end up with the net result of removing all the backouts. But for S1, everything is broken; assuming we go through the "create a sync" logic below we'll create a sync that backs out both C1 and C2. Initially that will have a merge conflict (because S2 containing C2 hasn't landed yet), but once S2 lands that new sync will revert both sets of changes, not just S1.

And at this point there's a problem because we're currently defining an UpstreamSync as a commit range and a bug number the commits have to match. When we change the commit range we just move all the wpt-parts of the matching commits to the GitHub PR. But in this case we want to define the sync specifically as the backout of some earlier commit that already landed. Which I don't think is trivial in the current model. I think the sync would need additional data to say specifically which commits it should revert, and then we'd need special handling for sync with a non-empty list of commits that they're reverting. But maybe you can see a simpler solution?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, you're right about the removing from backed_out_commit_shas, I've updated this.
Regarding the scenario when S1 is merged, I guess I'm missing something, but shouldn't S1 then have status complete and because of that excluded from all the backout updates, so it shouldn't come to the "create a sync" point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's exactly because it's merged that we get to the create a sync point. The whole "create a new sync for the backout commit" is designed to handle the case where e.g. something is backed out of central after the corresponding upstream sync has already been merged. In that case we can't do anything with the original PR (because GitHub quite rightly doesn't allow adding new commits to a merged PR) so instead we want to create an entirely new sync/PR for the backout.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I'm up to speed with what you're saying :)
Just to reiterate, basically the problem is that in the case you described, we're creating a new sync with only the backout commit, which works fine for one bug backout, but doesn't for multiple bugs, where we should revert only the commits, which belong to the bug, which was merged already.
But so far I don't have a better idea, than, as you said, creating a sync with reverted commits for the bug. So, I guess, I'll try to sketch something.

@jgraham jgraham self-requested a review November 20, 2024 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken handling of backouts affecting multiple bugs
2 participants