Fix handling of backouts affecting multiple bugs #1933

lutien · 2023-07-26T08:13:11Z

Closes #1904.
From what I've understood, for backouts we already get corresponding wpt commits for a given backout. In the part of code you point out, we get here the commits, and it works in the way.
But I found an issue that when we try to filter out the commits which were backed out, we expect that this commits will be a subset of upstream sync, which is not the case if backout has commits for multiple syncs. That would explain why in this case the bot just applied the backout commit as a normal commit, where it actually should have removed the first commit and marked PR as incomplete. I've observed the same behavior in the test.
I think it might also explain and fix the second case, which we've seen last week, but I didn't manage to write a test which would fail the same way, so can not say for sure.

…ead subset. Since backout can map to multiple bugs, it might not be a subset of upstream PR commits.

Mercurial fails to back out if there're uncommitted changes, that's why we need to shelve changes in between.

jgraham · 2023-07-26T10:43:48Z

So, I think this is correct, but I'm not sure it's sufficient to fix the entire problem.

https://github.com/mozilla/wpt-sync/blob/master/sync/upstream.py#L722 is the main entry point for updating for pushes to gecko repos e.g. autoland.

We call remove_complete_backouts first, but all that does is removes commits that are pushed and backed out in the same set of commits, as processed by the sync bot. But that's actually a rather uncommon case, the more normal thing is that the commit and backout aren't processed at the same time. In that case remove_complete_backouts doesn't do the right thing (because we don't want to process every gecko commit going back indefinitely). Instead if there's a backout in the commit range we're processing, we end up in updates_for_backout and I think the bug in that is the assumption that if it finds a sync that was backed out by a specific backout commit then it must be done with that commit. It doesn't allow for the possibility that a single backout commit can undo multiple syncs.

lutien · 2023-07-26T14:23:58Z

The thing is that remove_complete_backouts runs also in filter_commits, which runs when we access commits. As far as I understood, how it works in the test, which I added: remove_complete_backouts runs with only backout commit, then it goes into updates_for_backout creates correct updates for both syncs, where it adds backed out commit to both of them, then we go here, when we access commits of sync we have an actual commit and backout and if we filter correctly, when we get here, PR gets invalidated.
Now coming back to the issue with updates_for_backout, I guess it all works fine in my test because backout is applied to both syncs in one go, so we don't get the case when the backout commit is already applied to the sync. But true, maybe it can happen, and then we don't need to apply it to this sync again, but we don't want to just exit without checking other syncs, so I've added the commit to continue the loop in this case. Let me know what you think.

jgraham · 2023-07-26T15:10:57Z

sync/upstream.py

-                backed_out_commit_shas = set()
-                return {}, {}
+                # This commit was already processed for this sync
+                continue


This must be wrong, because we'll skip removing the commit from the set of remaining backed_out_commit_shas, which means that we'll end up trying to create a new sync for it below.

We could just remove it from the SHA1s here, but there's still a theoretical (if somewhat unlikely) problem. Let's say that we have a backout B that backs out commits C1 and C2 from different bugs. Now let's say that C1 and C2 already have syncs S1 and S2. But, for whatever reason S1 has been merged upstream, but S2 hasn't. So for S2 I think the current logic is fine; when we update the sync we remove the backed out commits from the upstreamed set and only end up with the net result of removing all the backouts. But for S1, everything is broken; assuming we go through the "create a sync" logic below we'll create a sync that backs out both C1 and C2. Initially that will have a merge conflict (because S2 containing C2 hasn't landed yet), but once S2 lands that new sync will revert both sets of changes, not just S1.

And at this point there's a problem because we're currently defining an UpstreamSync as a commit range and a bug number the commits have to match. When we change the commit range we just move all the wpt-parts of the matching commits to the GitHub PR. But in this case we want to define the sync specifically as the backout of some earlier commit that already landed. Which I don't think is trivial in the current model. I think the sync would need additional data to say specifically which commits it should revert, and then we'd need special handling for sync with a non-empty list of commits that they're reverting. But maybe you can see a simpler solution?

yeah, you're right about the removing from backed_out_commit_shas, I've updated this.
Regarding the scenario when S1 is merged, I guess I'm missing something, but shouldn't S1 then have status complete and because of that excluded from all the backout updates, so it shouldn't come to the "create a sync" point?

It's exactly because it's merged that we get to the create a sync point. The whole "create a new sync for the backout commit" is designed to handle the case where e.g. something is backed out of central after the corresponding upstream sync has already been merged. In that case we can't do anything with the original PR (because GitHub quite rightly doesn't allow adding new commits to a merged PR) so instead we want to create an entirely new sync/PR for the backout.

Ok, I think I'm up to speed with what you're saying :)
Just to reiterate, basically the problem is that in the case you described, we're creating a new sync with only the backout commit, which works fine for one bug backout, but doesn't for multiple bugs, where we should revert only the commits, which belong to the bug, which was merged already.
But so far I don't have a better idea, than, as you said, creating a sync with reverted commits for the bug. So, I guess, I'll try to sketch something.

lutien added 2 commits July 26, 2023 10:08

Check for intersecion of backout commits and upstream PR commits inst…

6c1363a

…ead subset. Since backout can map to multiple bugs, it might not be a subset of upstream PR commits.

Fix fixture for backout to work with multiple revisions.

0988b02

Mercurial fails to back out if there're uncommitted changes, that's why we need to shelve changes in between.

lutien changed the title ~~Fix backout~~ Fix handling of backouts affecting multiple bugs Jul 26, 2023

Add test for backout of multiple bugs.

2a218f7

lutien force-pushed the fix-backout branch from 6b2b37e to 2a218f7 Compare July 26, 2023 10:23

lutien marked this pull request as ready for review July 26, 2023 10:30

lutien requested a review from jgraham July 26, 2023 10:30

jgraham reviewed Jul 26, 2023

View reviewed changes

Don't stop processing backed out commits if it was applied to one sync.

149e735

lutien force-pushed the fix-backout branch from 09bf6ec to 149e735 Compare July 26, 2023 15:35

jgraham self-requested a review November 20, 2024 15:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix handling of backouts affecting multiple bugs #1933

Fix handling of backouts affecting multiple bugs #1933

Uh oh!

lutien commented Jul 26, 2023 •

edited

Loading

Uh oh!

jgraham commented Jul 26, 2023

Uh oh!

lutien commented Jul 26, 2023

Uh oh!

jgraham Jul 26, 2023 •

edited

Loading

Uh oh!

lutien Jul 26, 2023

Uh oh!

jgraham Jul 27, 2023

Uh oh!

lutien Jul 28, 2023

Uh oh!

Uh oh!

Fix handling of backouts affecting multiple bugs #1933

Are you sure you want to change the base?

Fix handling of backouts affecting multiple bugs #1933

Uh oh!

Conversation

lutien commented Jul 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jgraham commented Jul 26, 2023

Uh oh!

lutien commented Jul 26, 2023

Uh oh!

jgraham Jul 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lutien Jul 26, 2023

Choose a reason for hiding this comment

Uh oh!

jgraham Jul 27, 2023

Choose a reason for hiding this comment

Uh oh!

lutien Jul 28, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lutien commented Jul 26, 2023 •

edited

Loading

jgraham Jul 26, 2023 •

edited

Loading