Fix `line_shift` bug in but-hunk-dependency #11368

jonathantanmy2 · 2025-11-26T02:37:49Z

Currently, this PR just has a demonstration of a bug. Once I've confirmed my understanding of the situation, I'll update this PR to contain a fix.

vercel · 2025-11-26T02:37:55Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Preview	Comments	Updated (UTC)
gitbutler-web	Ignored	Preview		Dec 2, 2025 3:18am

Byron · 2025-11-26T04:22:26Z

Thanks @jonathantanmy2, this is incredibly observant! There are probably many bugs and not enough tests, so this definitely is a step in the right direction.

In any case, @mtsgrd might be able to comment on it, maybe showing a path forward as well.

On another note, something I think this needs (at some time) is a better way to test it, and of course, a port to new non-legacy data structures. Ideally, this becomes plumbing that doesn't know workspaces at all, which should also improve its testability.

jonathantanmy2 · 2025-11-27T04:04:14Z

This PR now contains the fix. PTAL. See the commit message for a more complete description of the problem and the fix.

Byron · 2025-11-27T04:21:46Z

I thought I could skip the review if @mtsgrd is going to take a much more proficient look anyway, but I just wanted to welcome our first "Git level" commit message, which I reproduce here for ease of consumption:

The line_shift value is used when combining diff hunks from two or more stacks together, and has an important invariant: at any point where a diff hunk from another stack could be inserted, the cumulative line_shift value must be the net lines (lines added less lines removed) of all the diff hunks prior to that point.

This is so that the combiner knows how to shift the diff hunks. For example, suppose the combiner needs to combine two stacks; the second stack has a diff that adds a line at line 100. Suppose the combined effect of all diff hunks in the first stack prior to line 100 is a net reduction of 42 lines: the total line_shift value of all those diff hunks must thus be -42, so that the combiner knows that the change that the second stack has must be added at line 58 instead of line 100.
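To make the arithmetic in this example concrete, here is a minimal sketch of the invariant. The `Hunk` struct and both function names are hypothetical and are not taken from but-hunk-dependency. The sketch assumes hunk positions are expressed in the same line numbering that the other stack uses.

```rust
/// Hypothetical, simplified hunk: only the fields needed for this illustration.
#[derive(Debug, Clone, Copy)]
struct Hunk {
    /// First line (1-based) covered by the hunk.
    start: u32,
    /// Number of lines covered by the hunk.
    lines: u32,
    /// Net lines contributed by the hunk: lines added less lines removed.
    line_shift: i32,
}

/// Sums `line_shift` over every hunk that ends before `line`.
fn cumulative_shift_before(hunks: &[Hunk], line: u32) -> i32 {
    hunks
        .iter()
        .filter(|h| h.start + h.lines <= line)
        .map(|h| h.line_shift)
        .sum()
}

/// Returns where a change at `line` from another stack lands once the
/// preceding hunks are accounted for, or `None` if the result is out of range.
fn shifted_line(hunks: &[Hunk], line: u32) -> Option<u32> {
    line.checked_add_signed(cumulative_shift_before(hunks, line))
}
```

With three hunks whose shifts are -45, +3 and +7, where only the first two end before line 100, the cumulative shift is -42 and `shifted_line` returns `Some(58)`, matching the numbers above.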

Point A: Note that this invariant only needs to apply at any point where a diff hunk from another stack could be inserted. If, in a stack, there are two hunks adjacent to each other, no diff hunk may be inserted between them, so as long as their total line_shift value is correct, they may have any line_shift value they want. (In fact, it is sometimes not possible to determine what each line_shift value should be.)

This invariant is not met by the current algorithm, so I switched the calculation for something that does. The main principles are:

- If applying a hunk causes other hunks to completely disappear, the incoming hunk must bear responsibility for the line_shift values of the hunks that disappear by adding their values to itself.
- If a hunk splits into two (only possible when applying a hunk in the middle of an existing hunk), the two resulting hunks must split the original line_shift value between them. Due to Point A above, the exact proportion does not matter (the two resulting hunks sandwich the hunk that split them, and all three are adjacent to each other), so I have chosen to distribute the line_shift value based on their sizes.
- If applying a hunk causes another hunk to be reduced in size, but not completely disappear, the exact distribution of line_shift values in between these two hunks does not really matter, since they are adjacent (and thus Point A applies). But I have chosen to take some line_shift from the reduced-size hunk to give to the incoming hunk, analogous to how a completely disappearing hunk cedes all its line_shift to the incoming hunk, to make the line_shift values more reasonable (well, reasonable to me, at least).
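The three principles can be sketched as small helpers. This is only an illustration: the function names are hypothetical, and the exact rounding and the proportional rule used for a partially reduced hunk are assumptions, not the PR's actual code. The one property that matters, per Point A, is that every helper preserves the total line_shift.

```rust
/// Principle 1: an incoming hunk that makes other hunks disappear
/// adds their `line_shift` values to its own.
fn absorb_shift(incoming_shift: i32, disappeared_shifts: &[i32]) -> i32 {
    incoming_shift + disappeared_shifts.iter().sum::<i32>()
}

/// Principle 2: when a hunk is split in two, distribute its `line_shift`
/// in proportion to the sizes of the two parts. The second part takes the
/// rounding remainder so that the two values always sum to the original.
fn split_shift(line_shift: i32, first_lines: u32, second_lines: u32) -> (i32, i32) {
    let total = i64::from(first_lines) + i64::from(second_lines);
    if total == 0 {
        // Assumed tie-break for two empty parts: keep everything on the first.
        return (line_shift, 0);
    }
    // The magnitude never exceeds that of `line_shift`, so the cast is lossless.
    let first = (i64::from(line_shift) * i64::from(first_lines) / total) as i32;
    (first, line_shift - first)
}

/// Principle 3 (assumed proportional rule): a hunk that shrinks keeps the
/// share for its remaining lines and cedes the rest to the incoming hunk.
/// Returns `(kept, ceded)`.
fn cede_shift(line_shift: i32, remaining_lines: u32, overwritten_lines: u32) -> (i32, i32) {
    split_shift(line_shift, remaining_lines, overwritten_lines)
}
```

For example, `split_shift(10, 3, 7)` yields `(3, 7)` and `split_shift(-5, 1, 2)` yields `(-1, -4)`. In both cases the parts sum to the original value, so the cumulative line_shift at any point where another stack's hunk could be inserted is unchanged.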

There is some code duplication due to how the diff hunks of an individual stack are combined. I thought of rewriting the combining algorithm before writing this commit (to reduce or eliminate the code duplication needed), but there were some inconsistencies in how zero-line hunks were handled, so I thought it best to correct the line_shift issue before making further changes.

Thank you @jonathantanmy2, this really is something I'd like to copy. And maybe one day GitButler will also be the tool that helps to unearth such messages when people are wondering why it is what it is (-> Git archaeology).

krlvi · 2025-11-27T22:00:34Z

(adding Mattias as a reviewer since he authored the original implementation and may remember certain nuances about the functionality)

This is a WIP. I need to make another pass to make the identifier names consistent, add documentation, and so on, but uploading this first to check if I'm on the right track.

Second most noteworthy is the fix in `input_hunk_from_unified_diff`. `old_start` has a special meaning when `old_lines` is 0 (same for `new_start` and `new_lines`).

Most noteworthy are the bug fixes as can be seen in some tests. In crates/but-hunk-dependency/src/ranges/tests/path.rs:

In removing_line_updates_range, the file goes from ? 1 1 to (in commit 1) ? a b c to (in commit 2) ? a c

It can be readily seen why the existing results are wrong.

In shift_is_correct_after_multiple_changes, the file ends up being 1 2 update 3 add line 1 add line 2 add line 4 4 6 update 7 add line 8 9 10 added lines at the bottom

Once again, it can be readily seen why the existing results are wrong.

DO NOT SUBMIT

There are some insta tests that I blindly accepted the review for that I need to look into. There might be some special handling regarding deletion and recreation of files.
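For readers unfamiliar with the `old_start` / `old_lines` remark: in a unified-diff hunk header `@@ -old_start,old_lines +new_start,new_lines @@`, a side with a count of 0 covers no lines, and its start names the line before the change rather than the first line covered (for example, `@@ -5,0 +6,2 @@` inserts two lines after old line 5). A minimal sketch of handling that convention follows. The function name is hypothetical, and whether `input_hunk_from_unified_diff` does exactly this is an assumption.

```rust
use std::ops::Range;

/// Converts one side of a unified-diff hunk header (`start`, `lines`) into a
/// zero-based, half-open line range.
fn header_side_to_range(start: u32, lines: u32) -> Range<u32> {
    if lines == 0 {
        // Zero-line side: `start` is the 1-based line *before* the change
        // (0 at the very top of the file), so the empty range sits right after it.
        start..start
    } else {
        // Non-empty side: `start` is the 1-based first line covered.
        let first = start.saturating_sub(1);
        first..first + lines
    }
}
```

With `@@ -5,0 +6,2 @@`, the old side maps to the empty range `5..5` (the gap after old line 5) and the new side maps to `5..7` (new lines 6 and 7), so both sides agree on the position.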

jonathantanmy2 · 2025-12-02T03:20:28Z

I continued looking at the code and drastically simplifying it (note the reduction in lines of code, not only in tests but in production code as well), fixing a few bugs in the process. But it's more involved than I thought - there might be some special handling of file deletion and re-creation. If anyone has any pointers regarding those, feel free to let me know - otherwise I'll look at it myself.

Byron · 2025-12-02T04:28:26Z

OMG! Standing on the sidelines, cheering and clapping!

Unfortunately, that's all the support I can provide with this one.

jonathantanmy2 · 2025-12-02T17:00:23Z

@mtsgrd could you take a look? Regarding the path forward, it may help if someone answers part or all of these questions:

- In `intersects` (crates/but-hunk-dependency/src/ranges/hunk.rs), what is the expected behavior if `self` has zero lines (e.g. if `self` is the point between lines 5 and 6, does it intersect line 5 and/or line 6)? And what if the input `lines` is zero?
- To follow up on that, what is the expected usage of this crate? (I looked into it briefly but not thoroughly.) From what I see, it's about gaining the ability to combine two stacks into one (preserving zero-line changes with non-zero line_shift is very important here), and also for "blame" output (we know where every line is from), but I don't know what else.
- Regarding combining two stacks into one, is it allowed for one stack to delete and recreate a file and the other stack to modify it? I would personally say that it's disallowed, but the code seems to have support for it (I didn't look closely into what the semantics are, though).

github-actions bot added the rust Pull requests that update Rust code label Nov 26, 2025

jonathantanmy2 mentioned this pull request Nov 26, 2025

line_shift is not recalculated when hunk is split in PathRanges::add #11369

Open

jonathantanmy2 force-pushed the jt/bhd branch 2 times, most recently from 4d96dfd to dbe79bd Compare November 27, 2025 04:03

jonathantanmy2 changed the title ~~Demonstration of line_shift bug in but-hunk-dependency~~ Fix line_shift bug in but-hunk-dependency Nov 27, 2025

krlvi requested a review from mtsgrd November 27, 2025 21:59

jonathantanmy2 added 2 commits December 1, 2025 19:13

jonathantanmy2 force-pushed the jt/bhd branch from 5685230 to 2435e41 Compare December 2, 2025 03:18

jonathantanmy2 mentioned this pull request Dec 5, 2025

Unambiguous uncommitted files CLI ID and cleanup #11463

Merged
