@jonathantanmy2
Collaborator

Currently, this PR just has a demonstration of a bug. Once I've confirmed my understanding of the situation, I'll update this PR to contain a fix.

@vercel

vercel bot commented Nov 26, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
gitbutler-web Ignored Ignored Preview Dec 2, 2025 3:18am

@Byron
Collaborator

Byron commented Nov 26, 2025

Thanks @jonathantanmy2, this is incredibly observant! There are probably many bugs and not enough tests, so this definitely is a step in the right direction.

In any case, @mtsgrd might be able to comment on it, maybe showing a path forward as well.

On another note, something I think this needs (at some time) is a better way to test it, and of course, a port to new non-legacy data structures. Ideally, this becomes plumbing that doesn't know workspaces at all, which should also improve its testability.

@jonathantanmy2 force-pushed the jt/bhd branch 2 times, most recently from 4d96dfd to dbe79bd, on November 27, 2025 04:03
@jonathantanmy2 changed the title from "Demonstration of line_shift bug in but-hunk-dependency" to "Fix line_shift bug in but-hunk-dependency" on November 27, 2025
@jonathantanmy2
Collaborator Author

This PR now contains the fix. PTAL. See the commit message for a more complete description of the problem and the fix.

@Byron
Collaborator

Byron commented Nov 27, 2025

I thought I could skip the review if @mtsgrd is going to take a much more proficient look anyway, but I just wanted to welcome our first "Git level" commit message, which I reproduce here for ease of consumption:


The line_shift value is used when combining diff hunks from two or more stacks together, and has an important invariant: at any point where a diff hunk from another stack could be inserted, the cumulative line_shift value must be the net lines (lines added less lines removed) of all the diff hunks prior to that point.

This is so that the combiner knows how to shift the diff hunks. For example, suppose the combiner needs to combine two stacks; the second stack has a diff that adds a line at line 100. Suppose the combined effect of all diff hunks in the first stack prior to line 100 is a net reduction of 42 lines: the total line_shift value of all those diff hunks must thus be -42, so that the combiner knows that the change that the second stack has must be added at line 58 instead of line 100.
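
To make the arithmetic in that example concrete, here is a minimal Rust sketch. `HunkRange`, `cumulative_shift_before`, and `shifted_line` are hypothetical names chosen for illustration and are not the crate's actual types; the sketch also assumes that positions are expressed in base-file line numbers.

```rust
/// Hypothetical, simplified stand-in for a hunk range; not the crate's real type.
#[derive(Debug, Clone, Copy)]
struct HunkRange {
    /// First base-file line (1-based) replaced by the hunk, or, for a pure
    /// insertion, the line before which the new lines are placed.
    base_start: u32,
    /// Number of base-file lines the hunk replaces (0 for a pure insertion).
    base_lines: u32,
    /// Net lines contributed by the hunk: lines added less lines removed.
    line_shift: i32,
}

/// Sum the `line_shift` of every hunk that lies entirely before `base_line`.
fn cumulative_shift_before(hunks: &[HunkRange], base_line: u32) -> i32 {
    hunks
        .iter()
        .filter(|h| h.base_start + h.base_lines <= base_line)
        .map(|h| h.line_shift)
        .sum()
}

/// Translate a base-file line number into the coordinates that result from
/// applying all hunks in `hunks`.
fn shifted_line(hunks: &[HunkRange], base_line: u32) -> u32 {
    let shift = cumulative_shift_before(hunks, base_line);
    (base_line as i64 + shift as i64) as u32
}

fn main() {
    // Two hunks from the first stack whose net effect before line 100 is -42.
    let first_stack = [
        HunkRange { base_start: 10, base_lines: 50, line_shift: -45 },
        HunkRange { base_start: 70, base_lines: 0, line_shift: 3 },
    ];
    assert_eq!(cumulative_shift_before(&first_stack, 100), -42);
    // The second stack's change at line 100 lands at line 58.
    assert_eq!(shifted_line(&first_stack, 100), 58);
}
```

If the individual `line_shift` values were wrong but still summed to -42, the result at line 100 would be unchanged, which is the essence of Point A below.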

Point A: Note that this invariant only needs to apply at any point where a diff hunk from another stack could be inserted. If, in a stack, there are two hunks adjacent to each other, no diff hunk may be inserted between them, so as long as their total line_shift value is correct, they may have any line_shift value they want. (In fact, it is sometimes not possible to determine what each line_shift value should be.)

This invariant is not met by the current algorithm, so I switched the calculation for something that does. The main principles are:

  • If applying a hunk causes other hunks to completely disappear, the incoming hunk must bear responsibility for the line_shift values of the hunks that disappear by adding their values to itself.

  • If a hunk splits into two (only possible when applying a hunk in the middle of an existing hunk), the two resulting hunks must split the original line_shift value between them. Due to Point A above, the exact proportion does not matter (the two resulting hunks sandwich the hunk that split them, and all three are adjacent to each other), so I have chosen to distribute the line_shift value based on their sizes.

  • If applying a hunk causes another hunk to be reduced in size, but not completely disappear, the exact distribution of line_shift values in between these two hunks does not really matter, since they are adjacent (and thus Point A applies). But I have chosen to take some line_shift from the reduced-size hunk to give to the incoming hunk, analogous to how a completely disappearing hunk cedes all its line_shift to the incoming hunk, to make the line_shift values more reasonable (well, reasonable to me, at least).
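
The first two principles can be sketched in Rust as follows. This is an illustration under stated assumptions, not the code in this PR: `absorb_line_shift` and `split_line_shift` are hypothetical helpers, and the rounding rule (the remainder goes to the right half) is an arbitrary choice that only needs to keep the sum intact.

```rust
/// An incoming hunk takes over the `line_shift` of every hunk it completely
/// replaces, so the cumulative total past the incoming hunk stays the same.
fn absorb_line_shift(incoming: i32, swallowed: &[i32]) -> i32 {
    incoming + swallowed.iter().sum::<i32>()
}

/// Split `line_shift` between the two halves of a hunk in proportion to their
/// sizes. The right half receives whatever the left half did not take, so the
/// two parts always sum to the original value.
fn split_line_shift(line_shift: i32, left_lines: u32, right_lines: u32) -> (i32, i32) {
    let total = left_lines as i64 + right_lines as i64;
    if total == 0 {
        // Both halves are empty: any distribution works, so keep it on the left.
        return (line_shift, 0);
    }
    let left = (line_shift as i64 * left_lines as i64 / total) as i32;
    (left, line_shift - left)
}

fn main() {
    // A hunk with line_shift 7 is split into halves of 2 and 8 lines.
    let (left, right) = split_line_shift(7, 2, 8);
    assert_eq!((left, right), (1, 6));
    assert_eq!(left + right, 7);

    // An incoming hunk with line_shift 1 replaces hunks with shifts -3 and 4.
    assert_eq!(absorb_line_shift(1, &[-3, 4]), 2);
}
```

Because the halves and the hunk that split them are adjacent, only the preserved sum matters for the invariant; the proportional split is a readability choice.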

There is some code duplication due to how the diff hunks of an individual stack are combined. I thought of rewriting the combining algorithm before writing this commit (to reduce or eliminate the code duplication needed), but there were some inconsistencies in how zero-line hunks were handled, so I thought it best to correct the line_shift issue before making further changes.


Thank you @jonathantanmy2, this really is something I'd like to copy. And maybe one day GitButler will also be the tool that helps to unearth such messages when people are wondering why it is what it is (-> Git archaeology)

@krlvi requested a review from mtsgrd on November 27, 2025 21:59
@krlvi
Member

krlvi commented Nov 27, 2025

(adding Mattias as a reviewer since he authored the original implementation and may remember certain nuances about the functionality)

The `line_shift` value is used when combining diff hunks from two or
more stacks together, and has an important invariant: at any point
where a diff hunk from another stack could be inserted, the cumulative
`line_shift` value must be the net lines (lines added less lines
removed) of all the diff hunks prior to that point.

This is so that the combiner knows how to shift the diff hunks. For
example, suppose the combiner needs to combine two stacks; the second
stack has a diff that adds a line at line 100. Suppose the combined
effect of all diff hunks in the first stack prior to line 100 is a net
reduction of 42 lines: the total `line_shift` value of all those diff
hunks must thus be -42, so that the combiner knows that the change that
the second stack has must be added at line 58 instead of line 100.

Point A: Note that this invariant only needs to apply at any point
where a diff hunk from another stack could be inserted. If, in a
stack, there are two hunks adjacent to each other, no diff hunk may be
inserted between them, so as long as their total `line_shift` value is
correct, they may have any `line_shift` value they want. (In fact, it is
sometimes not possible to determine what each `line_shift` value should
be.)

This invariant is not met by the current algorithm, so I switched the
calculation for something that does. The main principles are:

 - If applying a hunk causes other hunks to completely disappear, the
   incoming hunk must bear responsibility for the `line_shift` values of
   the hunks that disappear by adding their values to itself.

 - If a hunk splits into two (only possible when applying a hunk in the
   middle of an existing hunk), the two resulting hunks must split the
   original `line_shift` value between them. Due to Point A above, the
   exact proportion does not matter (the two resulting hunks sandwich
   the hunk that split them, and all three are adjacent to each other),
   so I have chosen to distribute the `line_shift` value based on their
   sizes.

 - If applying a hunk causes another hunk to be reduced in size, but
   not completely disappear, the exact distribution of `line_shift`
   values in between these two hunks does not really matter, since they
   are adjacent (and thus Point A applies). But I have chosen to take
   some `line_shift` from the reduced-size hunk to give to the incoming
   hunk, analogous to how a completely disappearing hunk cedes all its
   `line_shift` to the incoming hunk, to make the `line_shift` values
   more reasonable (well, reasonable to me, at least).

There is some code duplication due to how the diff hunks of an
individual stack are combined. I thought of rewriting the combining
algorithm before writing this commit (to reduce or eliminate the
code duplication needed), but there were some inconsistencies in how
zero-line hunks were handled, so I thought it best to correct the
`line_shift` issue before making further changes.

This is a WIP. I need to make another pass to make the identifier names
consistent, add documentation, and so on, but uploading this first to
check if I'm on the right track.

Second most noteworthy is the fix in input_hunk_from_unified_diff.
`old_start` has a special meaning when `old_lines` is 0 (same for
`new_start` and `new_lines`).
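
As a reminder of that special meaning: in unified diff output, when a side of the hunk header has a count of 0, its start is the number of the line just before the empty range (or 0 at the top of the file), rather than the first line of the range. The sketch below shows one way to normalize both cases to a 0-based half-open range. `header_side_to_range` is a hypothetical helper for illustration; it is not a copy of `input_hunk_from_unified_diff`.

```rust
use std::ops::Range;

/// Convert one side of a unified-diff hunk header (`start,lines`) into a
/// 0-based half-open range of line indices.
fn header_side_to_range(start: u32, lines: u32) -> Range<u32> {
    if lines == 0 {
        // Empty side: `start` names the line just before the gap (0 at the top
        // of the file), so the empty range begins right after that line.
        start..start
    } else {
        // Non-empty side: `start` is the 1-based number of the first line.
        // `saturating_sub` only guards against a malformed `0,n` header.
        let first = start.saturating_sub(1);
        first..first + lines
    }
}

fn main() {
    // `@@ -5,0 +6,2 @@`: two lines inserted after old line 5.
    assert_eq!(header_side_to_range(5, 0), 5..5);
    assert_eq!(header_side_to_range(6, 2), 5..7);
    // `@@ -0,0 +1,3 @@`: three lines added to an empty file.
    assert_eq!(header_side_to_range(0, 0), 0..0);
    assert_eq!(header_side_to_range(1, 3), 0..3);
}
```

Treating `old_start` the same way regardless of `old_lines` would place a pure insertion one line too early.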

Most noteworthy are the bug fixes as can be seen in some tests. In
crates/but-hunk-dependency/src/ranges/tests/path.rs:

In removing_line_updates_range, the file goes from

?
1
1

to (in commit 1)

?
a
b
c

to (in commit 2)

?
a
c

It can be readily seen why the existing results are wrong.

In shift_is_correct_after_multiple_changes, the file ends up being

1
2
update 3
add line 1
add line 2
add line 4
4
6
update 7
add line
8
9
10
added
lines
at the bottom

Once again, it can be readily seen why the existing results are wrong.

DO NOT SUBMIT

There are some insta tests that I blindly accepted the review for that
I need to look into. There might be some special handling regarding
deletion and recreation of files.
@jonathantanmy2
Collaborator Author

I continued looking at the code and drastically simplified it (note the reduction in lines of code, not only in tests but in production code as well), fixing a few bugs in the process. But it's more involved than I thought - there might be some special handling of file deletion and re-creation. If anyone has any pointers regarding those, feel free to let me know - otherwise I'll look at it myself.

@Byron
Collaborator

Byron commented Dec 2, 2025

OMG! Standing on the sidelines, cheering and clapping!

Unfortunately, that's all the support I can provide with this one.

@jonathantanmy2
Collaborator Author

@mtsgrd could you take a look? Regarding the path forward, it may help if someone answers part or all of these questions:

  • In crates/but-hunk-dependency/src/ranges/hunk.rs intersects, what is the expected behavior if self has zero lines (e.g. if self is the point between lines 5 and 6, does it intersect line 5 and/or line 6)? And what is expected if the input lines value is zero?
  • To follow up on that, what is the expected usage of this crate? (I looked into it briefly but not thoroughly.) From what I see, it's about gaining the ability to combine two stacks into one (preserving zero-line changes with non-zero line_shift is very important here), and also for "blame" output (we know where every line is from), but I don't know what else.
  • Regarding combining two stacks into one, is it allowed for one stack to delete and recreate a file and the other stack to modify it? I would personally say that it's disallowed, but the code seems to have support for it (I didn't look closely into what the semantics are, though).
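
To make the first question concrete, here are two candidate readings of "intersects" for a zero-line range, sketched in Rust. Both functions are hypothetical and neither is claimed to be what `intersects` in hunk.rs currently does; ranges are assumed to be 0-based and half-open, so the point between lines 5 and 6 is `5..5`.

```rust
use std::ops::Range;

/// Strict reading: two ranges intersect only if they share at least one line,
/// so an empty range never intersects anything.
fn intersects_strict(a: &Range<u32>, b: &Range<u32>) -> bool {
    !a.is_empty() && !b.is_empty() && a.start < b.end && b.start < a.end
}

/// Touching reading: ranges also intersect when they merely abut, so an empty
/// range intersects both of its neighboring lines.
fn intersects_touching(a: &Range<u32>, b: &Range<u32>) -> bool {
    a.start <= b.end && b.start <= a.end
}

fn main() {
    let point = 5..5; // between 1-based lines 5 and 6
    let line_5 = 4..5;
    let line_6 = 5..6;
    let line_8 = 7..8;

    assert!(!intersects_strict(&point, &line_5));
    assert!(!intersects_strict(&point, &line_6));

    assert!(intersects_touching(&point, &line_5));
    assert!(intersects_touching(&point, &line_6));
    assert!(!intersects_touching(&point, &line_8));
}
```

Which reading is right depends on the answer to the second question, namely what the callers of this crate need from it.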

Labels

rust Pull requests that update Rust code

4 participants