Contributor

@cruessler cruessler commented Sep 28, 2025

This PR is about 75–80 % done (hopefully). It still needs comments, and there are also a couple of TODO comments that I want to go through. I wanted to open it at this early stage already, as a kind of sneak preview of the things I’m currently working on. :-)

What’s the context and what are the goals of this PR?

We assume that most people would expect the output of gix-diff to be identical to that of git diff. Therefore, we want an easy way to compare the output of gix-diff and git diff on a large corpus of diffs to make sure they match.

This PR tries to script as much of that process as possible and to make it easily reproducible.

There is a repository that contains “[t]ools for experimenting [with] diff "slider" heuristics”: diff-slider-tools. For any git repository, we can use diff-slider-tools to generate a list of locations where the diff between two files is ambiguous.
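
To illustrate what “ambiguous” means here, the following is a minimal sketch, assuming a single contiguous block of inserted lines. `slide_range` is a hypothetical helper written for this explanation; it is not part of gix-diff or diff-slider-tools. It counts how far an inserted block can be shifted up or down in the new file while removing it would still yield the same old file.

```rust
use std::ops::Range;

/// Returns how many lines the inserted block can be shifted (up, down)
/// without changing the old file that results from removing it.
/// Hypothetical helper for illustration only.
fn slide_range(new_lines: &[&str], inserted: Range<usize>) -> (usize, usize) {
    assert!(inserted.end <= new_lines.len(), "range must lie inside the file");
    if inserted.is_empty() {
        return (0, 0);
    }

    // Shifting up by one is possible when the line just above the block
    // equals the last line of the block.
    let (mut start, mut end) = (inserted.start, inserted.end);
    let mut up = 0;
    while start > 0 && new_lines[start - 1] == new_lines[end - 1] {
        start -= 1;
        end -= 1;
        up += 1;
    }

    // Shifting down by one is possible when the first line of the block
    // equals the line just below the block.
    let (mut start, mut end) = (inserted.start, inserted.end);
    let mut down = 0;
    while end < new_lines.len() && new_lines[start] == new_lines[end] {
        start += 1;
        end += 1;
        down += 1;
    }

    (up, down)
}

fn main() {
    // `fn b` was inserted between `fn a` and `fn c`.
    let new_lines = [
        "fn a() {", "}", "", "fn b() {", "}", "", "fn c() {", "}",
    ];
    // Treat lines 3..6 ("fn b() {", "}", "") as the inserted block.
    let (up, down) = slide_range(&new_lines, 3..6);
    println!("can slide up by {up} and down by {down}");
}
```

In this example the block can move up by two lines and down by none, so there are three equally valid places to put the hunk. A slider heuristic decides which of them is shown.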

This PR creates a tool that takes a list of locations generated by diff-slider-tools and turns it into test cases that can be run as part of gix-diff’s test suite.

This enables us to run large-scale tests comparing gix-diff to git diff whenever we want, hopefully uncovering any edge cases we might have missed in our slider heuristic and making sure we conform to our users’ expectations.

Usage

  1. Follow these instructions to generate a file containing sliders: https://github.com/mhagger/diff-slider-tools/blob/b59ed13d7a2a6cfe14a8f79d434b6221cc8b04dd/README.md?plain=1#L122-L146
  2. Run create-diff-cases to create the script in gix-diff/tests/fixtures/. The script will be called make_diff_for_sliders_repo.sh.
    # run inside `gitoxide`
    cargo run --package internal-tools -- create-diff-cases --sliders-file $DIFF_SLIDER_TOOLS/corpus/git.sliders --worktree-dir $DIFF_SLIDER_TOOLS/corpus/git.git/ --destination-dir gix-diff/tests/fixtures/
    
  3. Run cargo test sliders -- --nocapture inside gix-diff/tests to run the actual tests.

Implementation details

The preamble to make_diff_for_sliders_repo.sh comes from a similar script in gix: https://github.com/cruessler/gitoxide/blob/f0ceafa62ff519484e016789c50fa49a79819a80/gix/tests/fixtures/make_diff_repos.sh#L1-L4.

The repo created by make_diff_for_sliders_repo.sh follows a few conventions:

  • Every two consecutive commits form a pair that corresponds to an ambiguous diff in the source repository.
  • The commit message of the first commit in a pair is the path to the file being diffed. The path is derived from the blob id of the file in the source repository that represents the old/before state. That way, the original file name in the source repository doesn’t matter and we don’t have to encode it anywhere.

The test runs assertions for each pair of commits. It reads the baseline that comes from git diff and compares it to the output of gix-diff.

Open questions

  • Do we want to go for an approach that maps pairs of commits to individual test cases? That way, we would be able to run the tests in parallel, something that’s not possible given the current setup. The current setup runs tests sequentially and stops at the first failure.
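
One possible intermediate step can be sketched as follows. It does not run cases in parallel; it only avoids stopping at the first failure by collecting every mismatch before reporting. The `Case` struct and `failing_cases` function are hypothetical names for this sketch and do not reflect the actual test code in this PR.

```rust
/// One comparison between a `git diff` baseline and the diff we produced.
/// Hypothetical type for illustration only.
struct Case {
    name: String,
    baseline: String,
    actual: String,
}

/// Returns the names of all cases whose output differs from the baseline,
/// instead of panicking on the first mismatch.
fn failing_cases(cases: &[Case]) -> Vec<&str> {
    cases
        .iter()
        .filter(|case| case.baseline != case.actual)
        .map(|case| case.name.as_str())
        .collect()
}

fn main() {
    let cases = vec![
        Case {
            name: "first-pair".to_string(),
            baseline: "@@ -1 +1 @@\n-a\n+b\n".to_string(),
            actual: "@@ -1 +1 @@\n-a\n+b\n".to_string(),
        },
        Case {
            name: "second-pair".to_string(),
            baseline: "@@ -2 +2 @@\n-c\n+d\n".to_string(),
            actual: "@@ -3 +3 @@\n-c\n+d\n".to_string(),
        },
    ];

    let failures = failing_cases(&cases);
    println!("{} of {} cases differ: {:?}", failures.len(), cases.len(), failures);
}
```

A test built this way could assert that the returned list is empty, so that a single run reports every mismatching pair at once.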

Member

@Byron Byron left a comment

Thanks so much for making this happen! Once the slider problem is solved for us and validated, it's such a major leap for the diff-quality in gitoxide.

There is a major question I have that isn't mentioned inline: Is it easy to read/understand the output of the pretty assertion that compares both diffs using the internal hunk format? How does that fare in comparison to diffing unified diffs?

This obviates the need to create a git history, which considerably
simplifies the test as it can just directly read files instead.
This makes visually interpreting differences between the two versions
significantly easier.
@cruessler
Contributor Author

There is a major question I have that isn't mentioned inline: Is it easy to read/understand the output of the pretty assertion that compares both diffs using the internal hunk format? How does that fare in comparison to diffing unified diffs?

Here are two different diffs: the first one uses the internal hunk format, the second one uses unified diffs. I think the second one is significantly easier to understand, so I changed the test to use that format in the last commit.

Using the internal hunk format

diffing-non-textual-diffs

Using unified diffs

diffing-textual-diffs

Member

@Byron Byron left a comment

Thanks, this looks so much simpler now!

The diffs are also much easier to understand, yet I am wondering: would an actual unified diff be better to diff against another unified diff? That would retain diff lines, and I think the common term for it is 'interdiff'.

let mut blocks: Vec<String> = vec![format!(
r#"#!/usr/bin/env bash
# TODO:
# `git diff --no-index` returns 1 when there's differences, but 1 is treated as an error by the
Member

I'd keep the set -eu… and do git diff … || true or something like that, assuming that it will keep working.

Contributor Author

Thanks for the suggestion, changed!
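
For comparison, the same exit-code convention can be handled explicitly when git is invoked from Rust instead of from the generated script. This is a minimal sketch under assumptions: `DiffStatus`, `interpret_exit_code` and `diff_no_index` are hypothetical names and are not part of this PR. It relies on `git diff --no-index` exiting with 0 when the files are identical and 1 when they differ; any other code is treated as a real error, which is something `|| true` would hide.

```rust
use std::path::Path;
use std::process::Command;

/// Outcome of a successful `git diff --no-index` run.
#[derive(Debug, PartialEq)]
enum DiffStatus {
    Identical,
    Different,
}

/// Maps the exit code to a result: 0 and 1 are both successful runs,
/// everything else (or termination by a signal) is an error.
fn interpret_exit_code(code: Option<i32>) -> Result<DiffStatus, String> {
    match code {
        Some(0) => Ok(DiffStatus::Identical),
        Some(1) => Ok(DiffStatus::Different),
        Some(other) => Err(format!("git diff failed with exit code {other}")),
        None => Err("git diff was terminated by a signal".to_string()),
    }
}

/// Runs `git diff --no-index <old> <new>` and returns the status together
/// with the raw diff written to stdout. Requires `git` to be installed.
#[allow(dead_code)]
fn diff_no_index(old: &Path, new: &Path) -> Result<(DiffStatus, Vec<u8>), String> {
    let output = Command::new("git")
        .args(["diff", "--no-index"])
        .arg(old)
        .arg(new)
        .output()
        .map_err(|err| format!("could not run git: {err}"))?;
    let status = interpret_exit_code(output.status.code())?;
    Ok((status, output.stdout))
}

fn main() {
    // Only the exit-code mapping is exercised here, so no git is needed.
    for code in [Some(0), Some(1), Some(128), None] {
        println!("{code:?} -> {:?}", interpret_exit_code(code));
    }
}
```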


blocks.push(format!(
r#"git diff --no-index "$ROOT/{asset_dir}/{old_blob_id}.commit" "$ROOT/{asset_dir}/{new_blob_id}.commit" > .git/{old_blob_id}-{new_blob_id}.baseline
cp "$ROOT/{asset_dir}/{old_blob_id}.commit" assets/
Member

Are these .commits?

Contributor Author

Thanks for the suggestion, changed to .blob.

@cruessler
Contributor Author

The diffs are also much easier to understand, yet I am wondering: would an actual unified diff be better to diff against another unified diff? That would retain diff lines, and I think the common term for it is 'interdiff'.

That’s possible and it’s easier than I thought it would be. This is the result:

diffing-unified-diffs
