|
| 1 | +# Benchmarking diff |
| 2 | + |
| 3 | +The engine used by our diff tool tries to balance execution time with patch |
| 4 | +quality. It implements the Myers algorithm with a few heuristics which are also |
| 5 | +used by GNU diff to avoid pathological cases. |
| 6 | + |
| 7 | +The original paper can be found here: |
| 8 | +- https://link.springer.com/article/10.1007/BF01840446 |
| 9 | + |
| 10 | +Currently, not all tricks used by GNU diff are adopted by our implementation. |
| 11 | +For instance, GNU diff will isolate lines that exist in only one of the files |
| 12 | +and exclude them from the diffing process. It also post-processes the |
| 13 | +edits to produce more cohesive hunks. Combined, these two techniques should make it |
| 14 | +produce better patches for large files which are very different. |
| 15 | + |
| 16 | +Run `cargo build --release` before benchmarking after you make a change! |
| 17 | + |
| 18 | +## How to benchmark |
| 19 | + |
| 20 | +It is recommended that you use the 'hyperfine' tool to run your benchmarks. This |
| 21 | +is an example of how to run a comparison with GNU diff: |
| 22 | + |
| 23 | +``` |
| 24 | +> hyperfine -N -i --warmup 2 --output=pipe 'diff t/huge t/huge.3' |
| 25 | +'./target/release/diffutils diff t/huge t/huge.3' |
| 26 | +Benchmark 1: diff t/huge t/huge.3 |
| 27 | + Time (mean ± σ): 136.3 ms ± 3.0 ms [User: 88.5 ms, System: 17.9 ms] |
| 28 | + Range (min … max): 131.8 ms … 144.4 ms 21 runs |
| 29 | +
|
| 30 | + Warning: Ignoring non-zero exit code. |
| 31 | +
|
| 32 | +Benchmark 2: ./target/release/diffutils diff t/huge t/huge.3 |
| 33 | + Time (mean ± σ): 74.4 ms ± 1.0 ms [User: 47.6 ms, System: 24.9 ms] |
| 34 | + Range (min … max): 72.9 ms … 77.1 ms 41 runs |
| 35 | +
|
| 36 | + Warning: Ignoring non-zero exit code. |
| 37 | +
|
| 38 | +Summary |
| 39 | + ./target/release/diffutils diff t/huge t/huge.3 ran |
| 40 | + 1.83 ± 0.05 times faster than diff t/huge t/huge.3 |
| 41 | +> |
| 42 | +``` |
| 43 | + |
| 44 | +As you can see, you should provide both commands you want to compare in a single |
| 45 | +invocation of 'hyperfine', each as a single argument, so use quotes. These are |
| 46 | +the relevant parameters: |
| 47 | + |
| 48 | +- -N: avoids using a shell as intermediary to run the command |
| 49 | +- -i: ignores non-zero exit code, which diff uses to mean files differ |
| 50 | +- --warmup 2: 2 runs before measuring, warms up I/O cache for large files |
| 51 | +- --output=pipe: disable any potential optimizations based on output destination |
| 52 | + |
| 53 | +## Inputs |
| 54 | + |
| 55 | +Performance will vary based on several factors, the main ones being: |
| 56 | + |
| 57 | +- how large the files being compared are |
| 58 | +- how different the files being compared are |
| 59 | +- how long the sequences of equal lines are and how far apart they are |
| 60 | + |
| 61 | +When looking at performance improvements, it is important to test both small |
| 62 | +and large (tens of MBs) files that have few differences, many differences, or |
| 63 | +are completely different, to cover all of the potential pathological cases. |
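
The input matrix described above could be generated with a small script along these lines. This is only a sketch: the `bench-inputs` directory, the `gen_base` and `mutate` helpers, the file names, and the line counts are all hypothetical choices for illustration, not part of the repository, and the `t/huge` files used in the earlier example are not produced by it.

```sh
#!/bin/sh
# Hypothetical helper: generate pairs of files covering the cases listed
# above (small/large inputs with few, many, or total differences).
set -eu

outdir="${1:-bench-inputs}"   # assumed output directory name
mkdir -p "$outdir"

# gen_base LINES FILE: write LINES distinct lines of text to FILE.
gen_base() {
    awk -v n="$1" 'BEGIN {
        for (i = 1; i <= n; i++) printf "line %d of the base file\n", i
    }' > "$2"
}

# mutate EVERY SRC DST: copy SRC to DST, replacing every EVERY-th line.
mutate() {
    awk -v k="$1" 'NR % k == 0 { print "changed " NR; next } { print }' \
        "$2" > "$3"
}

# 1000 lines is a small input; 500000 lines is roughly 14 MB (assumed sizes).
for spec in small:1000 large:500000; do
    name=${spec%%:*}
    lines=${spec##*:}
    gen_base "$lines" "$outdir/$name"
    mutate 1000 "$outdir/$name" "$outdir/$name.few"    # few differences
    mutate 3    "$outdir/$name" "$outdir/$name.many"   # many differences
    # Completely different: no line is shared with the base file.
    awk -v n="$lines" 'BEGIN {
        for (i = 1; i <= n; i++) printf "unrelated %d\n", i
    }' > "$outdir/$name.all"
done
```

Each generated pair can then be passed to the same 'hyperfine' invocation shown earlier, for example by replacing `t/huge t/huge.3` with `bench-inputs/large bench-inputs/large.many` in both quoted commands. Because the changed lines are evenly spaced, this does not exercise every spacing of equal-line sequences; varying the `mutate` interval is one way to probe that factor.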