Commit ed1bbfa
diff: apply heuristics borrowed from GNU diff for "good enough"
This change adds checks that stop the search for the best place to
split the diffing process once it has gone on for too long, or has gone
on long enough and has already found a good chunk of matches.
They are based on similar heuristics that GNU diff applies and will
help in cases in which files are very long and have few common
sequences.
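
The two stopping conditions described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the names `EarlyStop`, `Limits` and `check_early_stop` are hypothetical, and the thresholds (a floor of 256, a cost of 200, a run of 20 matches) are placeholder assumptions rather than values taken from the diff or confirmed against GNU diff.

```rust
/// Why a split-point search was cut short (hypothetical names for illustration).
#[derive(Debug, PartialEq, Eq)]
enum EarlyStop {
    /// The search has used up its whole cost budget.
    TooExpensive,
    /// A long run of matching lines was found after a moderate amount of work.
    GoodEnoughSnake,
}

/// Tunable thresholds. All values here are illustrative assumptions.
struct Limits {
    max_cost: usize,
    min_cost_for_snake_check: usize,
    min_snake_len: usize,
}

impl Limits {
    /// The budget grows roughly with the square root of the number of
    /// diagonals, with a floor so that small inputs are searched fully.
    fn for_inputs(len_a: usize, len_b: usize) -> Self {
        let mut remaining = len_a + len_b + 3;
        let mut budget = 1usize;
        // Halving the bit length approximates a square root with shifts only.
        while remaining != 0 {
            budget <<= 1;
            remaining >>= 2;
        }
        Limits {
            max_cost: budget.max(256),
            min_cost_for_snake_check: 200,
            min_snake_len: 20,
        }
    }
}

/// Decide whether the search should stop at the current cost.
/// `longest_snake` is the longest run of matching lines seen so far.
fn check_early_stop(cost: usize, longest_snake: usize, limits: &Limits) -> Option<EarlyStop> {
    if cost >= limits.min_cost_for_snake_check && longest_snake >= limits.min_snake_len {
        return Some(EarlyStop::GoodEnoughSnake);
    }
    if cost >= limits.max_cost {
        return Some(EarlyStop::TooExpensive);
    }
    None
}
```

A search loop would call `check_early_stop` once per cost increment and, on `Some(_)`, split at the best point found so far instead of continuing towards the optimal one.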
This brings the time to compare some large (~36MB), very different
files down from ~1 hour to ~8 seconds. Some pathological cases remain,
such as some very large cpp files I created for benchmarking, which
still take about 1 minute.
Benchmark 1: diff test-data/huge-base test-data/huge-very-different
Time (mean ± σ): 2.790 s ± 0.005 s [User: 2.714 s, System: 0.063 s]
Range (min … max): 2.781 s … 2.798 s 10 runs
Warning: Ignoring non-zero exit code.
Benchmark 2: ./target/release/diffutils.no-heuristics diff test-data/huge-base test-data/huge-very-different
Time (mean ± σ): 4755.084 s ± 172.607 s [User: 4727.169 s, System: 0.330 s]
Range (min … max): 4607.522 s … 5121.135 s 10 runs
Warning: Ignoring non-zero exit code.
Benchmark 3: ./target/release/diffutils diff test-data/huge-base test-data/huge-very-different
Time (mean ± σ): 7.197 s ± 0.099 s [User: 7.055 s, System: 0.094 s]
Range (min … max): 7.143 s … 7.416 s 10 runs
Warning: Ignoring non-zero exit code.
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
diff test-data/huge-base test-data/huge-very-different ran
2.58 ± 0.04 times faster than ./target/release/diffutils diff test-data/huge-base test-data/huge-very-different
1704.04 ± 61.93 times faster than ./target/release/diffutils.no-heuristics diff test-data/huge-base test-data/huge-very-different
Note that the worst that should happen when the heuristics cause the
search to end early is a suboptimal diff, but the diff will still be
correct and usable with patch.
1 parent 74d2fad commit ed1bbfa
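
The commit message's point that a suboptimal diff is still correct can be illustrated with a small sketch. The `Edit` type and `apply` function below are hypothetical and much simpler than a real patch format; they only show that a longer edit script and a minimal one can both turn the old file into the same new file.

```rust
/// One step of a line-based edit script (hypothetical representation).
enum Edit<'a> {
    /// Copy the next line of the old file to the output.
    Keep,
    /// Skip the next line of the old file.
    Delete,
    /// Emit a new line that was not in the old file.
    Insert(&'a str),
}

/// Apply an edit script to the old file's lines and return the result.
/// Panics if the script consumes more lines than `old` contains.
fn apply<'a>(old: &[&'a str], script: &[Edit<'a>]) -> Vec<&'a str> {
    let mut out = Vec::new();
    let mut next = 0;
    for edit in script {
        match edit {
            Edit::Keep => {
                out.push(old[next]);
                next += 1;
            }
            Edit::Delete => next += 1,
            Edit::Insert(line) => out.push(*line),
        }
    }
    out
}
```

For an old file `["a", "b", "c"]` and a new file `["a", "x", "c"]`, the script `Keep, Delete, Insert("x"), Keep` is minimal, while deleting all three lines and inserting three new ones is not. Applying either yields the new file; only the size of the diff differs.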
1 file changed: +50 -1 lines changed
(Diff content was not preserved; only the hunk line ranges survive.)
@@ -16,6 +16,11 @@ 5 lines added
@@ -82,11 +87,26 @@ 16 lines added, 1 removed
@@ -192,7 +212,21 @@ 14 lines added
@@ -355,6 +389,21 @@ 15 lines added