Conversation

@bernaferrari

Summary

Significant performance improvements to the diff algorithm through multiple optimizations:

  • Interning: maps items to integer IDs so areItemsTheSame() becomes a cheap integer comparison (see the sketch after this list)
  • IntIntMap: a primitive int→int hash map that avoids boxing overhead
  • Patience anchors: for lists >1000 items, unique elements serve as anchor points that split the diff problem
  • Inline pragmas: VM inlining hints for hot-path methods

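A minimal sketch of the interning idea (illustrative names, not the PR's actual code, which builds on the primitive IntIntMap):

```dart
/// Simplified interner: each distinct item gets a dense int ID, so the
/// diff's inner loop can compare ints instead of calling areItemsTheSame().
class Interner<T> {
  final Map<T, int> _ids = {}; // the PR uses a primitive IntIntMap instead

  List<int> intern(List<T> items) =>
      [for (final item in items) _ids.putIfAbsent(item, () => _ids.length)];
}

void main() {
  final interner = Interner<String>();
  final oldIds = interner.intern(['a', 'b', 'c']);
  final newIds = interner.intern(['a', 'c', 'd']);
  // Equality is now a cheap int comparison:
  print(oldIds[0] == newIds[0]); // true
}
```
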
Benchmarks

| Benchmark | Before | After | Speedup |
| --- | --- | --- | --- |
| RandomDiff(1000) | 193 ms | 116 ms | 1.7x |
| RandomDiff(5000) | 4,957 ms | 422 ms | 11.7x |
| RandomDiff(10000) | 20,661 ms | 1,704 ms | 12.1x |
| RealWorldCode(3000 lines) | 82 ms | 13 ms | 6.4x |
| LargeFile(10000 lines) | 59 ms | 25 ms | 2.4x |

Breaking Changes

None. All existing tests pass without modification.

bernaferrari and others added 8 commits December 22, 2025 14:46
- Interning: map items to int IDs for O(1) comparisons
- Prefix/Suffix trimming: skip common head/tail before diff
- IntIntMap: primitive int->int hash map (no boxing)
- Int32List: reduced memory for ID arrays

Benchmarks (1000 items):
- RandomDiff: ~36% faster (202ms -> 130ms)
- PrefixSuffix: ~4x faster (377µs -> 88µs)
- InsertDelete: ~40% faster (3.8ms -> 2.3ms)

- IntIntMap: primitive int->int hash map with open addressing (sketched below)
- Int32List: memory optimization for ID arrays
- anchors.dart: patience-style anchor finding with LIS algorithm
  (prepared for future integration)
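
A minimal open-addressing int→int map sketch (linear probing, power-of-two capacity, no removal support; illustrative code, not the PR's IntIntMap). Backing the table with Int32List avoids the per-entry boxing overhead a general-purpose Map<int, int> incurs:

```dart
import 'dart:typed_data';

class IntIntMap {
  static const int _empty = -1; // sentinel; keys assumed >= 0 in this sketch
  Int32List _keys;
  Int32List _values;
  int _size = 0;

  // Capacity must stay a power of two so `key & mask` is a valid slot.
  IntIntMap([int capacity = 16])
      : _keys = Int32List(capacity),
        _values = Int32List(capacity) {
    _keys.fillRange(0, _keys.length, _empty);
  }

  int get(int key, {int ifAbsent = -1}) {
    final mask = _keys.length - 1;
    for (var i = key & mask;; i = (i + 1) & mask) {
      if (_keys[i] == key) return _values[i];
      if (_keys[i] == _empty) return ifAbsent;
    }
  }

  void put(int key, int value) {
    if (_size * 2 >= _keys.length) _grow(); // keep load factor under 1/2
    final mask = _keys.length - 1;
    for (var i = key & mask;; i = (i + 1) & mask) {
      if (_keys[i] == _empty) {
        _keys[i] = key;
        _values[i] = value;
        _size++;
        return;
      }
      if (_keys[i] == key) {
        _values[i] = value;
        return;
      }
    }
  }

  void _grow() {
    final oldKeys = _keys, oldValues = _values;
    _keys = Int32List(oldKeys.length * 2);
    _keys.fillRange(0, _keys.length, _empty);
    _values = Int32List(oldKeys.length * 2);
    _size = 0;
    for (var i = 0; i < oldKeys.length; i++) {
      if (oldKeys[i] != _empty) put(oldKeys[i], oldValues[i]);
    }
  }
}
```
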
- Patience anchors now mark unique matches with negative IDs (see the LIS sketch after the results below)
- Use a Uint8List bitmask for O(1) anchor membership checks
- IntIntMap for fast hash-based interning
- Maintain correct collision handling

Results:
- RandomDiff(10000): 15.5s -> 13.0s (16% faster)
- RandomDiff(1000): 202ms -> 124ms (39% faster)
- PrefixSuffix(1000): 378µs -> 86µs (4.4x faster)
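
For context: a patience-style pass keeps only elements that occur exactly once in both lists, then takes the longest increasing subsequence of their positions so the surviving anchors appear in the same relative order on both sides; each anchor then splits the diff into independent sub-problems. A minimal O(n log n) LIS sketch (a hypothetical helper, not the PR's anchors.dart):

```dart
/// Indices of one longest strictly increasing subsequence of [seq],
/// via patience sorting (tails array + binary search).
List<int> lisIndices(List<int> seq) {
  // tails[k] = index of the smallest tail of an increasing run of length k+1
  final tails = <int>[];
  final prev = List<int>.filled(seq.length, -1);
  for (var i = 0; i < seq.length; i++) {
    var lo = 0, hi = tails.length;
    while (lo < hi) {
      final mid = (lo + hi) >> 1;
      if (seq[tails[mid]] < seq[i]) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    if (lo > 0) prev[i] = tails[lo - 1];
    if (lo == tails.length) {
      tails.add(i);
    } else {
      tails[lo] = i;
    }
  }
  // Walk the predecessor chain back from the end of the longest run.
  final out = <int>[];
  for (var i = tails.isEmpty ? -1 : tails.last; i != -1; i = prev[i]) {
    out.add(i);
  }
  return out.reversed.toList();
}
```
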
Added @pragma('vm:prefer-inline') (usage sketched below) to:
- _Snake.hasAdditionOrRemoval(), isAddition(), diagonalSize()
- _Range.oldSize(), newSize()

Results:
- RandomDiff(1000): 124ms -> 116ms (6% faster)
- RandomDiff(10000): 13.0s -> 12.6s (3% faster)
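
For reference, the annotation looks like this (the fields here are illustrative, and the library's actual class is `_Snake`); the pragma asks the Dart VM's optimizing compiler to inline the method at its call sites:

```dart
class Snake {
  final int start, end; // illustrative fields

  Snake(this.start, this.end);

  @pragma('vm:prefer-inline')
  int diagonalSize() => end - start;
}
```
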
…avior

The prefix/suffix optimization was greedily locking in matches for
duplicate elements, changing which duplicate gets preserved. This
caused regression test failures for issue knaeckeKami#15.

Removed prefix/suffix trimming from both interner.dart and calculateDiff().
Other optimizations (interning, IntIntMap, anchors, inline pragmas) remain.
@knaeckeKami
Owner

Interning now maps items to integer IDs using only hashCode, so distinct items with the same hash are treated as identical. Dart allows hash collisions, so this breaks correctness: the diff can report no updates when items actually changed (e.g., CollisionPair(1,2) and CollisionPair(2,1) both hash to 3 via xor). I pushed failing regression tests on this PR branch to demonstrate the issue. A fix likely needs collision handling (bucket by hash + verify ==) or a guard/option to disable interning when collisions are possible.
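
Concretely (illustrative code, not the PR's implementation): the colliding pair, and an interner that buckets by hashCode and verifies == before reusing an ID:

```dart
class CollisionPair {
  final int a, b;
  CollisionPair(this.a, this.b);

  @override
  bool operator ==(Object other) =>
      other is CollisionPair && other.a == a && other.b == b;

  @override
  int get hashCode => a ^ b; // (1, 2) and (2, 1) both hash to 3
}

class SafeInterner<T> {
  final Map<int, List<T>> _itemsByHash = {};
  final Map<int, List<int>> _idsByHash = {};
  int _nextId = 0;

  int intern(T item) {
    final h = item.hashCode;
    final items = _itemsByHash.putIfAbsent(h, () => <T>[]);
    final ids = _idsByHash.putIfAbsent(h, () => <int>[]);
    for (var i = 0; i < items.length; i++) {
      if (items[i] == item) return ids[i]; // verify ==, not just the hash
    }
    items.add(item);
    ids.add(_nextId);
    return _nextId++;
  }
}

void main() {
  final interner = SafeInterner<CollisionPair>();
  final id1 = interner.intern(CollisionPair(1, 2));
  final id2 = interner.intern(CollisionPair(2, 1));
  print(id1 == id2); // false: same hash, distinct IDs
}
```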

@knaeckeKami
Owner

knaeckeKami commented Dec 27, 2025

Or: add a caller-supplied key (e.g., keyOf/idOf) to calculateListDiff so interning can use stable IDs instead of raw hashes.
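
A hypothetical shape for that option (keyOf is not an existing calculateListDiff parameter, and DiffResult below is a stand-in for the library's result type):

```dart
class DiffResult {/* stand-in for the library's result type */}

DiffResult calculateListDiff<T>(
  List<T> oldList,
  List<T> newList, {
  int Function(T item)? keyOf, // e.g. keyOf: (user) => user.id
}) {
  // When keyOf is provided, use keyOf(item) directly as the interned ID,
  // skipping hashCode entirely; otherwise fall back to the collision-safe
  // hash-based interner.
  throw UnimplementedError('signature sketch only');
}
```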

…handle hash collisions during item interning.
@bernaferrari
Author

Very very good catch! Fixed!

@knaeckeKami
Owner

knaeckeKami commented Dec 29, 2025

I added an AOT benchmark harness (tool/bench/bench.dart) following Dart microbenchmarking guidance (AOT compile, warmups, calibration to a target runtime, fixed inputs; refs: https://mrale.ph/blog/2021/01/21/microbenchmarking-dart-part-1.html and https://mrale.ph/blog/2024/11/27/microbenchmarks-are-experiments.html). I ran it against:

  1. master
  2. the initial PR head (8bd5a66, hash-collision bug)
  3. current head (collision fix)

It reports the median µs/iter for sizes 10/100/1000/10000, diff patterns none/few/many, for both int lists and object lists (an 8-field class with standard ==/hashCode).
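
The measurement loop has roughly this shape (an illustrative reduction, not the actual tool/bench/bench.dart):

```dart
double medianMicrosPerIter(
  void Function() op, {
  int warmup = 1000,
  int samples = 20,
  Duration target = const Duration(milliseconds: 100),
}) {
  for (var i = 0; i < warmup; i++) op(); // warmup

  // Calibrate: double the per-sample iteration count until one sample
  // runs for at least the target duration.
  var iters = 1;
  while (true) {
    final sw = Stopwatch()..start();
    for (var i = 0; i < iters; i++) op();
    if (sw.elapsed >= target) break;
    iters *= 2;
  }

  // Take fixed-size samples and report the median µs per iteration.
  final perIter = <double>[];
  for (var s = 0; s < samples; s++) {
    final sw = Stopwatch()..start();
    for (var i = 0; i < iters; i++) op();
    perIter.add(sw.elapsedMicroseconds / iters);
  }
  perIter.sort();
  return perIter[perIter.length ~/ 2];
}
```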

Summary:

  • For none/few diffs the PR is materially slower than master (often ~3–12x).
  • For many diffs the PR is faster at the larger sizes (~10–15% on int lists, ~30–35% on object lists).
  • The collision fix adds ~1–2% overhead vs the buggy interner.

Full tables (median µs/iter, AOT):

int

| size | diffs | master (µs) | bug 8bd5a66 (µs) | after (µs) |
| --- | --- | --- | --- | --- |
| 10 | none | 0.29 | 0.45 | 0.96 |
| 10 | few | 0.45 | 0.59 | 1.10 |
| 10 | many | 1.07 | 1.08 | 1.88 |
| 100 | none | 1.17 | 6.12 | 5.89 |
| 100 | few | 1.64 | 6.68 | 6.35 |
| 100 | many | 53.55 | 49.93 | 55.33 |
| 1000 | none | 9.97 | 57.38 | 57.45 |
| 1000 | few | 22.21 | 69.63 | 67.61 |
| 1000 | many | 4927.75 | 4238.63 | 4314.00 |
| 10000 | none | 98.77 | 1124.00 | 1124.19 |
| 10000 | few | 425.30 | 1168.28 | 1175.91 |
| 10000 | many | 486125.00 | 418311.00 | 419665.00 |

object

| size | diffs | master (µs) | bug 8bd5a66 (µs) | after (µs) |
| --- | --- | --- | --- | --- |
| 10 | none | 0.32 | 0.72 | 1.39 |
| 10 | few | 0.49 | 0.87 | 1.52 |
| 10 | many | 1.34 | 1.43 | 2.31 |
| 100 | none | 1.34 | 8.91 | 8.54 |
| 100 | few | 1.89 | 9.30 | 8.87 |
| 100 | many | 71.59 | 56.79 | 62.90 |
| 1000 | none | 11.47 | 88.04 | 86.46 |
| 1000 | few | 26.47 | 99.44 | 98.09 |
| 1000 | many | 6736.00 | 4311.38 | 4378.88 |
| 10000 | none | 117.26 | 1454.06 | 1443.38 |
| 10000 | few | 528.05 | 1476.00 | 1474.06 |
| 10000 | many | 666148.00 | 419549.00 | 423003.00 |

This seems at odds with the PR description claiming ~20% to 10x speedups. Can you clarify how those measurements were obtained (workload, inputs, tooling, JIT vs AOT, warmups/samples)? I want to align the benchmark methodology so we compare apples-to-apples.

knaeckeKami self-assigned this Dec 29, 2025