Skip to content

LCS Algorithms

Wilfred Hughes edited this page Mar 21, 2025 · 4 revisions

Finding a minimal set of changes between two files is a well-researched problem, often described as finding the 'longest common subsequence' (LCS) or 'shortest edit script'.

See also discussion here: https://github.com/mitsuhiko/similar/issues/44#issuecomment-1412178765

Wu Diff Algorithm (1990)

Wu, Sun, Udi Manber, Eugene Wimberly Myers and Webb Miller. "An O(NP) Sequence Comparison Algorithm." Inf. Process. Lett. 35 (1990): 317-323.

https://doi.org/10.1016/0020-0190(90)90035-V

This is the algorithm used in difftastic, and is also used in Subversion's diff.

Myers Diff Algorithm (1986)

Myers, Eugene Wimberly. "An O(ND) difference algorithm and its variations." Algorithmica 1 (1986): 251-266.

https://doi.org/10.1007/BF01840446

This algorithm is widely implemented, including in the diff crate. There's a great introduction here.

GNU diff has a bunch of heuristics to avoid pathological cases that can occur in a simple implementation. The imara-diff crate implements these in Rust.

Google's go-cmp also has an interesting variant that optimises for performance over minimal diffs.

Hunt–McIlroy Algorithm (1976)

https://doi.org/10.1145/359581.359603

The original 1970s diff tool used the Hunt-McIlroy algorithm (sometimes called the Hunt–Szymanski algorithm).

Clone this wiki locally