Commit a0b676a

peff authored and gitster committed
diff-highlight: document some non-optimal cases
The diff-highlight script works on heuristics, so it can be wrong. Let's document some of the wrong-ness in case somebody feels like working on it.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 34d9819 commit a0b676a

1 file changed, +93 -0 lines changed

contrib/diff-highlight/README

Lines changed: 93 additions & 0 deletions
@@ -57,3 +57,96 @@ following in your git configuration:
show = diff-highlight | less
diff = diff-highlight | less
---------------------------------------------

Bugs
----

Because diff-highlight relies on heuristics to guess which parts of
changes are important, there are some cases where the highlighting is
more distracting than useful. Fortunately, these cases are rare in
practice, and when they do occur, the worst case is simply a little
extra highlighting. This section documents some cases known to be
sub-optimal, in case somebody feels like working on improving the
heuristics.

1. Two changes on the same line are highlighted as a single blob. For
example, highlighting:

----------------------------------------------
-foo(buf, size);
+foo(obj->buf, obj->size);
----------------------------------------------

yields (where the inside of "+{}" would be highlighted):

----------------------------------------------
-foo(buf, size);
+foo(+{obj->buf, obj->}size);
----------------------------------------------

whereas a more semantically meaningful output would be:

----------------------------------------------
-foo(buf, size);
+foo(+{obj->}buf, +{obj->}size);
----------------------------------------------

Note that doing this right would probably involve a set of
content-specific boundary patterns, similar to the per-language word
regexes that git's word-diff can use. Otherwise, highlighting each
differing region character by character gives junk like:

-----------------------------------------------------
-this line has some -{i}nt-{ere}sti-{ng} text on it
+this line has some +{fa}nt+{a}sti+{c} text on it
-----------------------------------------------------

which is less readable than the current single-blob output.

2. The multi-line matching assumes that lines in the pre- and post-image
match by position. This is often true, but the matching can be fooled
when a line is removed from the top and a new one is added at the
bottom (or vice versa). Unless the lines in the middle are also
changed, the diff shows this as a separate removal and addition, and
neither gets highlighted (which is good). But if the lines in the
middle are changed, the highlighting can be misleading. Here's a
pathological case:

-----------------------------------------------------
-one
-two
-three
-four
+two 2
+three 3
+four 4
+five 5
-----------------------------------------------------

which gets highlighted as:

-----------------------------------------------------
-one
-t-{wo}
-three
-f-{our}
+two 2
+t+{hree 3}
+four 4
+f+{ive 5}
-----------------------------------------------------

because it pairs lines by position, matching "one" with "two 2", "two"
with "three 3", and so forth. It would be nicer as:

-----------------------------------------------------
-one
-two
-three
-four
+two +{2}
+three +{3}
+four +{4}
+five 5
-----------------------------------------------------

which would probably involve pre-matching the lines into pairs
according to some heuristic.
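
The single-blob behavior in the first case can be illustrated with a short Python sketch, together with the two alternatives discussed alongside it. This is not diff-highlight's own code, which is written in Perl, and every name in the sketch is made up for this example. prefix_suffix_markup is an assumed model of the current behavior, chosen because it matches the "+foo(+{obj->buf, obj->}size);" and "-t-{wo}" outputs shown above. It trims the longest common prefix and suffix of a removed/added pair and marks whatever is left on each side. A pair that shares neither a prefix nor a suffix is left alone. opcode_markup instead marks every differing run found by Python's difflib.SequenceMatcher. Fed single characters, it has no notion of boundaries. Fed tokens from WORD_PATTERN, it aligns changes to identifiers and punctuation. WORD_PATTERN is a hypothetical stand-in for the content-specific boundary patterns mentioned above.

```python
import difflib
import re

# Hypothetical boundary pattern: runs of word characters, runs of
# punctuation, or runs of whitespace. Together they cover every character.
WORD_PATTERN = re.compile(r"\w+|[^\w\s]+|\s+")


def prefix_suffix_markup(old, new):
    """Mark what is left of each line after trimming the common prefix/suffix.

    Assumed model of the current single-blob highlighting. Both sides of the
    pair are marked; a pair sharing no prefix or suffix is returned unchanged.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix on the shorter line.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    if prefix == 0 and suffix == 0:
        return old, new

    def mark(line, sign):
        end = len(line) - suffix
        middle = line[prefix:end]
        if not middle:
            return line
        return line[:prefix] + sign + "{" + middle + "}" + line[end:]

    return mark(old, "-"), mark(new, "+")


def char_tokens(line):
    """No boundary knowledge: every character is its own token."""
    return list(line)


def word_tokens(line):
    """Split on WORD_PATTERN so that differences align to token boundaries."""
    return WORD_PATTERN.findall(line)


def opcode_markup(old_tokens, new_tokens):
    """Mark every differing run of tokens on each side, word-diff style."""
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part = "".join(old_tokens[i1:i2])
        new_part = "".join(new_tokens[j1:j2])
        if tag == "equal":
            old_out.append(old_part)
            new_out.append(new_part)
            continue
        # "replace", "delete" and "insert" runs: mark whichever side is non-empty.
        if old_part:
            old_out.append("-{" + old_part + "}")
        if new_part:
            new_out.append("+{" + new_part + "}")
    return "".join(old_out), "".join(new_out)


if __name__ == "__main__":
    old, new = "foo(buf, size);", "foo(obj->buf, obj->size);"
    print(prefix_suffix_markup(old, new))
    print(opcode_markup(word_tokens(old), word_tokens(new)))
    old = "this line has some interesting text on it"
    new = "this line has some fantastic text on it"
    print(opcode_markup(char_tokens(old), char_tokens(new)))
```

On the foo() pair, prefix_suffix_markup marks the added line as "foo(+{obj->buf, obj->}size);". opcode_markup with word_tokens gives "foo(+{obj->}buf, +{obj->}size);". opcode_markup with char_tokens on the "interesting"/"fantastic" lines reproduces the junk shown above. A single regex like WORD_PATTERN is only a rough approximation; per-language patterns would likely do better.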

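For the second case, a Python sketch can contrast the current positional pairing with the suggested pre-matching. pair_by_position models the behavior described above. pair_by_key is only one possible heuristic. It aligns a per-line key with difflib.SequenceMatcher so that pairs keep their order and lines without a partner stay unpaired. The key used here is the first whitespace-separated word. That key, and all the names in the sketch, are hypothetical choices made for this example.

```python
import difflib


def pair_by_position(removed, added):
    """Pair the i-th removed line with the i-th added line (the current rule)."""
    return list(zip(removed, added))


def first_word(line):
    """Hypothetical pairing key: the first whitespace-separated word."""
    words = line.split()
    return words[0] if words else ""


def pair_by_key(removed, added, key=first_word):
    """Pre-match lines into ordered pairs by aligning their keys.

    Lines whose key has no counterpart on the other side stay unpaired,
    so a highlighter built on these pairs would leave them unhighlighted.
    """
    matcher = difflib.SequenceMatcher(
        a=[key(line) for line in removed],
        b=[key(line) for line in added],
        autojunk=False,
    )
    pairs = []
    # Each matching block (i, j, size) is a run of lines with equal keys;
    # the final dummy block has size 0 and contributes nothing.
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            pairs.append((removed[i + k], added[j + k]))
    return pairs


if __name__ == "__main__":
    removed = ["one", "two", "three", "four"]
    added = ["two 2", "three 3", "four 4", "five 5"]
    print(pair_by_position(removed, added))
    print(pair_by_key(removed, added))
```

On the pathological example, pair_by_position yields "one"/"two 2", "two"/"three 3", and so on. pair_by_key instead pairs "two" with "two 2", "three" with "three 3" and "four" with "four 4", leaving "one" and "five 5" unpaired. That is the pairing behind the nicer output above. A first-word key breaks down as soon as several lines start with the same token, so a more robust heuristic, perhaps a per-line similarity score, would likely be needed in practice.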