blame: use a fingerprint heuristic to match ignored lines
This replaces the heuristic used to identify lines from ignored commits
with one that finds likely candidate lines in the parent's version of
the file.
The old heuristic simply assigned lines in the target to the same line
number (plus offset) in the parent. The new function uses a
fingerprinting algorithm to detect similarity between lines.
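The sketch below shows one way to make "similarity between lines" concrete in C. It assumes a line's fingerprint is the multiset of its adjacent byte pairs. It also assumes two lines' similarity is the number of pairs they share, counting duplicates (the size of the multiset intersection). `pair_at()` and `line_similarity()` are hypothetical names. This illustrates the idea and is not the fingerprint code in git's blame.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Encode the pair of adjacent bytes starting at s[i] as one 16-bit value. */
static unsigned pair_at(const char *s, size_t i)
{
	return (unsigned)(unsigned char)s[i] << 8 | (unsigned char)s[i + 1];
}

/*
 * Hypothetical similarity: the number of adjacent byte pairs that lines
 * a and b share, counted with multiplicity (a multiset intersection).
 */
static int line_similarity(const char *a, const char *b)
{
	int *count, common = 0;

	assert(a != NULL && b != NULL);
	count = calloc(1u << 16, sizeof(*count));
	if (!count)
		return 0;
	/* Fingerprint a: count each pair of adjacent bytes. */
	for (size_t i = 0; a[i] && a[i + 1]; i++)
		count[pair_at(a, i)]++;
	/* Walk b's pairs, consuming matching counts from a's fingerprint. */
	for (size_t i = 0; b[i] && b[i + 1]; i++) {
		unsigned p = pair_at(b, i);

		if (count[p] > 0) {
			count[p]--;
			common++;
		}
	}
	free(count);
	return common;
}
```

Take `line_similarity("void *y);", "void new_func_1(void *x, void *y);")` as an example. All eight of the short line's pairs occur in the long one, so the score is 8. The score against the `new_func_2` declaration is also 8. By contrast, `void new_func_2(void *x,` scores higher against the `new_func_2` declaration than against `new_func_1`, because it also shares the pairs `_2` and `2(`.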
The fingerprint code and the idea to use them for blame came from
Michael Platings <[email protected]>.
For each line changed in the target (i.e. each line in a blame_entry
touched by the target's diff), guess_line_blames() finds the
best-matching line in the parent whose similarity is above a magic
threshold. Ties are broken by proximity of the parent line number to the
target's line number.
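The selection rule (best score above a threshold, ties broken by line-number proximity) can be sketched as follows. This is a minimal illustration, not guess_line_blames() itself. The score is a hypothetical stand-in: the number of adjacent byte pairs two lines share. `best_parent_line()`, its parameters, and the 1-based numbering are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Encode the pair of adjacent bytes starting at s[i] as one 16-bit value. */
static unsigned pair_at(const char *s, size_t i)
{
	return (unsigned)(unsigned char)s[i] << 8 | (unsigned char)s[i + 1];
}

/* Hypothetical score: adjacent byte pairs shared by a and b (with multiplicity). */
static int line_similarity(const char *a, const char *b)
{
	int *count = calloc(1u << 16, sizeof(*count));
	int common = 0;

	if (!count)
		return 0;
	for (size_t i = 0; a[i] && a[i + 1]; i++)	/* fingerprint a */
		count[pair_at(a, i)]++;
	for (size_t i = 0; b[i] && b[i + 1]; i++) {	/* intersect with b */
		unsigned p = pair_at(b, i);

		if (count[p] > 0) {
			count[p]--;
			common++;
		}
	}
	free(count);
	return common;
}

/*
 * Return the line number of the parent line (parent[0] is line
 * parent_first) that best matches `target`, or -1 if no score is above
 * `threshold`. On equal scores, the parent line closest to target_lineno wins.
 */
static int best_parent_line(const char *target, int target_lineno,
			    const char *const *parent, int parent_first,
			    int nr_parent, int threshold)
{
	int best = -1, best_score = threshold, best_dist = 0;

	assert(nr_parent >= 0);
	for (int i = 0; i < nr_parent; i++) {
		int lineno = parent_first + i;
		int score = line_similarity(target, parent[i]);
		int dist = abs(lineno - target_lineno);

		if (score > best_score ||
		    (best != -1 && score == best_score && dist < best_dist)) {
			best = lineno;
			best_score = score;
			best_dist = dist;
		}
	}
	return best;
}
```

Consider parent lines `x = 1;`, `y = 2;`, `x = 1;` (lines 1-3). The target line `x = 1;` at line 3 ties between parent lines 1 and 3 and is matched to line 3, the closer one. At line 1 it is matched to line 1. The loop replaces the current best only on a strictly smaller distance, so an exact distance tie keeps the earlier parent line.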
We actually make two passes. The first pass searches only the diff chunk
associated with the blame entry, specifically the one passed to
blame_chunk(). Often those diff chunks are small: any 'context' in a
normal diff chunk is broken up into multiple calls to blame_chunk(). So
we make a second pass over the entire parent, with a slightly higher
threshold.
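A minimal sketch of the two passes follows, under these assumptions. The score is a hypothetical count of shared adjacent byte pairs. A chunk is modeled as an inclusive 1-based range of parent lines, which may be empty. The threshold values used in the note are illustrative, not git's constants. `best_parent_line()` and `guess_parent_line()` are hypothetical helpers, not git functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Encode the pair of adjacent bytes starting at s[i] as one 16-bit value. */
static unsigned pair_at(const char *s, size_t i)
{
	return (unsigned)(unsigned char)s[i] << 8 | (unsigned char)s[i + 1];
}

/* Hypothetical score: adjacent byte pairs shared by a and b (with multiplicity). */
static int line_similarity(const char *a, const char *b)
{
	int *count = calloc(1u << 16, sizeof(*count));
	int common = 0;

	if (!count)
		return 0;
	for (size_t i = 0; a[i] && a[i + 1]; i++)
		count[pair_at(a, i)]++;
	for (size_t i = 0; b[i] && b[i + 1]; i++) {
		unsigned p = pair_at(b, i);

		if (count[p] > 0) {
			count[p]--;
			common++;
		}
	}
	free(count);
	return common;
}

/* Best parent line number above threshold, ties to the closest line; -1 if none. */
static int best_parent_line(const char *target, int target_lineno,
			    const char *const *parent, int parent_first,
			    int nr_parent, int threshold)
{
	int best = -1, best_score = threshold, best_dist = 0;

	for (int i = 0; i < nr_parent; i++) {
		int score = line_similarity(target, parent[i]);
		int dist = abs(parent_first + i - target_lineno);

		if (score > best_score ||
		    (best != -1 && score == best_score && dist < best_dist)) {
			best = parent_first + i;
			best_score = score;
			best_dist = dist;
		}
	}
	return best;
}

/*
 * Pass 1: search only parent lines chunk_first..chunk_last (1-based,
 * possibly empty) with chunk_threshold. Pass 2: if that finds nothing,
 * search the whole parent with the (higher) file_threshold.
 */
static int guess_parent_line(const char *target, int target_lineno,
			     const char *const *parent, int nr_parent,
			     int chunk_first, int chunk_last,
			     int chunk_threshold, int file_threshold)
{
	int line;

	assert(chunk_first >= 1 && chunk_last <= nr_parent &&
	       chunk_first <= chunk_last + 1);
	line = best_parent_line(target, target_lineno,
				parent + (chunk_first - 1), chunk_first,
				chunk_last - chunk_first + 1, chunk_threshold);
	if (line != -1)
		return line;
	return best_parent_line(target, target_lineno, parent, 1, nr_parent,
				file_threshold);
}

/* The include-sorting example below, renumbered to parent lines 1-3. */
static const char *const parent_headers[] = {
	"#include <sys/header_a.h>",
	"#include <header_b.h>",
	"#include <header_c.h>",
};
```

In this sketch, the chunk that adds `#include <sys/header_a.h>` at target line 3 is modeled as a pure addition. That is, its parent range is empty (`chunk_first` 4, `chunk_last` 3), so the first pass finds nothing. With hypothetical thresholds 5 and 10, the second pass matches the line to parent line 1. With a file threshold of 100 it stays unmatched (-1).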
Here's an example of the difference the fingerprinting makes. Consider
a file with four commits:
commit-a 11) void new_func_1(void *x, void *y);
commit-b 12) void new_func_2(void *x, void *y);
commit-c 13) some_line_c
commit-d 14) some_line_d
After a commit 'X', we have:
commit-X 11) void new_func_1(void *x,
commit-X 12) void *y);
commit-X 13) void new_func_2(void *x,
commit-X 14) void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d
When we run blame with commit X ignored, using the old algorithm, we get:
commit-a 11) void new_func_1(void *x,
commit-b 12) void *y);
00000000 13) void new_func_2(void *x,
00000000 14) void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d
Here commit-b is blamed for line 12 instead of line 13. With the
fingerprint algorithm, we get:
commit-a 11) void new_func_1(void *x,
commit-b 12) void *y);
commit-b 13) void new_func_2(void *x,
commit-b 14) void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d
Note that lines 12 and 14 are both given to commit-b. Their similarity
to the parent is above FINGERPRINT_CHUNK_THRESHOLD, and the candidates
tie: parent lines 11 and 12 both match these lines equally well. The
algorithm chose parent line 12, since it is closest to the target line
numbers 12 and 14.
If we increase the threshold, say to 10, those two lines won't match,
and will be treated as 'unblamable.'
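Running this example through a minimal sketch reproduces both outcomes. The assumptions are as follows. The score is a hypothetical count of shared adjacent byte pairs. The target lines carry no indentation, as shown above. The thresholds 5 and 10 are illustrative values for this toy score, not git's FINGERPRINT_CHUNK_THRESHOLD. Under this scoring, `void *y);` scores 8 against both parent lines. At threshold 5 it therefore ties and goes to parent line 12, and at threshold 10 it is left unmatched. `guess_range()` and `guess_example_line()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Encode the pair of adjacent bytes starting at s[i] as one 16-bit value. */
static unsigned pair_at(const char *s, size_t i)
{
	return (unsigned)(unsigned char)s[i] << 8 | (unsigned char)s[i + 1];
}

/* Hypothetical score: adjacent byte pairs shared by a and b (with multiplicity). */
static int line_similarity(const char *a, const char *b)
{
	int *count = calloc(1u << 16, sizeof(*count));
	int common = 0;

	if (!count)
		return 0;
	for (size_t i = 0; a[i] && a[i + 1]; i++)
		count[pair_at(a, i)]++;
	for (size_t i = 0; b[i] && b[i + 1]; i++) {
		unsigned p = pair_at(b, i);

		if (count[p] > 0) {
			count[p]--;
			common++;
		}
	}
	free(count);
	return common;
}

/*
 * For each target line (numbered from target_first), store in out[] the
 * best parent line number (parent[0] is line parent_first), or -1 if no
 * score is above threshold. Equal scores go to the closest parent line.
 */
static void guess_range(const char *const *target, int target_first,
			int nr_target, const char *const *parent,
			int parent_first, int nr_parent, int threshold, int *out)
{
	for (int t = 0; t < nr_target; t++) {
		int best = -1, best_score = threshold, best_dist = 0;

		for (int p = 0; p < nr_parent; p++) {
			int score = line_similarity(target[t], parent[p]);
			int dist = abs((parent_first + p) - (target_first + t));

			if (score > best_score ||
			    (best != -1 && score == best_score && dist < best_dist)) {
				best = parent_first + p;
				best_score = score;
				best_dist = dist;
			}
		}
		out[t] = best;
	}
}

/* The example: commit X split two one-line declarations over four lines. */
static const char *const parent_lines[] = {
	"void new_func_1(void *x, void *y);",	/* parent line 11 */
	"void new_func_2(void *x, void *y);",	/* parent line 12 */
};
static const char *const target_lines[] = {
	"void new_func_1(void *x,",		/* target line 11 */
	"void *y);",				/* target line 12 */
	"void new_func_2(void *x,",		/* target line 13 */
	"void *y);",				/* target line 14 */
};

/* Guessed parent line for example target line `lineno` (11..14), or -1. */
static int guess_example_line(int lineno, int threshold)
{
	int out[4];

	assert(lineno >= 11 && lineno <= 14);
	guess_range(target_lines, 11, 4, parent_lines, 11, 2, threshold, out);
	return out[lineno - 11];
}
```

At threshold 5 the sketch maps target lines 11-14 to parent lines 11, 12, 12 and 12. At threshold 10, lines 12 and 14 get -1 (unblamable), while lines 11 and 13 still match.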
For an example of scanning the entire parent for a match, consider:
commit-a 30) #include <sys/header_a.h>
commit-b 31) #include <header_b.h>
commit-c 32) #include <header_c.h>
Then commit X alphabetizes them:
commit-X 30) #include <header_b.h>
commit-X 31) #include <header_c.h>
commit-X 32) #include <sys/header_a.h>
If we just check the parent's chunk (i.e. the first pass), we'd get:
commit-b 30) #include <header_b.h>
commit-c 31) #include <header_c.h>
00000000 32) #include <sys/header_a.h>
That's because commit X consists of two chunks: the first removes
sys/header_a.h, then there is some context, and the second adds
sys/header_a.h.
If we scan the entire parent file, we get:
commit-b 30) #include <header_b.h>
commit-c 31) #include <header_c.h>
commit-a 32) #include <sys/header_a.h>
Suggested-by: Michael Platings <[email protected]>
Signed-off-by: Barret Rhoden <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>