Skip to content

Commit a5c9167

Browse files
Barret Rhodengitster
authored andcommitted
blame: add the ability to ignore commits and their changes
Commits that make formatting changes or function renames are often not interesting when blaming a file. A user may deem such a commit as 'not interesting' and want to ignore and its changes it when assigning blame. For example, say a file has the following git history / rev-list: ---O---A---X---B---C---D---Y---E---F Commits X and Y both touch a particular line, and the other commits do not: X: "Take a third parameter" -MyFunc(1, 2); +MyFunc(1, 2, 3); Y: "Remove camelcase" -MyFunc(1, 2, 3); +my_func(1, 2, 3); git-blame will blame Y for the change. I'd like to be able to ignore Y: both the existence of the commit as well as any changes it made. This differs from -S rev-list, which specifies the list of commits to process for the blame. We would still process Y, but just don't let the blame 'stick.' This patch adds the ability for users to ignore a revision with --ignore-rev=rev, which may be repeated. They can specify a set of files of full object names of revs, e.g. SHA-1 hashes, one per line. A single file may be specified with the blame.ignoreRevFile config option or with --ignore-rev-file=file. Both the config option and the command line option may be repeated multiple times. An empty file name "" will clear the list of revs from previously processed files. Config options are processed before command line options. For a typical use case, projects will maintain the file containing revisions for commits that perform mass reformatting, and their users have the optional to ignore all of the commits in that file. Additionally, a user can use the --ignore-rev option for one-off investigation. To go back to the example above, X was a substantive change to the function, but not the change the user is interested in. The user inspected X, but wanted to find the previous change to that line - perhaps a commit that introduced that function call. To make this work, we can't simply remove all ignored commits from the rev-list. We need to diff the changes introduced by Y so that we can ignore them. We let the blames get passed to Y, just like when processing normally. When Y is the target, we make sure that Y does not *keep* any blames. Any changes that Y is responsible for get passed to its parent. Note we make one pass through all of the scapegoats (parents) to attempt to pass blame normally; we don't know if we *need* to ignore the commit until we've checked all of the parents. The blame_entry will get passed up the tree until we find a commit that has a diff chunk that affects those lines. One issue is that the ignored commit *did* make some change, and there is no general solution to finding the line in the parent commit that corresponds to a given line in the ignored commit. That makes it hard to attribute a particular line within an ignored commit's diff correctly. For example, the parent of an ignored commit has this, say at line 11: commit-a 11) #include "a.h" commit-b 12) #include "b.h" Commit X, which we will ignore, swaps these lines: commit-X 11) #include "b.h" commit-X 12) #include "a.h" We can pass that blame entry to the parent, but line 11 will be attributed to commit A, even though "include b.h" came from commit B. The blame mechanism will be looking at the parent's view of the file at line number 11. ignore_blame_entry() is set up to allow alternative algorithms for guessing per-line blames. Any line that is not attributed to the parent is marked as 'unblamable', and we output a hash of all zeros. The existing algorithm is simple: blame each line on the corresponding line in the parent's diff chunk. Any lines beyond that stay with the target. For example, the parent of an ignored commit has this, say at line 11: commit-a 11) void new_func_1(void *x, void *y); commit-b 12) void new_func_2(void *x, void *y); commit-c 13) some_line_c commit-d 14) some_line_d After a commit 'X', we have: commit-X 11) void new_func_1(void *x, commit-X 12) void *y); commit-X 13) void new_func_2(void *x, commit-X 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d Commit X nets two additionally lines: 13 and 14. The current guess_line_blames() algorithm will not attribute these to the parent, whose diff chunk is only two lines - not four. When we ignore with the current algorithm, we get: commit-a 11) void new_func_1(void *x, commit-b 12) void *y); 00000000 13) void new_func_2(void *x, 00000000 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d Note that line 12 was blamed on B, though B was the commit for new_func_2(), not new_func_1(). Even when guess_line_blames() finds a line in the parent, it may still be incorrect. Signed-off-by: Barret Rhoden <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 808f10d commit a5c9167

File tree

7 files changed

+406
-10
lines changed

7 files changed

+406
-10
lines changed

Documentation/blame-options.txt

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,5 +110,19 @@ commit. And the default value is 40. If there are more than one
110110
`-C` options given, the <num> argument of the last `-C` will
111111
take effect.
112112

113+
--ignore-rev <rev>::
114+
Ignore changes made by the revision when assigning blame, as if the
115+
change never happened. Lines that were changed or added by an ignored
116+
commit will be blamed on the previous commit that changed that line or
117+
nearby lines. This option may be specified multiple times to ignore
118+
more than one revision.
119+
120+
--ignore-revs-file <file>::
121+
Ignore revisions listed in `file`, one unabbreviated object name per line.
122+
Whitespace and comments beginning with `#` are ignored. This option may be
123+
repeated, and these files will be processed after any files specified with
124+
the `blame.ignoreRevsFile` config option. An empty file name, `""`, will
125+
clear the list of revs from previously processed files.
126+
113127
-h::
114128
Show help message.

Documentation/config/blame.txt

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,3 +19,10 @@ blame.showEmail::
1919
blame.showRoot::
2020
Do not treat root commits as boundaries in linkgit:git-blame[1].
2121
This option defaults to false.
22+
23+
blame.ignoreRevsFile::
24+
Ignore revisions listed in the file, one unabbreviated object name per
25+
line, in linkgit:git-blame[1]. Whitespace and comments beginning with
26+
`#` are ignored. This option may be repeated multiple times. Empty
27+
file names will reset the list of ignored revisions. This option will
28+
be handled before the command line option `--ignore-revs-file`.

Documentation/git-blame.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ SYNOPSIS
1010
[verse]
1111
'git blame' [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-e] [-p] [-w] [--incremental]
1212
[-L <range>] [-S <revs-file>] [-M] [-C] [-C] [-C] [--since=<date>]
13+
[--ignore-rev <rev>] [--ignore-revs-file <file>]
1314
[--progress] [--abbrev=<n>] [<rev> | --contents <file> | --reverse <rev>..<rev>]
1415
[--] <file>
1516

blame.c

Lines changed: 171 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -479,7 +479,8 @@ void blame_coalesce(struct blame_scoreboard *sb)
479479

480480
for (ent = sb->ent; ent && (next = ent->next); ent = next) {
481481
if (ent->suspect == next->suspect &&
482-
ent->s_lno + ent->num_lines == next->s_lno) {
482+
ent->s_lno + ent->num_lines == next->s_lno &&
483+
ent->unblamable == next->unblamable) {
483484
ent->num_lines += next->num_lines;
484485
ent->next = next->next;
485486
blame_origin_decref(next->suspect);
@@ -731,6 +732,10 @@ static void split_overlap(struct blame_entry *split,
731732
int chunk_end_lno;
732733
memset(split, 0, sizeof(struct blame_entry [3]));
733734

735+
split[0].unblamable = e->unblamable;
736+
split[1].unblamable = e->unblamable;
737+
split[2].unblamable = e->unblamable;
738+
734739
if (e->s_lno < tlno) {
735740
/* there is a pre-chunk part not blamed on parent */
736741
split[0].suspect = blame_origin_incref(e->suspect);
@@ -851,6 +856,7 @@ static struct blame_entry *split_blame_at(struct blame_entry *e, int len,
851856
struct blame_entry *n = xcalloc(1, sizeof(struct blame_entry));
852857

853858
n->suspect = new_suspect;
859+
n->unblamable = e->unblamable;
854860
n->lno = e->lno + len;
855861
n->s_lno = e->s_lno + len;
856862
n->num_lines = e->num_lines - len;
@@ -859,6 +865,109 @@ static struct blame_entry *split_blame_at(struct blame_entry *e, int len,
859865
return n;
860866
}
861867

868+
struct blame_line_tracker {
869+
int is_parent;
870+
int s_lno;
871+
};
872+
873+
static int are_lines_adjacent(struct blame_line_tracker *first,
874+
struct blame_line_tracker *second)
875+
{
876+
return first->is_parent == second->is_parent &&
877+
first->s_lno + 1 == second->s_lno;
878+
}
879+
880+
/*
881+
* This cheap heuristic assigns lines in the chunk to their relative location in
882+
* the parent's chunk. Any additional lines are left with the target.
883+
*/
884+
static void guess_line_blames(struct blame_entry *e,
885+
struct blame_origin *parent,
886+
struct blame_origin *target,
887+
int offset, int parent_slno, int parent_len,
888+
struct blame_line_tracker *line_blames)
889+
{
890+
int i, parent_idx;
891+
892+
for (i = 0; i < e->num_lines; i++) {
893+
parent_idx = e->s_lno + i + offset;
894+
if (parent_slno <= parent_idx &&
895+
parent_idx < parent_slno + parent_len) {
896+
line_blames[i].is_parent = 1;
897+
line_blames[i].s_lno = parent_idx;
898+
} else {
899+
line_blames[i].is_parent = 0;
900+
line_blames[i].s_lno = e->s_lno + i;
901+
}
902+
}
903+
}
904+
905+
/*
906+
* This decides which parts of a blame entry go to the parent (added to the
907+
* ignoredp list) and which stay with the target (added to the diffp list). The
908+
* actual decision is made in a separate heuristic function. This consumes e,
909+
* essentially putting it on a list.
910+
*
911+
* Note that the blame entries on the ignoredp list are not necessarily sorted
912+
* with respect to the parent's line numbers yet.
913+
*/
914+
static void ignore_blame_entry(struct blame_entry *e,
915+
struct blame_origin *parent,
916+
struct blame_origin *target,
917+
int offset, int parent_slno, int parent_len,
918+
struct blame_entry **diffp,
919+
struct blame_entry **ignoredp)
920+
{
921+
struct blame_line_tracker *line_blames;
922+
int entry_len, nr_lines, i;
923+
924+
line_blames = xcalloc(sizeof(struct blame_line_tracker),
925+
e->num_lines);
926+
guess_line_blames(e, parent, target, offset, parent_slno, parent_len,
927+
line_blames);
928+
/*
929+
* We carve new entries off the front of e. Each entry comes from a
930+
* contiguous chunk of lines: adjacent lines from the same origin
931+
* (either the parent or the target).
932+
*/
933+
entry_len = 1;
934+
nr_lines = e->num_lines; // e changes in the loop
935+
for (i = 0; i < nr_lines; i++) {
936+
struct blame_entry *next = NULL;
937+
938+
/*
939+
* We are often adjacent to the next line - only split the blame
940+
* entry when we have to.
941+
*/
942+
if (i + 1 < nr_lines) {
943+
if (are_lines_adjacent(&line_blames[i],
944+
&line_blames[i + 1])) {
945+
entry_len++;
946+
continue;
947+
}
948+
next = split_blame_at(e, entry_len,
949+
blame_origin_incref(e->suspect));
950+
}
951+
if (line_blames[i].is_parent) {
952+
blame_origin_decref(e->suspect);
953+
e->suspect = blame_origin_incref(parent);
954+
e->s_lno = line_blames[i - entry_len + 1].s_lno;
955+
e->next = *ignoredp;
956+
*ignoredp = e;
957+
} else {
958+
e->unblamable = 1;
959+
/* e->s_lno is already in the target's address space. */
960+
e->next = *diffp;
961+
*diffp = e;
962+
}
963+
assert(e->num_lines == entry_len);
964+
e = next;
965+
entry_len = 1;
966+
}
967+
assert(!e);
968+
free(line_blames);
969+
}
970+
862971
/*
863972
* Process one hunk from the patch between the current suspect for
864973
* blame_entry e and its parent. This first blames any unfinished
@@ -868,13 +977,19 @@ static struct blame_entry *split_blame_at(struct blame_entry *e, int len,
868977
* -C options may lead to overlapping/duplicate source line number
869978
* ranges, all we can rely on from sorting/merging is the order of the
870979
* first suspect line number.
980+
*
981+
* tlno: line number in the target where this chunk begins
982+
* same: line number in the target where this chunk ends
983+
* offset: add to tlno to get the chunk starting point in the parent
984+
* parent_len: number of lines in the parent chunk
871985
*/
872986
static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq,
873-
int tlno, int offset, int same,
874-
struct blame_origin *parent)
987+
int tlno, int offset, int same, int parent_len,
988+
struct blame_origin *parent,
989+
struct blame_origin *target, int ignore_diffs)
875990
{
876991
struct blame_entry *e = **srcq;
877-
struct blame_entry *samep = NULL, *diffp = NULL;
992+
struct blame_entry *samep = NULL, *diffp = NULL, *ignoredp = NULL;
878993

879994
while (e && e->s_lno < tlno) {
880995
struct blame_entry *next = e->next;
@@ -942,10 +1057,29 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq,
9421057
n->next = samep;
9431058
samep = n;
9441059
}
945-
e->next = diffp;
946-
diffp = e;
1060+
if (ignore_diffs) {
1061+
ignore_blame_entry(e, parent, target, offset,
1062+
tlno + offset, parent_len, &diffp,
1063+
&ignoredp);
1064+
} else {
1065+
e->next = diffp;
1066+
diffp = e;
1067+
}
9471068
e = next;
9481069
}
1070+
if (ignoredp) {
1071+
/*
1072+
* Note ignoredp is not sorted yet, and thus neither is dstq.
1073+
* That list must be sorted before we queue_blames(). We defer
1074+
* sorting until after all diff hunks are processed, so that
1075+
* guess_line_blames() can pick *any* line in the parent. The
1076+
* slight drawback is that we end up sorting all blame entries
1077+
* passed to the parent, including those that are unrelated to
1078+
* changes made by the ignored commit.
1079+
*/
1080+
**dstq = reverse_blame(ignoredp, **dstq);
1081+
*dstq = &ignoredp->next;
1082+
}
9491083
**srcq = reverse_blame(diffp, reverse_blame(samep, e));
9501084
/* Move across elements that are in the unblamable portion */
9511085
if (diffp)
@@ -954,7 +1088,9 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq,
9541088

9551089
struct blame_chunk_cb_data {
9561090
struct blame_origin *parent;
1091+
struct blame_origin *target;
9571092
long offset;
1093+
int ignore_diffs;
9581094
struct blame_entry **dstq;
9591095
struct blame_entry **srcq;
9601096
};
@@ -967,7 +1103,8 @@ static int blame_chunk_cb(long start_a, long count_a,
9671103
if (start_a - start_b != d->offset)
9681104
die("internal error in blame::blame_chunk_cb");
9691105
blame_chunk(&d->dstq, &d->srcq, start_b, start_a - start_b,
970-
start_b + count_b, d->parent);
1106+
start_b + count_b, count_a, d->parent, d->target,
1107+
d->ignore_diffs);
9711108
d->offset = start_a + count_a - (start_b + count_b);
9721109
return 0;
9731110
}
@@ -979,7 +1116,7 @@ static int blame_chunk_cb(long start_a, long count_a,
9791116
*/
9801117
static void pass_blame_to_parent(struct blame_scoreboard *sb,
9811118
struct blame_origin *target,
982-
struct blame_origin *parent)
1119+
struct blame_origin *parent, int ignore_diffs)
9831120
{
9841121
mmfile_t file_p, file_o;
9851122
struct blame_chunk_cb_data d;
@@ -989,7 +1126,9 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
9891126
return; /* nothing remains for this target */
9901127

9911128
d.parent = parent;
1129+
d.target = target;
9921130
d.offset = 0;
1131+
d.ignore_diffs = ignore_diffs;
9931132
d.dstq = &newdest; d.srcq = &target->suspects;
9941133

9951134
fill_origin_blob(&sb->revs->diffopt, parent, &file_p, &sb->num_read_blob);
@@ -1001,8 +1140,13 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
10011140
oid_to_hex(&parent->commit->object.oid),
10021141
oid_to_hex(&target->commit->object.oid));
10031142
/* The rest are the same as the parent */
1004-
blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, parent);
1143+
blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, 0,
1144+
parent, target, 0);
10051145
*d.dstq = NULL;
1146+
if (ignore_diffs)
1147+
newdest = llist_mergesort(newdest, get_next_blame,
1148+
set_next_blame,
1149+
compare_blame_suspect);
10061150
queue_blames(sb, parent, newdest);
10071151

10081152
return;
@@ -1506,11 +1650,28 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin,
15061650
blame_origin_incref(porigin);
15071651
origin->previous = porigin;
15081652
}
1509-
pass_blame_to_parent(sb, origin, porigin);
1653+
pass_blame_to_parent(sb, origin, porigin, 0);
15101654
if (!origin->suspects)
15111655
goto finish;
15121656
}
15131657

1658+
/*
1659+
* Pass remaining suspects for ignored commits to their parents.
1660+
*/
1661+
if (oidset_contains(&sb->ignore_list, &commit->object.oid)) {
1662+
for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
1663+
i < num_sg && sg;
1664+
sg = sg->next, i++) {
1665+
struct blame_origin *porigin = sg_origin[i];
1666+
1667+
if (!porigin)
1668+
continue;
1669+
pass_blame_to_parent(sb, origin, porigin, 1);
1670+
if (!origin->suspects)
1671+
goto finish;
1672+
}
1673+
}
1674+
15141675
/*
15151676
* Optionally find moves in parents' files.
15161677
*/

blame.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,7 @@ struct blame_entry {
9292
* scanning the lines over and over.
9393
*/
9494
unsigned score;
95+
int unblamable;
9596
};
9697

9798
/*
@@ -117,6 +118,8 @@ struct blame_scoreboard {
117118
/* linked list of blames */
118119
struct blame_entry *ent;
119120

121+
struct oidset ignore_list;
122+
120123
/* look-up a line in the final buffer */
121124
int num_lines;
122125
int *lineno;

0 commit comments

Comments
 (0)