Commit 1c57a62
jherland authored and gitster committed
New --dirstat=lines mode, doing dirstat analysis based on diffstat

This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_by_line(), which uses the more expensive diffstat analysis
(as opposed to show_dirstat()'s own (relatively inexpensive) analysis)
to derive the numbers from which the --dirstat output is computed.

The alternative implementation is controlled by the new "lines"
parameter to the --dirstat option (or the diff.dirstat config variable).

For binary files, the diffstat analysis counts bytes instead of lines,
so to prevent binary files from dominating the dirstat results, the
byte counts for binary files are divided by 64 before being compared to
their textual/line-based counterparts. This is a stupid and ugly - but
very cheap - heuristic.

In linux-2.6.git, running the three different --dirstat modes:

    time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
vs.
    time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
vs.
    time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null

yields the following average runtimes on my machine:

 - "changes" (default): ~6.0 s
 - "lines": ~9.6 s
 - "files": ~0.1 s

So, as expected, there's a considerable performance hit (~60%) by going
through the full diffstat analysis as compared to the default "changes"
analysis (obviously, "files" is much faster than both). As such, the
"lines" mode is probably only useful if you really need the --dirstat
numbers to be consistent with the numbers returned from the other
--*stat options.

The patch also includes documentation and tests for the new dirstat
mode.

Improved-by: Junio C Hamano <[email protected]>
Signed-off-by: Johan Herland <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 712d2c7 commit 1c57a62
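
The binary-file heuristic described in the commit message can be sketched in isolation. The block below is a minimal illustration, not the code from diff.c: `struct file_stat` and `line_damage()` are hypothetical names standing in for the fields of `struct diffstat_file` that the patch reads. Note that the patch rounds up rather than truncating, so any non-empty binary change counts as at least one "line".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the diffstat fields the patch uses. */
struct file_stat {
	unsigned long added;   /* lines added, or bytes for binary files */
	unsigned long deleted; /* lines deleted, or bytes for binary files */
	int is_binary;
};

/*
 * Return the per-file "damage" used for the dirstat totals.
 * Binary files report bytes, so they are normalized to 64-byte
 * chunks, rounding up, to make them comparable to line counts.
 */
unsigned long line_damage(const struct file_stat *f)
{
	unsigned long damage;

	assert(f != NULL);
	damage = f->added + f->deleted;
	if (f->is_binary)
		damage = (damage + 63) / 64;
	return damage;
}
```

With this sketch, a text file with 10 added and 5 deleted lines contributes 15, while a binary file with 129 changed bytes contributes 3.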

5 files changed, 176 insertions(+), 2 deletions(-)

Documentation/diff-config.txt

Lines changed: 8 additions & 0 deletions
@@ -23,6 +23,14 @@ diff.dirstat::
 	the amount of pure code movements within a file. In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is

Documentation/diff-options.txt

Lines changed: 8 additions & 0 deletions
@@ -81,6 +81,14 @@ endif::git-format-patch[]
 	the amount of pure code movements within a file. In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is

diff.c

Lines changed: 59 additions & 2 deletions
@@ -73,9 +73,15 @@ static int parse_dirstat_params(struct diff_options *options, const char *params
 	while (*p) {
 		if (!prefixcmp(p, "changes")) {
 			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		} else if (!prefixcmp(p, "lines")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
 		} else if (!prefixcmp(p, "files")) {
 			p += 5;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
 		} else if (!prefixcmp(p, "noncumulative")) {
 			p += 13;
@@ -1671,6 +1677,50 @@ static void show_dirstat(struct diff_options *options)
 	gather_dirstat(options, &dir, changed, "", 0);
 }
 
+static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.permille = options->dirstat_permille;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage = file->added + file->deleted;
+		if (file->is_binary)
+			/*
+			 * binary files counts bytes, not lines. Must find some
+			 * way to normalize binary bytes vs. textual lines.
+			 * The following heuristic assumes that there are 64
+			 * bytes per "line".
+			 * This is stupid and ugly, but very cheap...
+			 */
+			damage = (damage + 63) / 64;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -4088,6 +4138,7 @@ void diff_flush(struct diff_options *options)
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i, output_format = options->output_format;
 	int separator = 0;
+	int dirstat_by_line = 0;
 
 	/*
 	 * Order: raw, stat, summary, patch
@@ -4108,7 +4159,11 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	if (output_format & DIFF_FORMAT_DIRSTAT && DIFF_OPT_TST(options, DIRSTAT_BY_LINE))
+		dirstat_by_line = 1;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    dirstat_by_line) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4123,10 +4178,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if (output_format & DIFF_FORMAT_DIRSTAT)
+			show_dirstat_by_line(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_by_line)
 		show_dirstat(options);
 
 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {

diff.h

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DIRSTAT_BY_LINE (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag)
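
The parse_dirstat_params() hunk in diff.c keeps the three modes mutually exclusive by setting one flag and clearing the other on every branch. The standalone sketch below illustrates that pattern under some assumptions: `struct sketch_options` and `select_dirstat_mode()` are hypothetical, the bit chosen for `DIFF_OPT_DIRSTAT_BY_FILE` is illustrative only, the `DIFF_OPT_CLR` definition is assumed to mirror the `TST`/`SET` macros shown above, and whole-string `strcmp()` replaces the real parser's `prefixcmp()` walk over a comma-separated list.

```c
#include <assert.h>
#include <string.h>

#define DIFF_OPT_DIRSTAT_BY_FILE (1 << 20) /* illustrative bit, an assumption */
#define DIFF_OPT_DIRSTAT_BY_LINE (1 << 28) /* value added by this patch */

#define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag)
#define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag)
/* Assumed definition; only TST and SET are visible in the hunk. */
#define DIFF_OPT_CLR(opts, flag) ((opts)->flags &= ~DIFF_OPT_##flag)

/* Hypothetical, reduced stand-in for struct diff_options. */
struct sketch_options {
	unsigned flags;
};

/*
 * Select exactly one dirstat mode. Each branch touches both flags,
 * so a later parameter always overrides an earlier one.
 * Returns 0 on success, -1 for an unknown mode name.
 */
int select_dirstat_mode(struct sketch_options *o, const char *mode)
{
	assert(o != NULL && mode != NULL);
	if (!strcmp(mode, "changes")) {
		DIFF_OPT_CLR(o, DIRSTAT_BY_LINE);
		DIFF_OPT_CLR(o, DIRSTAT_BY_FILE);
	} else if (!strcmp(mode, "lines")) {
		DIFF_OPT_SET(o, DIRSTAT_BY_LINE);
		DIFF_OPT_CLR(o, DIRSTAT_BY_FILE);
	} else if (!strcmp(mode, "files")) {
		DIFF_OPT_CLR(o, DIRSTAT_BY_LINE);
		DIFF_OPT_SET(o, DIRSTAT_BY_FILE);
	} else {
		return -1;
	}
	return 0;
}
```

Clearing the opposite flag on every branch is what lets a later parameter, for example a command-line `--dirstat=changes` after a configured `diff.dirstat=lines`, fully replace the earlier choice.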

t/t4047-diff-dirstat.sh

Lines changed: 100 additions & 0 deletions
@@ -839,4 +839,104 @@ test_expect_success '--dirstat=files,cumulative,27.09' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.2% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.2% dst/move/changed/
+   5.2% dst/move/rearranged/
+   5.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines' '
+	git diff --dirstat=lines HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=lines' '
+	git -c diff.dirstat=lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+   2.1% rearranged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.2% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.2% dst/move/changed/
+   5.2% dst/move/rearranged/
+   5.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines,0' '
+	git diff --dirstat=lines,0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0,lines' '
+	git -c diff.dirstat=0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
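
The first and second `expect_diff_dirstat` lists in these tests differ only in the two 2.1% entries, which appear once the cutoff is lowered to 0. The sketch below shows one plausible way such a cutoff and the percentage formatting interact. It is a simplified illustration and not the gather_dirstat() implementation: `format_dirstat_entry()` is a hypothetical helper, the per-mille arithmetic and the `%4d.%01d%%` format are assumptions about how the output is produced, and the line counts in the usage note are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one dirstat line into buf if the directory's share of the
 * total damage reaches cutoff_permille (30 would correspond to the
 * documented 3% default). Returns the number of characters that
 * snprintf() reports, or 0 when the entry is below the cutoff.
 */
int format_dirstat_entry(char *buf, size_t len,
			 unsigned long damage, unsigned long total,
			 int cutoff_permille, const char *dir)
{
	int permille;

	assert(buf != NULL && dir != NULL && total > 0);
	permille = (int)(damage * 1000 / total);
	if (permille < cutoff_permille)
		return 0;
	return snprintf(buf, len, "%4d.%01d%% %.*s",
			permille / 10, permille % 10,
			(int)strlen(dir), dir);
}
```

With hypothetical counts of 106 changed lines out of 1000, the entry is formatted as `  10.6% dst/copy/changed/`. A directory with 21 out of 1000 is dropped at a cutoff of 30 per mille but shown as `   2.1% changed/` at a cutoff of 0, which matches the shape of the `--dirstat=lines,0` expectations.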
