Commit df54e2b

Merge branch 'jh/dirstat-lines'
* jh/dirstat-lines:
  Mark dirstat error messages for translation
  Improve error handling when parsing dirstat parameters
  New --dirstat=lines mode, doing dirstat analysis based on diffstat
  Allow specifying --dirstat cut-off percentage as a floating point number
  Add config variable for specifying default --dirstat behavior
  Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  Make --dirstat=0 output directories that contribute < 0.1% of changes
  Add several testcases for --dirstat and friends
2 parents: f7d59e2 + 7478ac5 (commit df54e2b)

5 files changed (+1219 / -32 lines)

Documentation/diff-config.txt

Lines changed: 44 additions & 0 deletions
@@ -8,6 +8,50 @@ diff.autorefreshindex::
 affects only 'git diff' Porcelain, and not lower level
 'diff' commands such as 'git diff-files'.

+diff.dirstat::
+A comma separated list of `--dirstat` parameters specifying the
+default behavior of the `--dirstat` option to linkgit:git-diff[1]`
+and friends. The defaults can be overridden on the command line
+(using `--dirstat=<param1,param2,...>`). The fallback defaults
+(when not changed by `diff.dirstat`) are `changes,noncumulative,3`.
+The following parameters are available:
++
+--
+`changes`;;
+Compute the dirstat numbers by counting the lines that have been
+removed from the source, or added to the destination. This ignores
+the amount of pure code movements within a file. In other words,
+rearranging lines in a file is not counted as much as other changes.
+This is the default behavior when no parameter is given.
+`lines`;;
+Compute the dirstat numbers by doing the regular line-based diff
+analysis, and summing the removed/added line counts. (For binary
+files, count 64-byte chunks instead, since binary files have no
+natural concept of lines). This is a more expensive `--dirstat`
+behavior than the `changes` behavior, but it does count rearranged
+lines within a file as much as other changes. The resulting output
+is consistent with what you get from the other `--*stat` options.
+`files`;;
+Compute the dirstat numbers by counting the number of files changed.
+Each changed file counts equally in the dirstat analysis. This is
+the computationally cheapest `--dirstat` behavior, since it does
+not have to look at the file contents at all.
+`cumulative`;;
+Count changes in a child directory for the parent directory as well.
+Note that when using `cumulative`, the sum of the percentages
+reported may exceed 100%. The default (non-cumulative) behavior can
+be specified with the `noncumulative` parameter.
+<limit>;;
+An integer parameter specifies a cut-off percent (3% by default).
+Directories contributing less than this percentage of the changes
+are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`files,10,cumulative`.
+
 diff.external::
 If this config variable is set, diff generation is not
 performed using the internal diff machinery, but using the
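Within one parameter list the tokens are applied in order, so a later token overrides an earlier one, and the documentation above describes three layers: the fallback `changes,noncumulative,3`, then `diff.dirstat`, then `--dirstat=<param1,param2,...>` on the command line. The following is a minimal C sketch of that override order, not git's parser: `struct dirstat_opts`, `apply_dirstat_params()` and `token_is()` are hypothetical names, unknown tokens are silently skipped (git reports them), and the cut-off is read as a whole percentage for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the documented dirstat settings. */
enum dirstat_mode { BY_CHANGES, BY_LINES, BY_FILES };

struct dirstat_opts {
	enum dirstat_mode mode;
	int cumulative;        /* 1 for "cumulative", 0 for "noncumulative" */
	int cutoff_percent;    /* <limit>, whole percent only in this sketch */
};

/* Return 1 if tok[0..len) is exactly the keyword kw (not just a prefix). */
static int token_is(const char *tok, size_t len, const char *kw)
{
	return strlen(kw) == len && memcmp(tok, kw, len) == 0;
}

/*
 * Apply a comma-separated parameter list left to right, so later tokens
 * override earlier ones. Unknown tokens are skipped here; git reports them.
 */
void apply_dirstat_params(struct dirstat_opts *o, const char *params)
{
	assert(o != NULL && params != NULL);
	while (*params) {
		const char *comma = strchr(params, ',');
		size_t len = comma ? (size_t)(comma - params) : strlen(params);

		if (token_is(params, len, "changes"))
			o->mode = BY_CHANGES;
		else if (token_is(params, len, "lines"))
			o->mode = BY_LINES;
		else if (token_is(params, len, "files"))
			o->mode = BY_FILES;
		else if (token_is(params, len, "cumulative"))
			o->cumulative = 1;
		else if (token_is(params, len, "noncumulative"))
			o->cumulative = 0;
		else if (len > 0 && params[0] >= '0' && params[0] <= '9')
			o->cutoff_percent = atoi(params);  /* stops at ',' or '.' */
		params += len;
		if (*params == ',')
			params++;                          /* skip the separator */
	}
}
```

Applying `changes,noncumulative,3`, then `files,10`, then `cumulative` to the same struct leaves it in `files` mode with a 10% cut-off and cumulative counting, the same settings as the `files,10,cumulative` example.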

Documentation/diff-options.txt

Lines changed: 42 additions & 12 deletions
@@ -66,19 +66,49 @@ endif::git-format-patch[]
 number of modified files, as well as number of added and deleted
 lines.

---dirstat[=<limit>]::
-Output the distribution of relative amount of changes (number of lines added or
-removed) for each sub-directory. Directories with changes below
-a cut-off percent (3% by default) are not shown. The cut-off percent
-can be set with `--dirstat=<limit>`. Changes in a child directory are not
-counted for the parent directory, unless `--cumulative` is used.
+--dirstat[=<param1,param2,...>]::
+Output the distribution of relative amount of changes for each
+sub-directory. The behavior of `--dirstat` can be customized by
+passing it a comma separated list of parameters.
+The defaults are controlled by the `diff.dirstat` configuration
+variable (see linkgit:git-config[1]).
+The following parameters are available:
 +
-Note that the `--dirstat` option computes the changes while ignoring
-the amount of pure code movements within a file. In other words,
-rearranging lines in a file is not counted as much as other changes.
-
---dirstat-by-file[=<limit>]::
-Same as `--dirstat`, but counts changed files instead of lines.
+--
+`changes`;;
+Compute the dirstat numbers by counting the lines that have been
+removed from the source, or added to the destination. This ignores
+the amount of pure code movements within a file. In other words,
+rearranging lines in a file is not counted as much as other changes.
+This is the default behavior when no parameter is given.
+`lines`;;
+Compute the dirstat numbers by doing the regular line-based diff
+analysis, and summing the removed/added line counts. (For binary
+files, count 64-byte chunks instead, since binary files have no
+natural concept of lines). This is a more expensive `--dirstat`
+behavior than the `changes` behavior, but it does count rearranged
+lines within a file as much as other changes. The resulting output
+is consistent with what you get from the other `--*stat` options.
+`files`;;
+Compute the dirstat numbers by counting the number of files changed.
+Each changed file counts equally in the dirstat analysis. This is
+the computationally cheapest `--dirstat` behavior, since it does
+not have to look at the file contents at all.
+`cumulative`;;
+Count changes in a child directory for the parent directory as well.
+Note that when using `cumulative`, the sum of the percentages
+reported may exceed 100%. The default (non-cumulative) behavior can
+be specified with the `noncumulative` parameter.
+<limit>;;
+An integer parameter specifies a cut-off percent (3% by default).
+Directories contributing less than this percentage of the changes
+are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`--dirstat=files,10,cumulative`.

 --summary::
 Output a condensed summary of extended header information
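The `cumulative` parameter is what lets the reported percentages add up to more than 100%: a parent directory's share includes everything below it, so the same change is counted once for each enclosing directory. The sketch below illustrates only that accounting, assuming a flat list of a hypothetical `struct changed_file` with repository-relative paths; it deliberately leaves out the rest of git's reporting logic, such as the cut-off and the `sources != 1` check in `gather_dirstat()`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-file change record (path relative to repository root). */
struct changed_file {
	const char *path;
	unsigned long changed;
};

/*
 * Cumulative share, in permille, of the directory prefix dir (e.g. "lib/"):
 * every file anywhere below dir counts toward it, so a parent directory's
 * share includes all of its children's shares.
 */
int cumulative_permille(const struct changed_file *files, size_t n,
			const char *dir)
{
	unsigned long total = 0, below = 0;
	size_t i, dirlen = strlen(dir);

	for (i = 0; i < n; i++) {
		total += files[i].changed;
		if (strncmp(files[i].path, dir, dirlen) == 0)
			below += files[i].changed;
	}
	assert(total > 0);
	return (int)(below * 1000 / total);
}
```

With 50 changed lines in `lib/a.c`, 30 in `lib/sub/b.c` and 20 in `doc/x.txt`, the cumulative shares are 80.0% for `lib/`, 30.0% for `lib/sub/` and 20.0% for `doc/`, which sum to 130%.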

diff.c

Lines changed: 152 additions & 19 deletions
@@ -31,6 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
+static int diff_dirstat_permille_default = 30;
 static struct diff_options default_diff_options;

 static char diff_colors[][COLOR_MAXLEN] = {
@@ -66,6 +67,58 @@ static int parse_diff_color_slot(const char *var, int ofs)
 	return -1;
 }

+static int parse_dirstat_params(struct diff_options *options, const char *params,
+				struct strbuf *errmsg)
+{
+	const char *p = params;
+	int p_len, ret = 0;
+
+	while (*p) {
+		p_len = strchrnul(p, ',') - p;
+		if (!memcmp(p, "changes", p_len)) {
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		} else if (!memcmp(p, "lines", p_len)) {
+			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		} else if (!memcmp(p, "files", p_len)) {
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+		} else if (!memcmp(p, "noncumulative", p_len)) {
+			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
+		} else if (!memcmp(p, "cumulative", p_len)) {
+			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
+		} else if (isdigit(*p)) {
+			char *end;
+			int permille = strtoul(p, &end, 10) * 10;
+			if (*end == '.' && isdigit(*++end)) {
+				/* only use first digit */
+				permille += *end - '0';
+				/* .. and ignore any further digits */
+				while (isdigit(*++end))
+					; /* nothing */
+			}
+			if (end - p == p_len)
+				options->dirstat_permille = permille;
+			else {
+				strbuf_addf(errmsg, _(" Failed to parse dirstat cut-off percentage '%.*s'\n"),
+					    p_len, p);
+				ret++;
+			}
+		} else {
+			strbuf_addf(errmsg, _(" Unknown dirstat parameter '%.*s'\n"),
+				    p_len, p);
+			ret++;
+		}
+
+		p += p_len;
+
+		if (*p)
+			p++; /* more parameters, swallow separator */
+	}
+	return ret;
+}
+
 static int git_config_rename(const char *var, const char *value)
 {
 	if (!value)
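The cut-off branch of `parse_dirstat_params()` stores the limit in permille (tenths of a percent): the integer part is multiplied by 10 and only the first digit after a decimal point is kept, so `3` becomes 30, `2.5` becomes 25, and `0.15` becomes 1. The sketch below reproduces that conversion as a stand-alone function; `parse_cutoff_permille()` is a hypothetical name. Because a token inside `a,b,c` is not NUL-terminated, this version walks the digits itself and never looks past the token length; overflow on absurdly long numbers is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Parse a dirstat cut-off token tok[0..len) such as "3", "2.5" or "0.15"
 * into permille. Only the first fractional digit is kept; further digits
 * are ignored. Returns 0 on success, -1 if the token is not a plain
 * decimal number (e.g. "3x" or "files").
 */
int parse_cutoff_permille(const char *tok, size_t len, int *permille)
{
	size_t i = 0;
	int value = 0;

	assert(tok != NULL && permille != NULL);
	if (len == 0 || !isdigit((unsigned char)tok[0]))
		return -1;
	while (i < len && isdigit((unsigned char)tok[i]))
		value = value * 10 + (tok[i++] - '0');
	value *= 10;                           /* whole percent -> permille */
	if (i < len && tok[i] == '.') {
		i++;
		if (i < len && isdigit((unsigned char)tok[i]))
			value += tok[i++] - '0';       /* keep the first digit */
		while (i < len && isdigit((unsigned char)tok[i]))
			i++;                           /* ignore any further digits */
	}
	if (i != len)
		return -1;                         /* trailing garbage */
	*permille = value;
	return 0;
}
```

For example, the token `10` in `files,10,cumulative` yields 100 permille, and `3x` is rejected. When matching the keyword tokens, note that `memcmp(p, "changes", p_len)` in the commit compares only the first `p_len` bytes, so a truncated token such as `lin` is accepted as `lines`; an exact match would also require `p_len` to equal the keyword's length.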
@@ -145,6 +198,17 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 		return 0;
 	}

+	if (!strcmp(var, "diff.dirstat")) {
+		struct strbuf errmsg = STRBUF_INIT;
+		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
+		if (parse_dirstat_params(&default_diff_options, value, &errmsg))
+			warning(_("Found errors in 'diff.dirstat' config variable:\n%s"),
+				errmsg.buf);
+		strbuf_release(&errmsg);
+		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
+		return 0;
+	}
+
 	if (!prefixcmp(var, "submodule."))
 		return parse_submodule_config_option(var, value);

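The `diff.dirstat` handler seeds `default_diff_options.dirstat_permille` with the current `diff_dirstat_permille_default`, parses the value, and copies the result back, so a `diff.dirstat` value without a number keeps the previous cut-off. `diff_setup()` then copies that file-scope default into each new `struct diff_options`, which means only options set up after the configuration has been read see the configured cut-off. A minimal sketch of that hand-off follows; every name in it is a hypothetical stand-in for the real variables and functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for diff_dirstat_permille_default (3.0%). */
static int dirstat_permille_default = 30;

/* Hypothetical stand-in for struct diff_options. */
struct opts {
	int dirstat_permille;
};

/*
 * Config side: start from the current default and replace it only if
 * diff.dirstat supplied a cut-off (parsed_permille != NULL).
 */
void config_set_dirstat_permille(const int *parsed_permille)
{
	int value = dirstat_permille_default;
	if (parsed_permille != NULL)
		value = *parsed_permille;
	assert(value >= 0);
	dirstat_permille_default = value;
}

/* Setup side: like diff_setup(), copy whatever default is current now. */
void setup_opts(struct opts *o)
{
	assert(o != NULL);
	o->dirstat_permille = dirstat_permille_default;
}
```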
@@ -1455,7 +1519,7 @@ struct dirstat_file {

 struct dirstat_dir {
 	struct dirstat_file *files;
-	int alloc, nr, percent, cumulative;
+	int alloc, nr, permille, cumulative;
 };

 static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
@@ -1502,12 +1566,11 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	 * under this directory (sources == 1).
 	 */
 	if (baselen && sources != 1) {
-		int permille = this_dir * 1000 / changed;
-		if (permille) {
-			int percent = permille / 10;
-			if (percent >= dir->percent) {
+		if (this_dir) {
+			int permille = this_dir * 1000 / changed;
+			if (permille >= dir->permille) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
-					percent, permille % 10, baselen, base);
+					permille / 10, permille % 10, baselen, base);
 				if (!dir->cumulative)
 					return 0;
 			}
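After this change, `gather_dirstat()` decides whether to print a directory by checking that it has any changes at all (`this_dir`) and then comparing its permille share against the cut-off, instead of first requiring a non-zero permille. That is what lets `--dirstat=0` report directories contributing less than 0.1% of the changes; they print as `0.0%`. The sketch below isolates that decision and the `%4d.%01d%%` formatting. `dirstat_line()` is a hypothetical helper that writes into a caller-supplied buffer rather than to `opt->file`, skips the top-level (empty) base as the `baselen` check does, and does not model the `sources != 1` rule.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Reporting decision of gather_dirstat() after this commit, isolated.
 * Writes e.g. "   2.5% lib/" into buf and returns 1 if the directory is
 * reported, 0 if it is skipped.
 */
int dirstat_line(char *buf, size_t bufsz, unsigned long this_dir,
		 unsigned long changed, int cutoff_permille, const char *base)
{
	int permille;

	assert(buf != NULL && bufsz > 0 && changed > 0);
	if (strlen(base) == 0 || this_dir == 0)
		return 0;       /* top level, or nothing changed here */
	permille = (int)(this_dir * 1000 / changed);
	if (permille < cutoff_permille)
		return 0;       /* below the cut-off */
	/* whole percent, then the tenths digit, as in "%4d.%01d%%" */
	snprintf(buf, bufsz, "%4d.%01d%% %s", permille / 10, permille % 10, base);
	return 1;
}
```

For example, 25 changed lines out of 1000 give 25 permille, printed as `   2.5% lib/`; that directory is hidden with a 3% cut-off (30 permille) and shown with a 2% one. One changed line out of 2000 rounds down to 0 permille and is still shown when the cut-off is 0, whereas the old `if (permille)` test skipped it.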
@@ -1533,7 +1596,7 @@ static void show_dirstat(struct diff_options *options)
 	dir.files = NULL;
 	dir.alloc = 0;
 	dir.nr = 0;
-	dir.percent = options->dirstat_percent;
+	dir.permille = options->dirstat_permille;
 	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);

 	changed = 0;
@@ -1622,6 +1685,50 @@ static void show_dirstat(struct diff_options *options)
 	gather_dirstat(options, &dir, changed, "", 0);
 }

+static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.permille = options->dirstat_permille;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage = file->added + file->deleted;
+		if (file->is_binary)
+			/*
+			 * binary files counts bytes, not lines. Must find some
+			 * way to normalize binary bytes vs. textual lines.
+			 * The following heuristic assumes that there are 64
+			 * bytes per "line".
+			 * This is stupid and ugly, but very cheap...
+			 */
+			damage = (damage + 63) / 64;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
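In `lines` mode each file's weight is the sum of its added and deleted counts from the diffstat; for binary files those counts are bytes, so they are converted to 64-byte "lines", rounding up. The helpers below mirror that arithmetic from `show_dirstat_by_line()`; `struct file_stat`, `dirstat_damage()` and `total_damage()` are hypothetical stand-ins for the `struct diffstat_file` fields and the summing loop in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of a per-file diffstat record. */
struct file_stat {
	unsigned long added, deleted;   /* bytes, not lines, when binary */
	int is_binary;
};

/* Weight of one file: added + deleted, binary bytes as 64-byte "lines". */
unsigned long dirstat_damage(const struct file_stat *f)
{
	unsigned long damage;

	assert(f != NULL);
	damage = f->added + f->deleted;
	if (f->is_binary)
		damage = (damage + 63) / 64;    /* round up to whole chunks */
	return damage;
}

/* Total over all files; 0 when every change was a pure rename. */
unsigned long total_damage(const struct file_stat *files, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += dirstat_damage(&files[i]);
	return total;
}
```

A binary file with 100 changed bytes counts as (100 + 63) / 64 = 2 lines, and a pure rename counts as 0, which is why the real function returns early when the total is zero.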
@@ -2891,7 +2998,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = 3;
+	options->dirstat_permille = diff_dirstat_permille_default;
 	options->context = 3;

 	options->change = diff_change;
@@ -3149,6 +3256,21 @@ static int stat_opt(struct diff_options *options, const char **av)
 	return argcount;
 }

+static int parse_dirstat_opt(struct diff_options *options, const char *params)
+{
+	struct strbuf errmsg = STRBUF_INIT;
+	if (parse_dirstat_params(options, params, &errmsg))
+		die(_("Failed to parse --dirstat/-X option parameter:\n%s"),
+		    errmsg.buf);
+	strbuf_release(&errmsg);
+	/*
+	 * The caller knows a dirstat-related option is given from the command
+	 * line; allow it to say "return this_function();"
+	 */
+	options->output_format |= DIFF_FORMAT_DIRSTAT;
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 {
 	const char *arg = av[0];
@@ -3168,15 +3290,19 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->output_format |= DIFF_FORMAT_NUMSTAT;
 	else if (!strcmp(arg, "--shortstat"))
 		options->output_format |= DIFF_FORMAT_SHORTSTAT;
-	else if (opt_arg(arg, 'X', "dirstat", &options->dirstat_percent))
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-	else if (!strcmp(arg, "--cumulative")) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
-	} else if (opt_arg(arg, 0, "dirstat-by-file",
-			   &options->dirstat_percent)) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+	else if (!strcmp(arg, "-X") || !strcmp(arg, "--dirstat"))
+		return parse_dirstat_opt(options, "");
+	else if (!prefixcmp(arg, "-X"))
+		return parse_dirstat_opt(options, arg + 2);
+	else if (!prefixcmp(arg, "--dirstat="))
+		return parse_dirstat_opt(options, arg + 10);
+	else if (!strcmp(arg, "--cumulative"))
+		return parse_dirstat_opt(options, "cumulative");
+	else if (!strcmp(arg, "--dirstat-by-file"))
+		return parse_dirstat_opt(options, "files");
+	else if (!prefixcmp(arg, "--dirstat-by-file=")) {
+		parse_dirstat_opt(options, "files");
+		return parse_dirstat_opt(options, arg + 18);
 	}
 	else if (!strcmp(arg, "--check"))
 		options->output_format |= DIFF_FORMAT_CHECKDIFF;
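The rewritten branches in `diff_opt_parse()` turn every dirstat-related spelling into a parameter string for `parse_dirstat_params()`: `-X` and `--dirstat` alone keep the current defaults, `-X<params>` and `--dirstat=<params>` pass the text after the prefix, and the deprecated `--cumulative` and `--dirstat-by-file[=<limit>]` become `cumulative` and `files` (followed by the old limit). The offsets 10 and 18 are `strlen("--dirstat=")` and `strlen("--dirstat-by-file=")`. The sketch below restates that mapping as a pure function; `dirstat_params_for_arg()` and `after_prefix()` are hypothetical names, and computing the prefix length replaces the hard-coded offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the text after prefix if s starts with it, else NULL. */
static const char *after_prefix(const char *s, const char *prefix)
{
	size_t n = strlen(prefix);
	return strncmp(s, prefix, n) == 0 ? s + n : NULL;
}

/*
 * Map one argument to the dirstat parameter strings it implies, in the
 * order they are applied. Returns how many strings were stored in out[]
 * (0 if the argument is not dirstat-related).
 */
int dirstat_params_for_arg(const char *arg, const char *out[2])
{
	const char *rest;

	assert(arg != NULL && out != NULL);
	if (!strcmp(arg, "-X") || !strcmp(arg, "--dirstat")) {
		out[0] = "";                    /* keep the current defaults */
		return 1;
	}
	if ((rest = after_prefix(arg, "-X")) ||
	    (rest = after_prefix(arg, "--dirstat="))) {
		out[0] = rest;
		return 1;
	}
	if (!strcmp(arg, "--cumulative")) {
		out[0] = "cumulative";          /* deprecated spelling */
		return 1;
	}
	if (!strcmp(arg, "--dirstat-by-file")) {
		out[0] = "files";               /* deprecated spelling */
		return 1;
	}
	if ((rest = after_prefix(arg, "--dirstat-by-file="))) {
		out[0] = "files";               /* mode first ... */
		out[1] = rest;                  /* ... then the old <limit> */
		return 2;
	}
	return 0;
}
```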
@@ -4023,6 +4149,7 @@ void diff_flush(struct diff_options *options)
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i, output_format = options->output_format;
 	int separator = 0;
+	int dirstat_by_line = 0;

 	/*
 	 * Order: raw, stat, summary, patch
@@ -4043,7 +4170,11 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}

-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	if (output_format & DIFF_FORMAT_DIRSTAT && DIFF_OPT_TST(options, DIRSTAT_BY_LINE))
+		dirstat_by_line = 1;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    dirstat_by_line) {
 		struct diffstat_t diffstat;

 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4058,10 +4189,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if (output_format & DIFF_FORMAT_DIRSTAT)
+			show_dirstat_by_line(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_by_line)
 		show_dirstat(options);

 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
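`diff_flush()` now routes `lines` mode through the diffstat pass: when `DIFF_FORMAT_DIRSTAT` is requested together with the `DIRSTAT_BY_LINE` flag, the diffstat is computed even without `--stat`, `--shortstat` or `--numstat`, and `show_dirstat()` is skipped. The sketch below models those conditions as written in this commit, with hypothetical `FMT_*` bits standing in for the `DIFF_FORMAT_*` constants. As written, the inner `show_dirstat_by_line()` call is guarded only by `DIFF_FORMAT_DIRSTAT`, so combining `--stat` with a `--dirstat` that is not in `lines` mode makes both `show_dirstat_by_line()` and `show_dirstat()` run.

```c
#include <assert.h>

/* Hypothetical bits standing in for git's DIFF_FORMAT_* constants. */
enum {
	FMT_DIFFSTAT  = 1 << 0,
	FMT_SHORTSTAT = 1 << 1,
	FMT_NUMSTAT   = 1 << 2,
	FMT_DIRSTAT   = 1 << 3
};

/* Which dirstat-related steps diff_flush() takes in this commit. */
struct flush_plan {
	int compute_diffstat;     /* the diffstat block is entered */
	int dirstat_by_line;      /* show_dirstat_by_line() runs */
	int dirstat_standalone;   /* show_dirstat() runs */
};

struct flush_plan plan_flush(unsigned fmt, int by_line_flag)
{
	struct flush_plan p;
	/* dirstat_by_line: dirstat requested and DIRSTAT_BY_LINE set */
	int by_line = (fmt & FMT_DIRSTAT) && by_line_flag;

	p.compute_diffstat =
		(fmt & (FMT_DIFFSTAT | FMT_SHORTSTAT | FMT_NUMSTAT)) || by_line;
	/* inside that block the call is guarded by the DIRSTAT bit only */
	p.dirstat_by_line = p.compute_diffstat && (fmt & FMT_DIRSTAT);
	p.dirstat_standalone = (fmt & FMT_DIRSTAT) && !by_line;
	/* the stand-alone path is never taken in lines mode */
	assert(!(by_line && p.dirstat_standalone));
	return p;
}
```

For example, `plan_flush(FMT_DIFFSTAT | FMT_DIRSTAT, 0)` reports all three steps as enabled.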

diff.h

Lines changed: 2 additions & 1 deletion
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DIRSTAT_BY_LINE (1 << 28)

 #define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag)
@@ -114,7 +115,7 @@ struct diff_options {
 	int needed_rename_limit;
 	int degraded_cc_to_c;
 	int show_rename_progress;
-	int dirstat_percent;
+	int dirstat_permille;
 	int setup;
 	int abbrev;
 	const char *prefix;
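With `DIFF_OPT_DIRSTAT_BY_LINE` added next to the existing `DIRSTAT_BY_FILE` flag, the three counting modes are encoded in two bits: neither set means `changes`, and `parse_dirstat_params()` always clears one bit when it sets the other. The sketch below mimics the token-pasting `DIFF_OPT_TST`/`DIFF_OPT_SET` macros with hypothetical `OPT_*` equivalents, including an `OPT_CLR` modeled on the `DIFF_OPT_CLR` calls used in this commit, to show that encoding; the bit positions are arbitrary.

```c
#include <assert.h>

/* Hypothetical flag bits modeled on the DIFF_OPT_DIRSTAT_* flags. */
#define OPT_DIRSTAT_CUMULATIVE (1u << 0)
#define OPT_DIRSTAT_BY_FILE    (1u << 1)
#define OPT_DIRSTAT_BY_LINE    (1u << 2)

/* Token pasting turns OPT_SET(o, DIRSTAT_BY_LINE) into OPT_DIRSTAT_BY_LINE. */
#define OPT_TST(o, flag) ((o)->flags & OPT_##flag)
#define OPT_SET(o, flag) ((o)->flags |= OPT_##flag)
#define OPT_CLR(o, flag) ((o)->flags &= ~OPT_##flag)

struct opts {
	unsigned flags;
};

enum mode { MODE_CHANGES, MODE_LINES, MODE_FILES };

/* Select a counting mode; the other mode bit is always cleared. */
void set_mode(struct opts *o, enum mode m)
{
	OPT_CLR(o, DIRSTAT_BY_LINE);
	OPT_CLR(o, DIRSTAT_BY_FILE);
	if (m == MODE_LINES)
		OPT_SET(o, DIRSTAT_BY_LINE);
	else if (m == MODE_FILES)
		OPT_SET(o, DIRSTAT_BY_FILE);
}

/* Decode the two bits; neither set means "changes". */
enum mode get_mode(const struct opts *o)
{
	assert(!(OPT_TST(o, DIRSTAT_BY_LINE) && OPT_TST(o, DIRSTAT_BY_FILE)));
	if (OPT_TST(o, DIRSTAT_BY_LINE))
		return MODE_LINES;
	return OPT_TST(o, DIRSTAT_BY_FILE) ? MODE_FILES : MODE_CHANGES;
}
```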
