Commit 12fc4ad

tboegi authored and gitster committed
diff.c: use utf8_strwidth() to count display width
When Unicode filenames (encoded in UTF-8) are used, the visible width
on the screen is not the same as strlen().

For example, `git log --stat` may produce output like this:

[snip the header]

 Arger.txt  | 1 +
 Ärger.txt | 1 +
 2 files changed, 2 insertions(+)

A side note: the original report was about Cyrillic filenames.
After some investigation it turned out that
a) This is not a problem with "ambiguous characters" in Unicode
b) The same problem exists for all multi-byte Unicode code points (so
   we can use Latin-based umlauts for the demonstrations below)

The 'Ä' takes the same space on the screen as the 'A', but needs one
more byte in memory, so the `git log --stat` output for "Arger.txt" (!)
gets misaligned:
The maximum length is derived from "Ärger.txt", which is 10 bytes in
memory but 9 positions on the screen. That is why "Arger.txt", which
needs only 9 bytes in memory, gets one extra ' ' for alignment.

If there were a file "Ö", it would be correctly aligned by chance,
but "Öhö" would not.

The solution is, of course, to use utf8_strwidth() instead of strlen()
when dealing with the width on screen.

And then there is another problem. Code like this:

    strbuf_addf(&out, "%-*s", len, name);

(or using the underlying snprintf() function) does not pad the buffer
to a minimum of len measured in screen width, but uses the byte count.

One could be tempted to wish that snprintf() was UTF-8 aware.
That doesn't seem to be the case anywhere (tested on Linux and Mac).
Probably snprintf() uses the "bytes in memory"/strlen() approach to be
compatible with older versions, and this will never change.

The basic idea is to change code in diff.c like this

    strbuf_addf(&out, "%-*s", len, name);

into something like this:

    int padding = len - utf8_strwidth(name);
    if (padding < 0)
        padding = 0;
    strbuf_addf(&out, " %s%*s", name, padding, "");

The real change is slightly bigger, as it also merges two calls of
strbuf_addf() into one.

Tests:
Two things need to be tested:
- The calculation of the maximum width
- The calculation of padding

The name "textfile" is changed into "tëxtfilë"; both have a width of 8.
If strlen() were used to get the maximum width, the shorter "binfile"
would be misaligned:

 binfile    | [snip]
 tëxtfilë | [snip]

If only "binfile" were renamed into "binfilë":

 binfilë | [snip]
 textfile | [snip]

In order to verify that the width is calculated correctly everywhere,
"binfile" is renamed into "binfilë", giving 1 byte more in strlen(),
and "textfile" is renamed into "tëxtfilë", giving 2 bytes more in
strlen().

The updated t4012-diff-binary.sh checks the correct alignment:

 binfilë  | [snip]
 tëxtfilë | [snip]

Reported-by: Alexander Meshcheryakov <[email protected]>
Helped-by: Johannes Schindelin <[email protected]>
Signed-off-by: Torsten Bögershausen <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
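
The byte-versus-column mismatch described in the commit message can be reproduced outside of git. The following is a minimal sketch, not the code from this commit. `display_width()` is a hypothetical stand-in for utf8_strwidth(): it only counts UTF-8 lead bytes, so it assumes valid UTF-8 and one screen column per code point, and it ignores the wide and zero-width characters that the real function handles. `pad_bytes()` and `pad_columns()` are illustrative names as well.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for utf8_strwidth(): count every byte that is
 * not a UTF-8 continuation byte (10xxxxxx). Assumes valid UTF-8 and
 * one screen column per code point.
 */
static int display_width(const char *s)
{
	int width = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			width++;
	return width;
}

/* Old approach: "%-*s" pads the name to `len` bytes, not columns. */
static int pad_bytes(char *buf, size_t size, const char *name, int len)
{
	return snprintf(buf, size, "%-*s|", len, name);
}

/* New approach: compute the padding in columns and print it explicitly. */
static int pad_columns(char *buf, size_t size, const char *name, int len)
{
	int padding = len - display_width(name);

	if (padding < 0)
		padding = 0;
	/* "%*s" with an empty string emits exactly `padding` spaces. */
	return snprintf(buf, size, "%s%*s|", name, padding, "");
}
```

With `len` taken from strlen() (10 for "Ärger.txt"), `pad_bytes()` leaves "Arger.txt" one column wider than "Ärger.txt". With `len` taken from the column count (9), `pad_columns()` ends both names in the same column.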
1 parent fd59c5b commit 12fc4ad

2 files changed (+23 -18 lines changed)

diff.c

Lines changed: 16 additions & 11 deletions
@@ -2622,7 +2622,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			continue;
 		}
 		fill_print_name(file);
-		len = strlen(file->print_name);
+		len = utf8_strwidth(file->print_name);
 		if (max_len < len)
 			max_len = len;

@@ -2736,7 +2736,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		char *name = file->print_name;
 		uintmax_t added = file->added;
 		uintmax_t deleted = file->deleted;
-		int name_len;
+		int name_len, padding;
 
 		if (!file->is_interesting && (added + deleted == 0))
 			continue;
@@ -2745,7 +2745,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		 * "scale" the filename
 		 */
 		len = name_width;
-		name_len = strlen(name);
+		name_len = utf8_strwidth(name);
 		if (name_width < name_len) {
 			char *slash;
 			prefix = "...";
@@ -2755,10 +2755,14 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			if (slash)
 				name = slash;
 		}
+		padding = len - utf8_strwidth(name);
+		if (padding < 0)
+			padding = 0;
 
 		if (file->is_binary) {
-			strbuf_addf(&out, " %s%-*s |", prefix, len, name);
-			strbuf_addf(&out, " %*s", number_width, "Bin");
+			strbuf_addf(&out, " %s%s%*s | %*s",
+				    prefix, name, padding, "",
+				    number_width, "Bin");
 			if (!added && !deleted) {
 				strbuf_addch(&out, '\n');
 				emit_diff_symbol(options, DIFF_SYMBOL_STATS_LINE,
@@ -2778,8 +2782,9 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			continue;
 		}
 		else if (file->is_unmerged) {
-			strbuf_addf(&out, " %s%-*s |", prefix, len, name);
-			strbuf_addstr(&out, " Unmerged\n");
+			strbuf_addf(&out, " %s%s%*s | %*s",
+				    prefix, name, padding, "",
+				    number_width, "Unmerged");
 			emit_diff_symbol(options, DIFF_SYMBOL_STATS_LINE,
 					 out.buf, out.len, 0);
 			strbuf_reset(&out);
@@ -2805,10 +2810,10 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 				add = total - del;
 			}
 		}
-		strbuf_addf(&out, " %s%-*s |", prefix, len, name);
-		strbuf_addf(&out, " %*"PRIuMAX"%s",
-			    number_width, added + deleted,
-			    added + deleted ? " " : "");
+		strbuf_addf(&out, " %s%s%*s | %*"PRIuMAX"%s",
+			    prefix, name, padding, "",
+			    number_width, added + deleted,
+			    added + deleted ? " " : "");
 		show_graph(&out, '+', add, add_c, reset);
 		show_graph(&out, '-', del, del_c, reset);
 		strbuf_addch(&out, '\n');
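
The diff.c hunks touch two passes of show_stats(): one that finds the widest name, and one that pads each name and appends the count column in a single formatted call. The sketch below imitates that structure with plain snprintf() rather than git's strbuf API. It is an illustration under stated assumptions, not git's implementation: `display_width()` is a simplified, hypothetical substitute for utf8_strwidth() (one column per code point, valid UTF-8 only), `max_name_width()` and `format_stat_line()` are made-up names, and the "..." prefix handling is left out.

```c
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical substitute for utf8_strwidth(). */
static int display_width(const char *s)
{
	int width = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			width++;
	return width;
}

/* Pass 1: widest name, measured in columns rather than bytes. */
static int max_name_width(const char **names, int nr)
{
	int i, max_len = 0;

	for (i = 0; i < nr; i++) {
		int len = display_width(names[i]);

		if (max_len < len)
			max_len = len;
	}
	return max_len;
}

/* Pass 2: emit " <name><padding> | <count>" with one snprintf() call. */
static int format_stat_line(char *buf, size_t size, const char *name,
			    int name_width, int number_width,
			    const char *count)
{
	int padding = name_width - display_width(name);

	if (padding < 0)
		padding = 0;
	return snprintf(buf, size, " %s%*s | %*s",
			name, padding, "", number_width, count);
}
```

For the names "binfilë" and "tëxtfilë", `max_name_width()` returns 8 even though the byte lengths are 8 and 10, so `format_stat_line()` adds one space of padding after "binfilë" and none after "tëxtfilë".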

t/t4012-diff-binary.sh

Lines changed: 7 additions & 7 deletions
@@ -113,20 +113,20 @@ test_expect_success 'diff --no-index with binary creation' '
 '
 
 cat >expect <<EOF
- binfile  |   Bin 0 -> 1026 bytes
- textfile | 10000 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ binfilë  |   Bin 0 -> 1026 bytes
+ tëxtfilë | 10000 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 EOF
 
 test_expect_success 'diff --stat with binary files and big change count' '
-	printf "\01\00%1024d" 1 >binfile &&
-	git add binfile &&
+	printf "\01\00%1024d" 1 >binfilë &&
+	git add binfilë &&
 	i=0 &&
 	while test $i -lt 10000; do
 		echo $i &&
 		i=$(($i + 1)) || return 1
-	done >textfile &&
-	git add textfile &&
-	git diff --cached --stat binfile textfile >output &&
+	done >tëxtfilë &&
+	git add tëxtfilë &&
+	git -c core.quotepath=false diff --cached --stat binfilë tëxtfilë >output &&
 	grep " | " output >actual &&
 	test_cmp expect actual
 '
