Skip to content

Commit b13e3ea

Browse files
jnarebgitster
authored andcommitted
gitweb: Fix fallback mode of to_utf8 subroutine
e5d3de5 (gitweb: use Perl built-in utf8 function for UTF-8 decoding., 2007-12-04) was meant to make gitweb faster by using Perl's internals (see subsection "Messing with Perl's Internals" in Encode(3pm) manpage) Simple benchmark confirms that (old = 00f429a, new = this version): old new old -- -65% new 189% -- Unfortunately it made fallback mode of to_utf8 do not work... except for default value 'latin1' of $fallback_encoding ('latin1' is Perl native encoding), which is why it was not noticed for such long time. utf8::valid(STRING) is an internal function that tests whether STRING is in a _consistent state_ regarding UTF-8. It returns true is well-formed UTF-8 and has the UTF-8 flag on _*or*_ if string is held as bytes (both these states are 'consistent'). For gitweb the second option was true, as output from git commands is opened without ':utf8' layer. What made it work at all for STRING in 'latin1' encoding is the fact that utf8:decode(STRING) turns on UTF-8 flag only if source string is valid UTF-8 and contains multi-byte UTF-8 characters... and that if string doesn't have UTF-8 flag set it is treated as in native Perl encoding, i.e. 'latin1' / 'iso-8859-1' (unless native encoding it is EBCDIC ;-)). It was ':utf8' layer that actually converted 'latin1' (no UTF-8 flag == native == 'latin1) to 'utf8'. Let's make use of the fact that utf8:decode(STRING) returns false if STRING is invalid as UTF-8 to check whether to enable fallback mode. Signed-off-by: Jakub Narebski <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 57cf4ad commit b13e3ea

File tree

1 file changed

+2
-2
lines changed

1 file changed

+2
-2
lines changed

gitweb/gitweb.perl

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1442,8 +1442,8 @@ sub validate_refname {
14421442
sub to_utf8 {
14431443
my $str = shift;
14441444
return undef unless defined $str;
1445-
if (utf8::valid($str)) {
1446-
utf8::decode($str);
1445+
1446+
if (utf8::is_utf8($str) || utf8::decode($str)) {
14471447
return $str;
14481448
} else {
14491449
return decode($fallback_encoding, $str, Encode::FB_DEFAULT);

0 commit comments

Comments
 (0)