Commit 0866786

jnareb authored and gitster committed
gitweb: Strip non-printable characters from syntax highlighter output
The current code, as is, passes control characters, such as form-feed (^L), to highlight, which then passes them through to the browser. User agents (web browsers) that support 'application/xhtml+xml' usually require that web pages declared as XHTML and served with this mimetype are well-formed XML. Unescaped control characters cannot appear within the contents of a valid XML document.

This will cause the browser to display one of the following warnings:

* Safari v5.1 (6534.50) & Google Chrome v13.0.782.112:

    This page contains the following errors:

    error on line 657 at column 38: PCDATA invalid Char value 12
    Below is a rendering of the page up to the first error.

* Mozilla Firefox 3.6.19 & Mozilla Firefox 5.0:

    XML Parsing Error: not well-formed
    Location: http://path/to/git/repo/blah/blah

Both errors were generated by gitweb.perl v1.7.3.4 w/ highlight 2.7 using arch/ia64/kernel/unwind.c from the Linux kernel.

When the syntax highlighter is not used, control characters are replaced by esc_html(), but with the syntax highlighter they were passed through to the browser (to_utf8() doesn't remove control characters).

Introduce a sanitize() subroutine which strips forbidden characters, but does not perform HTML escaping, and use it in git_blob() to sanitize syntax highlighter output for XHTML.

Note that excluding "\t" (U+0009), "\n" (U+000A) and "\r" (U+000D) is not strictly necessary, at least for what is currently the only call site: "\t" tabs are replaced by spaces by untabify(), "\n" is stripped from each line before processing it, and replacing "\r" could be considered an improvement.

Originally-by: Christopher M. Fuhrman <[email protected]>
Signed-off-by: Jakub Narebski <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 5738c9c commit 0866786

1 file changed

+13
-1
lines changed

gitweb/gitweb.perl

Lines changed: 13 additions & 1 deletion
@@ -1517,6 +1517,17 @@ sub esc_path {
 	return $str;
 }
 
+# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
+sub sanitize {
+	my $str = shift;
+
+	return undef unless defined $str;
+
+	$str = to_utf8($str);
+	$str =~ s|([[:cntrl:]])|($1 =~ /[\t\n\r]/ ? $1 : quot_cec($1))|eg;
+	return $str;
+}
+
 # Make control characters "printable", using character escape codes (CEC)
 sub quot_cec {
 	my $cntrl = shift;
@@ -6484,7 +6495,8 @@ sub git_blob {
 			$nr++;
 			$line = untabify($line);
 			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
-			       $nr, esc_attr(href(-replay => 1)), $nr, $nr, $syntax ? to_utf8($line) : esc_html($line, -nbsp=>1);
+			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
+			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
 		}
 	}
 	close $fd
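
To see the substitution from the first hunk in isolation, the following is a minimal standalone sketch, not the gitweb code itself. The names sanitize_sketch() and quot_cec_sketch() are hypothetical stand-ins: the to_utf8() step is skipped (input is assumed to be decoded already), and the escape format used for characters without a short name is an assumption, as is the omission of any HTML markup the real quot_cec() may add around its result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for gitweb's quot_cec(): turns one control character
# into a visible escape sequence. The exact output format is an assumption.
sub quot_cec_sketch {
	my $cntrl = shift;
	my %es = (
		"\f" => '\f',   # form feed
		"\b" => '\b',   # backspace
		"\a" => '\a',   # bell
		"\e" => '\e',   # escape
	);
	return exists $es{$cntrl}
		? $es{$cntrl}
		: sprintf('\\x%02x', ord($cntrl));
}

# Same substitution as sanitize() in the diff, minus the to_utf8() call:
# tab, newline and carriage return pass through unchanged, every other
# control character is replaced by its printable escape.
# No HTML escaping is done, so highlighter markup survives.
sub sanitize_sketch {
	my $str = shift;

	return undef unless defined $str;

	$str =~ s|([[:cntrl:]])|($1 =~ /[\t\n\r]/ ? $1 : quot_cec_sketch($1))|eg;
	return $str;
}

# A highlighted line containing a form feed, as in the unwind.c case
my $line = qq{<span class="kwa">int</span> x;\f\tfoo();};
print sanitize_sketch($line), "\n";
```

The markup in $line is left intact, which is why esc_html() cannot simply be reused on highlighter output: it would escape the span tags as well as the control characters.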
