Fix UTF-8 mojibake in #!vim highlighted blocks #663
mateu wants to merge 2 commits into perladvent:main
@mateu could we include a diff that shows the change from the CPAN version?
Good call. Here’s the focused delta from the CPAN version. The main behavioral change is in build_html:
  sub build_html {
      my ($self, $str, $param) = @_;
-     my $octets = Encode::encode('utf-8', $str, Encode::FB_CROAK);
+     if (utf8::is_utf8($str)) {
+         if ($str =~ /[\x{00C2}\x{00C3}][\x{0080}-\x{00BF}]/) {
+             my $fixed = eval {
+                 Encode::decode(
+                     'utf-8',
+                     Encode::encode('latin-1', $str, Encode::FB_CROAK),
+                     Encode::FB_CROAK
+                 );
+             };
+             $str = $fixed if defined $fixed;
+         }
+     } else {
+         $str = Encode::decode('utf-8', $str, Encode::FB_CROAK);
+     }
      my $vim = Text::VimColor->new(
-         string => $octets,
+         string => $str,
          filetype => $param->{filetype},
          vim_options => [
              qw( -RXZ -i NONE -u NONE -N -n ), "+set nomodeline", '+set fenc=utf-8',
          ],
      );
-     my $html_bytes = $vim->html;
-     my $html = Encode::decode('utf-8', $html_bytes);
+     my $html = $vim->html;
+     $html = Encode::decode('utf-8', $html, Encode::FB_CROAK)
+         unless utf8::is_utf8($html);
      return $html;
  }
In other words:
That change is what addresses the mojibake we were seeing.
I’m not sure yet whether the best long-term home for the fix is here or upstream, so for now I’m trying to keep the change narrowly focused on making Perl Advent’s build output correct and covered by regression tests.
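For readers who want to try the double-encoding repair outside the transformer, here is a minimal standalone sketch of the same idea. The helper name normalize_text is hypothetical and not part of the PR, and the added Encode::LEAVE_SRC flag is an assumption of this sketch (it keeps Encode from modifying the input in place); the diff above does not use it.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not the PR's code: return a character string,
# undoing one layer of double encoding when its footprint is detected.
sub normalize_text {
    my ($str) = @_;

    # Assumption of this sketch: LEAVE_SRC keeps Encode from emptying the
    # source scalar after a successful strict conversion.
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;

    # No UTF8 flag: assume the caller handed us raw UTF-8 octets.
    return Encode::decode('utf-8', $str, $check)
        unless utf8::is_utf8($str);

    # U+00C2/U+00C3 followed by U+0080..U+00BF is what UTF-8 lead and
    # continuation bytes look like after being decoded as Latin-1.
    return $str
        unless $str =~ /[\x{00C2}\x{00C3}][\x{0080}-\x{00BF}]/;

    # Map the characters back to bytes, then decode those bytes as UTF-8.
    # Either step can croak (for example on a character above U+00FF),
    # in which case the input is returned unchanged.
    my $fixed = eval {
        Encode::decode('utf-8', Encode::encode('latin-1', $str, $check), $check);
    };
    return defined $fixed ? $fixed : $str;
}

my $guillemets = "\x{AB}ok\x{BB}";    # guillemets around "ok", as characters

# Simulate the bug: UTF-8 octets misread as Latin-1 characters.
my $mojibake = Encode::decode('latin-1', Encode::encode('utf-8', $guillemets));
utf8::upgrade($mojibake);             # make sure the UTF8 flag is set

# The other code path: raw UTF-8 octets without the UTF8 flag.
my $octets = Encode::encode('utf-8', $guillemets);

print normalize_text($mojibake) eq $guillemets ? "repaired\n" : "not repaired\n";
print normalize_text($octets)   eq $guillemets ? "decoded\n"  : "not decoded\n";
```

One caveat with this kind of heuristic: text that legitimately contains U+00C2 or U+00C3 directly followed by a character in U+0080..U+00BF would also be rewritten, so it is best kept close to the one call site that needs it.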
@oalders what do you think about the diff ^^ Is it sane?
Code review
No blocking issues found. Checked for bugs and CLAUDE.md compliance.
Minor note (non-blocking):
Design: the fix belongs in the existing submodule fork, not a new
The fix itself is correct: the double-encoding detection and unwrap in
What should happen here is option 1: add the corrected
The
To be clear about priority:
🤖 Written by Claude Code |
Quick cleanup note because my previous comment got shell-mangled. The intended update is:
Companion submodule PR:
That should make the submodule-side review much easier.
What: Fix UTF-8 handling for #!vim highlighted code blocks by overriding VimHTML conversion and wiring build-time Perl library precedence.
Why: Issue #530 reports guillemets rendering as mojibake (Â«...Â») in #!vim perl output while normal =begin perl blocks render correctly.
How: Added a repo-local Pod::Elemental::Transformer::VimHTML that normalizes incoming text before calling Text::VimColor (no pre-encoding to octets), and updated script/build-site.sh to export PERL5LIB so this override is used by advcal builds. Added t/vim_utf8_encoding.t to assert guillemet correctness in both code paths and guard against regression.
Testing: Ran docker compose run --rm perl-advent prove -lr t, then verified that the generated 2024-12-15.html in-container no longer contains Â«/Â» while preserving «...».
Closes #530
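As an illustration of the kind of assertion such a regression test can make, here is a small Test::More sketch. It is not the contents of t/vim_utf8_encoding.t; the function has_guillemet_mojibake and the sample strings are hypothetical, and the sketch checks plain strings rather than running Text::VimColor.

```perl
use strict;
use warnings;
use Test::More;
use Encode ();

# Hypothetical check: true when the text still shows the double-encoded
# guillemet footprint (U+00C2 followed by U+00AB or U+00BB).
sub has_guillemet_mojibake {
    my ($text) = @_;
    return $text =~ /\x{00C2}[\x{00AB}\x{00BB}]/ ? 1 : 0;
}

# Sample input with real guillemets, written as code points so the
# source file itself stays ASCII.
my $good = "print \x{AB}hello\x{BB};";

# The same text after the bug: UTF-8 octets misread as Latin-1.
my $bad = Encode::decode('latin-1', Encode::encode('utf-8', $good));

ok( !has_guillemet_mojibake($good), 'clean text is not flagged' );
ok(  has_guillemet_mojibake($bad),  'double-encoded text is flagged' );
like( $good, qr/\x{AB}hello\x{BB}/, 'guillemets are preserved' );

done_testing;
```

In a real test the strings would presumably come from the transformer's HTML output for both code paths, with the same two checks applied: no mojibake footprint, and the original guillemets still present.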
Quality Report
Changes: 3 files changed, 128 insertions(+)
Code scan: clean
Tests: skipped
Branch hygiene: 1 issue(s)
Generated by Kōan post-mission quality pipeline