Pod::Simple::XHTML: better fallback when HTML::Entities isn't installed #189

Merged
khwilliamson merged 1 commit into perl-pod:master from xenu:xenu/entities on Feb 8, 2026


xenu commented Jun 17, 2025

This commit changes the default set of escaped characters in the fallback code to be the same as in HTML::Entities.

Fixes #188
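
For illustration, a fallback along these lines might look like the sketch below. It is not the code from the commit: the function name `fallback_encode_entities` and the `%named` table are hypothetical, and the character class is modeled on the documented HTML::Entities default (control characters, high-bit characters, and `<`, `&`, `>`, `'`, `"`). It also assumes an ASCII platform.

```perl
use strict;
use warnings;

# Hypothetical table of the few entities the fallback knows by name.
my %named = (
    '<' => 'lt',
    '>' => 'gt',
    '&' => 'amp',
    '"' => 'quot',
    "'" => '#39',
);

# Encode everything outside the "safe" printable ASCII set. Characters
# without an entry in %named become hexadecimal numeric references.
sub fallback_encode_entities {
    my ($str) = @_;
    $str =~ s/([^\n\r\t !\#\$%\(-;=?-~])/'&' . ($named{$1} || sprintf('#x%X', ord $1)) . ';'/ge;
    return $str;
}

print fallback_encode_entities("a < b"), "\n";    # prints "a &lt; b"
print fallback_encode_entities("\xb6"), "\n";     # prints "&#xB6;"
```

With HTML::Entities installed, the pilcrow would come out as `&para;` instead of a numeric reference; both forms render the same in a browser.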

xenu commented Jun 17, 2025

Added a test for this. Probably won't work under Perl 5.6, too bad.

xenu force-pushed the xenu/entities branch 2 times, most recently from 71bb8b9 to c05b941 on June 18, 2025 01:37
xenu commented Jun 18, 2025

Fixed the tests. They failed on most CI configurations because I forgot to update the number of skipped tests for the case when HTML::Entities isn't installed.
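
As a generic illustration of the kind of mistake described here (not the actual test file from this PR), a Test::More `SKIP` block has to state how many tests it skips, and that count must match the number of tests inside the block or the plan no longer adds up. The test names and inputs below are made up for the sketch.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Detect the optional module without dying when it is missing.
my $has_html_entities = eval { require HTML::Entities; 1 };

SKIP: {
    # The second argument must equal the number of tests in this block.
    skip 'HTML::Entities is not installed', 2 unless $has_html_entities;

    is HTML::Entities::encode_entities('<'), '&lt;',
        'angle bracket is encoded by name';
    is HTML::Entities::encode_entities("\xb6"), '&para;',
        'pilcrow is encoded by name';
}
```

If a third test were added inside the block without raising both the plan and the skip count, runs without HTML::Entities would report a plan mismatch.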

khwilliamson commented

I would like the test to work on EBCDIC. Fortunately, this is trivial to do: just change the non-ASCII character to B6 and change the supporting text to correspond. One possibility would be "The pilcrow, \xb6, is used to mark the beginning of a new paragraph". B6 is the only non-ASCII character that has the same meaning in Latin1 and EBCDIC.

khwilliamson commented

I went searching, and discovered HTML5::Entities which knows many more definitions than plain HTML::Entities. Should that be the default, with the plain being the first fallback?

I also found Pod::Escapes, which ships with core since 5.12, and automatically handles EBCDIC back to 5.7.3 which is good enough, and is available as far back as we would ever need. Should this be another fallback?


xenu commented Jan 30, 2026

> I went searching, and discovered HTML5::Entities which knows many more definitions than plain HTML::Entities. Should that be the default, with the plain being the first fallback?

I'm not a fan of adding new dependencies to this module, I don't think it's worth it.

Honestly, I really don't think entity handling in Pod::Simple::XHTML has to be super good. For encoding, it doesn't make any difference if we output &dagger; or &#x2020;, both work just as well. For decoding, it's more important to know the proper entity names, but decoding is used only in what I reckon is a rather obscure part of the code (guts of literal HTML handling).

> I also found Pod::Escapes, which ships with core since 5.12, and automatically handles EBCDIC back to 5.7.3 which is good enough, and is available as far back as we would ever need. Should this be another fallback?

It's not a perfect fit: it doesn't provide a function equivalent to encode_entities of HTML::Entities. You could kinda use it to roll your own, by reversing the %Name2character_number hash it provides and then removing the non-standard entities it includes (specifically: "lchevron", "rchevron"). It feels icky, especially the last step. And like before, I'm not sure if it's worth it.
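
A rough sketch of the roll-your-own approach described above, to make the trade-off concrete. This is not code from the PR: `encode_with_pod_escapes`, `%non_standard`, and `%codepoint2name` are hypothetical names, the "unsafe" character class is borrowed from the HTML::Entities default, and an ASCII platform is assumed.

```perl
use strict;
use warnings;
use Pod::Escapes qw(%Name2character_number);

# Names that Pod::Escapes defines but that are not standard HTML entities.
my %non_standard = map { $_ => 1 } qw(lchevron rchevron);

# Reverse the name => code point hash into code point => name,
# skipping the non-standard names. Sorting keeps the result deterministic
# should two names ever share a code point.
my %codepoint2name;
for my $name (sort keys %Name2character_number) {
    next if $non_standard{$name};
    $codepoint2name{ $Name2character_number{$name} } = $name;
}

# Encode unsafe characters by name when one is known, else numerically.
sub encode_with_pod_escapes {
    my ($str) = @_;
    $str =~ s{([^\n\r\t !\#\$%\(-;=?-~])}{
        my $cp = ord $1;
        '&' . (exists $codepoint2name{$cp}
                ? $codepoint2name{$cp}
                : sprintf('#x%X', $cp)) . ';'
    }ge;
    return $str;
}

print encode_with_pod_escapes("\xab quoted \xbb"), "\n";  # prints "&laquo; quoted &raquo;"
```

The hard-coded `%non_standard` list is the "icky" part: it has to be kept in sync by hand with whatever Pod::Escapes happens to define.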

@khwilliamson khwilliamson merged commit a597458 into perl-pod:master Feb 8, 2026
20 checks passed

Development

Successfully merging this pull request may close these issues.

Pod::Simple::XHTML doesn't escape non-ASCII characters when HTML::Entities isn't installed

2 participants