Commit 296711b
Support character references for &, <, >, ' and " (#5)
Polyglot HTML5 markup (i.e. HTML5 written so that it is also valid XML) uses only a handful of named entity references:
> Polyglot markup uses only the following named entity references:
> amp lt gt apos quot
https://www.w3.org/TR/html-polyglot/#named-entity-references
To support content created before HTML5 (that is, XHTML1), we replace all named and numeric character references with the characters they stand for, which should not pose a problem in UTF-8 content. Only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;` are kept.
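As an illustration only, the named-reference part of this substitution could look like the Python sketch below. It is not this project's code. The names `replace_named_refs`, `KEPT` and `NAMED_REF` are hypothetical. The lookup table is the standard library's `html.entities.html5`. The sketch covers named references only and leaves unknown names untouched.

```python
import html.entities
import re

# Hypothetical set: the named references polyglot markup allows, kept as written.
KEPT = {"amp", "lt", "gt", "apos", "quot"}

# A named reference such as "&nbsp;"; the name is captured in group 1.
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_refs(text: str) -> str:
    """Replace named references with their characters, except the five kept ones."""

    def repl(m: re.Match) -> str:
        name = m.group(1)
        if name in KEPT:
            return m.group(0)
        # Keys in html.entities.html5 include the trailing semicolon, e.g. "nbsp;".
        # Names not found in the table are returned unchanged.
        return html.entities.html5.get(name + ";", m.group(0))

    return NAMED_REF.sub(repl, text)


print(replace_named_refs("caf&eacute; &amp; cr&egrave;me"))  # café &amp; crème
```

Calling `html.unescape` on the whole string would also decode `&amp;` and `&lt;`. Matching on the reference name instead lets the five XML-significant references pass through untouched.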
We missed, however, that each of these characters can also be written as a numeric character reference. For example, `&amp;` can also be written as `&#38;` or `&#x26;`, and the same applies to the other four. This PR adds support for these cases as well.

1 parent b7308a3
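The case this PR adds can be illustrated with a hypothetical Python sketch, not the project's actual implementation. Numeric references in decimal (`&#38;`) or hexadecimal (`&#x26;`) form are decoded. If the result is one of the five special characters, it is written back in its named form rather than as a bare character. `replace_numeric_refs`, `NAMED_FOR` and `NUMERIC_REF` are illustrative names. References to code points outside the Unicode range are left as written. Code points that XML forbids, such as `&#0;`, are assumed not to occur and are not checked.

```python
import re

# Hypothetical mapping: named form of each character polyglot markup keeps as a reference.
NAMED_FOR = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

# Decimal (&#38;) or hexadecimal (&#x26; / &#X26;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")


def replace_numeric_refs(text: str) -> str:
    """Decode numeric references, mapping the five special characters to named ones."""

    def repl(m: re.Match) -> str:
        # Group 1 holds decimal digits, group 2 hexadecimal digits.
        code = int(m.group(1)) if m.group(1) is not None else int(m.group(2), 16)
        try:
            char = chr(code)
        except (ValueError, OverflowError):
            # Outside the Unicode range: keep the reference as written.
            return m.group(0)
        return NAMED_FOR.get(char, char)

    return NUMERIC_REF.sub(repl, text)


print(replace_numeric_refs("Tom &#38; Jerry &#x3C;3 caf&#233;"))  # Tom &amp; Jerry &lt;3 café
```

Writing `&#38;` back as `&amp;` instead of a bare `&` keeps the output well-formed XML.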
1 file changed, 6 additions and 3 deletions (+6 -3).

Diff (file name and line contents not preserved): original lines 23–24 were replaced by new lines 23–24, and original line 27 was replaced by new lines 27–30. Original lines 20–22, 25–26 and 28–30 (new 20–22, 25–26 and 31–33) are unchanged context.