Commit fcb681a
Fix full HTML document parsing with HTML entities
Detect full HTML documents (those starting with <!doctype or <html) and parse
them with lxml.html.fromstring() instead of fragment_fromstring(). This
preserves document structure for XHTML files containing HTML entities
that fail strict XML parsing.
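To illustrate why such files fail strict XML parsing, here is a minimal sketch using only the standard library. The helper name `strict_xml_accepts` is hypothetical, and `&nbsp;` is an assumed example of an HTML-only entity; neither comes from the commit itself.

```python
import xml.etree.ElementTree as ET


def strict_xml_accepts(markup: str) -> bool:
    """Return True if a strict XML parser can parse the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# &amp; is one of the entities predefined by XML, so this parses.
print(strict_xml_accepts("<p>a&amp;b</p>"))
# &nbsp; is defined by HTML, not XML; with no DTD declaring it, parsing fails.
print(strict_xml_accepts("<p>a&nbsp;b</p>"))
```

A lenient HTML parser such as lxml.html accepts both inputs, which is presumably why the commit routes these documents through it.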
Fragment parsing with a span wrapper is still used for actual fragments, to
preserve the platform-consistency fix from ea79811.
Fixes EPUB parsing where XHTML documents were being wrapped in a span,
which broke body element detection.
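The dispatch described above might look roughly like the following sketch. The function names `is_full_document` and `parse_html` are hypothetical and are not taken from the changed file; only `lxml.html.fromstring()` and `lxml.html.fragment_fromstring()` are named in the commit message, and the use of `create_parent="span"` for the wrapper is an assumption.

```python
def is_full_document(text: str) -> bool:
    """Return True when text starts with <!doctype or <html (case-insensitive)."""
    head = text.lstrip().lower()
    return head.startswith(("<!doctype", "<html"))


def parse_html(text: str):
    """Parse text as a full document, or as a span-wrapped fragment."""
    # Imported here so the detection helper above works without lxml installed.
    import lxml.html

    if is_full_document(text):
        # Full documents keep their <html>/<body> structure.
        return lxml.html.fromstring(text)
    # Assumed wrapper: create_parent="span" puts the fragment's nodes in a <span>.
    return lxml.html.fragment_fromstring(text, create_parent="span")
```

With this split, a caller looking for the body element on a full XHTML document gets a tree rooted at `html` rather than at a wrapper `span`, while fragments still come back under a single predictable parent.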
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
1 parent 3d6a8c7 commit fcb681a
1 file changed, 17 additions and 5 deletions
@@ -523,11 +523,23 @@
Original lines 526-530 are replaced by new lines 526-542; lines 523-525 and 531-533 (new 543-545) are unchanged context.