Commit ea79811
Fix platform-specific HTML parsing inconsistencies
Use lxml.html.fragment_fromstring with explicit 'span' parent to ensure
consistent parsing behavior across platforms. lxml.html.fromstring() has
unpredictable auto-correction that wraps fragments differently on different
platforms/libxml2 versions, causing CI failures.
Changes:
- tree_from_string() now uses fragment_fromstring with create_parent='span'
- Added explicit empty/whitespace check to maintain ParserError behavior
- Updated test expectations to match consistent parsing behavior
- Removed debug test file
Fixes #CI
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
1 parent ec9c326 commit ea79811
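The change described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the body of `tree_from_string()`, its parameter name, and the error message text are assumptions. Only the use of `lxml.html.fragment_fromstring` with `create_parent='span'` and the empty/whitespace check come from the message itself.

```python
import lxml.html
from lxml.etree import ParserError


def tree_from_string(html):
    """Parse an HTML fragment under an explicit <span> parent (hypothetical sketch)."""
    # Assumption: empty or whitespace-only input is rejected up front so that
    # callers keep seeing ParserError, as the commit message describes.
    if not html or not html.strip():
        raise ParserError("Document is empty")
    # Passing a tag name as create_parent makes lxml wrap the parsed fragment
    # in a new element of that name instead of guessing a wrapper itself.
    return lxml.html.fragment_fromstring(html, create_parent="span")


root = tree_from_string("hello <b>world</b>")
print(root.tag)                        # span
print([child.tag for child in root])   # ['b']
```

Because the root is always the `span` created here, tests can assert on a fixed tree shape rather than on whatever wrapper `lxml.html.fromstring()` happens to choose for a given input.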
File tree: tests
4 files changed, +14 -53 lines changed
Changed file (name not captured)
@@ -415,7 +415,14 @@
Original line 418 replaced by new lines 418-425 (diff content not captured).
This file was deleted.
Changed file (name not captured)
@@ -40,11 +40,8 @@
Original lines 43-47 replaced by new lines 43-44 (diff content not captured).
Changed file (name not captured)
@@ -64,8 +64,8 @@
Original lines 67-68 replaced by new lines 67-68 (diff content not captured).
@@ -134,8 +134,8 @@
Original lines 137-138 replaced by new lines 137-138 (diff content not captured).