
Commit 6fe1c99

rfctr(html): prepare for new html parser (#3257)
**Summary**

Extract as much mechanical refactoring as possible from the HTML parser change-over into this PR. This leaves the next PR focused on installing the new parser and on its ingest-test impact.

**Reviewers:** Commits are well groomed, so reviewing commit-by-commit is probably easier.

**Additional Context**

This PR introduces the rewritten HTML parser. Its general design is recursive, consistent with the recursive structure of HTML (a tree of elements). It also adds the unit tests for that parser, but it does not _install_ the parser, so the behavior of `partition_html()` is unchanged by this PR. The next PR in this series will install it and handle the ingest and other unit-test changes required to reflect the dozen or so bug fixes the new parser provides.
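The description above characterizes the new parser only as recursive, mirroring the tree structure of HTML. The sketch below illustrates that general idea with the Python standard library alone: build an element tree, then walk it recursively, ending a text run at each block-level element and letting inline elements contribute to the current run. `TreeBuilder`, `html_to_blocks`, `BLOCK_TAGS`, and `VOID_TAGS` are hypothetical names chosen for illustration; this is not the parser added in this commit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical, deliberately small tag sets; a real parser needs fuller lists.
BLOCK_TAGS = {"html", "body", "div", "section", "p", "h1", "h2", "h3", "ul", "ol", "li"}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class Node:
    tag: str
    children: list[Node | str] = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Build a minimal element tree using the stdlib HTMLParser callbacks."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#root")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1].children.append(data)


def _flush(blocks: list[str], run: list[str]) -> None:
    """Emit the current inline run as one whitespace-normalized block."""
    text = " ".join("".join(run).split())
    if text:
        blocks.append(text)
    run.clear()


def _walk(node: Node, blocks: list[str], run: list[str]) -> None:
    """Recursive step: the structure of this function mirrors the HTML tree."""
    for child in node.children:
        if isinstance(child, str):
            run.append(child)
        elif child.tag in BLOCK_TAGS:
            _flush(blocks, run)  # a block boundary ends the current run
            _walk(child, blocks, run)  # recurse into the nested block
            _flush(blocks, run)
        else:
            _walk(child, blocks, run)  # inline element: text joins current run


def html_to_blocks(html: str) -> list[str]:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    blocks: list[str] = []
    run: list[str] = []
    _walk(builder.root, blocks, run)
    _flush(blocks, run)
    return blocks


if __name__ == "__main__":
    doc = "<body><h1>Title</h1><p>Hello <b>world</b>!</p><ul><li>one</li></ul></body>"
    print(html_to_blocks(doc))
```

Sharing one `run` buffer across the recursion is what lets `<b>world</b>` merge into "Hello world!" while each `<p>`, `<h1>`, or `<li>` still yields its own block.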
1 parent e1b7553 commit 6fe1c99


20 files changed (+2769, -819 lines)


CHANGELOG.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-## 0.14.8-dev0
+## 0.14.8-dev1
 
 ### Enhancements
 
```
Makefile

Lines changed: 1 addition & 1 deletion
```diff
@@ -317,7 +317,7 @@ test-no-extras:
 UNSTRUCTURED_INCLUDE_DEBUG_METADATA=$(UNSTRUCTURED_INCLUDE_DEBUG_METADATA) pytest \
 test_${PACKAGE_NAME}/partition/test_text.py \
 test_${PACKAGE_NAME}/partition/test_email.py \
-test_${PACKAGE_NAME}/partition/test_html.py \
+test_${PACKAGE_NAME}/partition/html/test_partition.py \
 test_${PACKAGE_NAME}/partition/test_xml_partition.py
 
 .PHONY: test-extra-csv
```
