
Commit 288613d

(text-splitters): Small Fix in _process_html for HTMLSemanticPreservingSplitter to properly extract the metadata. (#29215)
- **Description:** Include `main` in the list of elements whose child elements need to be processed when splitting the HTML.
- **Issue:** #29184
1 parent 4867fe7 commit 288613d

1 file changed (+1, -1 lines changed) in libs/text-splitters/langchain_text_splitters

libs/text-splitters/langchain_text_splitters/html.py

Lines changed: 1 addition & 1 deletion
@@ -696,7 +696,7 @@ def _process_element(
         placeholder_count: int,
     ) -> Tuple[List[Document], Dict[str, str], List[str], Dict[str, str], int]:
         for elem in element:
-            if elem.name.lower() in ["html", "body", "div"]:
+            if elem.name.lower() in ["html", "body", "div", "main"]:
                 children = elem.find_all(recursive=False)
                 (
                     documents,
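
The effect of the one-line change can be illustrated with a small standalone sketch. It is not the library's code: `collect_headers`, `HEADER_TAGS`, `SAMPLE`, and the two container sets are hypothetical names, and the standard library's `xml.etree.ElementTree` stands in for the BeautifulSoup elements used in `html.py`. The sketch only shows that a walker which descends solely into listed container tags never reaches headers nested under `<main>` unless `main` is in the list. It does not reproduce how the splitter handles non-container elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical container lists mirroring the before/after state of the diff.
CONTAINERS_BEFORE = {"html", "body", "div"}
CONTAINERS_AFTER = {"html", "body", "div", "main"}

# Assumption: only these tags count as headers in this sketch.
HEADER_TAGS = {"h1", "h2"}

# Well-formed sample markup so the stdlib XML parser can read it.
SAMPLE = "<html><body><main><h1>Title</h1><p>Text</p></main></body></html>"


def collect_headers(elements, containers):
    """Collect (tag, text) for headers, descending only into container tags."""
    found = []
    for elem in elements:
        tag = elem.tag.lower()
        if tag in containers:
            # Container: process its direct children, as the diff does
            # with elem.find_all(recursive=False).
            found.extend(collect_headers(list(elem), containers))
        elif tag in HEADER_TAGS:
            found.append((tag, (elem.text or "").strip()))
        # Any other tag is ignored in this simplified sketch.
    return found


root = ET.fromstring(SAMPLE)
before = collect_headers([root], CONTAINERS_BEFORE)
after = collect_headers([root], CONTAINERS_AFTER)
print(before)  # []
print(after)   # [('h1', 'Title')]
```

With the old list the walk stops at `<main>`, so the `h1` inside it is never seen. With `main` added, the header is reached and can serve as metadata.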
