Introducing the ability to skip empty text nodes in LexborNode! #187

Merged
rushter merged 13 commits into rushter:master from pygarap:skip_empty_tags
Nov 20, 2025

@pygarap (Contributor) commented Nov 20, 2025

This pull request extends the LexborNode API in selectolax so that empty text nodes can be skipped in the traverse, iter, and text methods, and adds a property for checking whether a node is an empty text node. These changes give finer control over HTML parsing, especially when handling whitespace or empty nodes. The update also expands the documentation and adds tests that verify the new behavior.

API Enhancements for Skipping Empty Text Nodes:

  • Added a skip_empty parameter to the text, iter, and traverse methods in LexborNode, allowing users to exclude empty text nodes (as determined by lxb_dom_node_is_empty) from results.
  • Updated the C extension interface to expose the lxb_dom_node_is_empty function for use in Python code.
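
To illustrate the intended effect of skip_empty, here is a minimal pure-Python sketch. It is a toy model, not the selectolax implementation: ToyNode, text_node, and the helper methods are hypothetical names, and "empty" is assumed here to mean a text node containing only whitespace, whereas the real check is delegated to lxb_dom_node_is_empty.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

TEXT_TAG = "-text"  # tag name used for text nodes in this toy model


@dataclass
class ToyNode:
    """Hypothetical stand-in for a DOM node (not LexborNode itself)."""
    tag: str
    data: str = ""
    children: List["ToyNode"] = field(default_factory=list)

    def _is_text(self) -> bool:
        return self.tag == TEXT_TAG

    def _is_empty_text(self) -> bool:
        # Assumption: "empty" means a text node with only whitespace.
        return self._is_text() and self.data.strip() == ""

    def _keep(self, include_text: bool, skip_empty: bool) -> bool:
        # Element nodes are always kept; text nodes depend on the flags.
        if not self._is_text():
            return True
        if not include_text:
            return False
        return not (skip_empty and self._is_empty_text())

    def iter(self, include_text: bool = False,
             skip_empty: bool = False) -> Iterator["ToyNode"]:
        """Yield direct children only."""
        for child in self.children:
            if child._keep(include_text, skip_empty):
                yield child

    def traverse(self, include_text: bool = False,
                 skip_empty: bool = False) -> Iterator["ToyNode"]:
        """Depth-first walk in document order, starting at this node."""
        stack = [self]
        while stack:
            node = stack.pop()
            if node._keep(include_text, skip_empty):
                yield node
            stack.extend(reversed(node.children))

    def text(self, separator: str = "", skip_empty: bool = False) -> str:
        """Join the data of all descendant text nodes."""
        nodes = self.traverse(include_text=True, skip_empty=skip_empty)
        return separator.join(n.data for n in nodes if n._is_text())


def text_node(data: str) -> ToyNode:
    return ToyNode(TEXT_TAG, data)


# Roughly: <div>\n  <span>Hello</span>\n<p>world</p> </div>
root = ToyNode("div", children=[
    text_node("\n  "),
    ToyNode("span", children=[text_node("Hello")]),
    text_node("\n"),
    ToyNode("p", children=[text_node("world")]),
    text_node(" "),
])

tags_all = [n.tag for n in root.iter(include_text=True)]
tags_skip = [n.tag for n in root.iter(include_text=True, skip_empty=True)]
walk_skip = [n.tag for n in root.traverse(include_text=True, skip_empty=True)]
joined_all = root.text(separator="|")
joined_skip = root.text(separator="|", skip_empty=True)

print(tags_all)     # ['-text', 'span', '-text', 'p', '-text']
print(tags_skip)    # ['span', 'p']
print(joined_skip)  # Hello|world
```

In this sketch skip_empty only has an effect when text nodes are included in the first place, which matches how the flag is described above: it filters empty text nodes out of results that would otherwise contain them.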

New Property for Node Emptiness:

  • Introduced the is_empty_text_node property on LexborNode, providing a convenient way to check if a node is a text node and considered empty by the underlying DOM implementation.
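
The property combines two conditions: the node must be a text node, and its content must count as empty. A rough sketch of that logic follows, using a hypothetical SketchNode class and assuming whitespace-only content is what counts as empty (the real property relies on the underlying DOM implementation instead).

```python
class SketchNode:
    """Hypothetical node type used only to illustrate the property."""

    def __init__(self, tag: str, data: str = "") -> None:
        self.tag = tag    # "-text" marks a text node in this sketch
        self.data = data  # character data; unused for element nodes

    @property
    def is_text_node(self) -> bool:
        return self.tag == "-text"

    @property
    def is_empty_text_node(self) -> bool:
        # Both conditions must hold: text node AND nothing but whitespace.
        # Assumption: whitespace-only content counts as "empty".
        return self.is_text_node and self.data.strip() == ""


whitespace = SketchNode("-text", " \n\t")
word = SketchNode("-text", "hi")
element = SketchNode("div")

print(whitespace.is_empty_text_node)  # True
print(word.is_empty_text_node)        # False
print(element.is_empty_text_node)     # False: not a text node at all
```

Note the last case: an element with no content is not reported as an empty text node, because the property first requires the node to be a text node.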

Documentation and Usability Improvements:

  • Expanded docstrings for text, iter, and traverse methods to clearly describe new parameters and behaviors, improving developer understanding and usability.
  • Added __iter__ and __next__ methods for better iterator protocol support.

Testing:

  • Added new tests to verify the correct behavior of the skip_empty flag in text, iter, and traverse methods, and to check the is_empty_text_node property.

…eter; update docstrings accordingly. Add `is_empty_text_node` property.
…or in `text`, `iter`, and `traverse` methods
…se` methods; adjust logic and docstrings accordingly
… inspection; remove unused `Iterator` import
…natures in type stubs for consistency; add minor formatting adjustments
…er; update related methods and tests for consistency.
… add minor formatting adjustments in `node.pxi`
@rushter rushter merged commit 9d27a52 into rushter:master Nov 20, 2025
7 checks passed
@rushter (Owner) commented Nov 20, 2025

Thanks.
