- Fix `.text()` and `iter()` for HTML fragments when there are multiple nodes at the root level. Resolves #209.
- Update lexbor. Resolves #212.
- Breaking changes: Empty tags are now serialized to `<div value="">` instead of `<div value>` (Commit 4530fed).
- Improve `unwrap_tags` and `merge_text_nodes`.
- Fix HTML parsing in fragment parser for `LexborHTMLParser`
- Fix memory leak in fragment parser
- Improve `skip_empty` parameter for text methods
- Add `comment_content` method
- Minor performance optimizations
- Add `create_tag` method to `LexborHTMLParser`
- Fix advanced selector (`.select()`) when attributes are empty.
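The serialization change noted above distinguishes two spellings of an attribute: one with no value at all (`<div value>`) and one with an explicitly empty value (`<div value="">`). As a rough illustration of that distinction only, the sketch below uses Python's standard-library `html.parser`, not selectolax or lexbor; `AttrCollector` and `start_tag_attrs` are hypothetical helper names introduced for this example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records the tag name and attribute list of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs supplied by html.parser.
        self.seen.append((tag, attrs))


def start_tag_attrs(html):
    """Hypothetical helper: parse `html` and return the recorded start tags."""
    collector = AttrCollector()
    collector.feed(html)
    collector.close()
    return collector.seen


valueless = start_tag_attrs("<div value>")
empty = start_tag_attrs('<div value="">')
print(valueless)
print(empty)
```

With the standard-library parser, the valueless form reports `None` for the attribute and the empty form reports `''`. How selectolax exposes the two cases is not shown here; the sketch only makes the difference between the two source spellings concrete.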
- Broken release. Not published to PyPI.
- Add `is_fragment` parameter to `LexborHTMLParser` @pygarap
- Add the ability to skip empty text nodes for lexbor backend to `.text`, `.iter`, `.traverse` @pygarap
- Add new properties to lexbor backend: `is_element_node`, `is_text_node`, `is_comment_node`, `is_document_node`. @pygarap
- Update `lexbor` library
- Update `lexbor` library
- Fix missing description on PyPI.
- Broken release. Not published to PyPI.
- Fix parsing of CSS selectors that contain Unicode characters.
- Fix incorrect default value in docstrings for `strict` argument
- Fix incorrect exception handling for `any_css_matches`
- Fix docstring for `css_first` method
- Fix memory leak in `merge_text_nodes` for lexbor backend
- Update lexbor backend
- Add `.inner_html` property. Allows getting and setting the inner HTML of a node.
- Update various docstrings.
- Optimize performance for `css_first` in lexbor backend
- Fix segfaults when accessing attributes. Resolves #135.
- Add new `.clone` method to lexbor backend. Resolves #117.
- Improve Unicode handling for malformed text. Resolves #138.
- Fix segfaults when doing double `.decompose`. Resolves #179.
- Fix segfaults when doing double `.unwrap`. Resolves #169.
- Fix typo for tag names. Clarify available tag names.
Released
- Lexbor backend now supports `:lexbor-contains("abc" i)` CSS pseudo-class to match text nodes.
Released
- Add `merge_text_nodes` to lexbor backend. Fixes #170. @amirshukayev
- Performance improvements in Cython code. @Vizonex
Released
- Update lexbor. New version of lexbor fixes bugs with CSS selectors.
Released
- Improve type hints, add docstrings to type hints
- Prevent decomposing of the root node
- Unpin Cython version and make it optional
- Allow empty attribute values. Fixes #165.
Released
- Update lexbor
- Expose `SelectolaxError` exception in `lexbor.pyi`
Released
- Feat: Add unwrap empty tags functionality. Fixes #159.
Released
- Fix: Update lexbor and improve HTML serialization speed. Fixes #153.
- Fix: typo in type annotations. Fixes #147.
- Fix: Fix incorrect type annotations for `LexborHTMLParser.__init__`. Fixes #144.
Released
- Fix: Header detected as head
Released
- Improve type hints
Released
- Feat: Add `parse_fragment()` and `create_tag()`
- Add missing typing for `Node.insert_child()`
- Add `Node.parser` to access the `HTMLParser` to which the node belongs
Released
- Add `Node.insert_child` method to lexbor and modest backends
Released
- Add Python 3.13 wheels
- Update lexbor
Released
- Breaking change: `lexbor` backend now includes the root node when querying CSS selectors. Same as `Modest` backend.
- Fix `css_matches` and `any_css_matches` methods for `Modest` backend on some compilers
Released
- Fixup for 0.3.19 release
- Fix tag order for `lexbor` backend
Released
- Increase maximum HTML size to 2.4GB
Released
- Fix memory leak when using CSS selectors, `lexbor` backend
Released
- Update lexbor
- Add Python 3.12 wheels
Released
- Make HTML nodes hashable
- Pin Cython version
Released
- Improve typing. Thanks to @nesb1
Released
- Fix memory leak for `lexbor` backend
Released
- Update `lexbor`
Released
- Update `lexbor`
- Add Python 3.11 wheels
Released
- Fix out-of-bounds bug for `merge_text_nodes` method.
Released
This release does not contain any changes. Due to a typo in the version number (#70), we need to make a new release.
Released
- Remove trailing separator when using `text(deep=True, separator='x')`.
- Add a new `merge_text_nodes` method for Modest backend.
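The two entries above concern how text is gathered from a subtree and how adjacent text nodes are combined. The sketch below is a toy model of those two ideas in plain Python, not the selectolax or Modest implementation; `Text`, `Element`, `collect_text`, `deep_text` and this `merge_text_nodes` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    """Hypothetical stand-in for a text node."""
    value: str


@dataclass
class Element:
    """Hypothetical stand-in for an element node."""
    tag: str
    children: List[Union["Element", Text]] = field(default_factory=list)


def collect_text(node):
    """Yield the value of every text node under `node`, in document order."""
    if isinstance(node, Text):
        yield node.value
        return
    for child in node.children:
        yield from collect_text(child)


def deep_text(node, separator=""):
    # str.join puts the separator only between items,
    # so the result never ends with a trailing separator.
    return separator.join(collect_text(node))


def merge_text_nodes(node):
    """Merge runs of adjacent Text children in place, recursing into elements."""
    merged = []
    for child in node.children:
        if isinstance(child, Text) and merged and isinstance(merged[-1], Text):
            merged[-1] = Text(merged[-1].value + child.value)
        else:
            if isinstance(child, Element):
                merge_text_nodes(child)
            merged.append(child)
    node.children = merged


# "Hel" and "lo" are adjacent text nodes inside <b>, e.g. left over after tag removal.
tree = Element("div", [Element("b", [Text("Hel"), Text("lo")]), Text(" world")])
before = deep_text(tree, separator="x")
merge_text_nodes(tree)
after = deep_text(tree, separator="x")
print(before)
print(after)
```

In this toy model the separator lands between `Hel` and `lo` until the adjacent text nodes are merged, which is the kind of situation a merge step is meant to avoid.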
Released
- Fix incorrect text handling when using `text(deep=True)` on a text node.
Released
- Fix return type of HTMLParser.tags
Released
- Improve text handling
- Add binary builds for Python 3.10 and ARM on macOS and Linux
Released
- Add type annotations
Released
- Fix `HTMLParser.html`
Released
- Use `document` for the `HTMLParser.html`, `LexborHTMLParser.html` root properties
Released
- Fix `selector` method for lexbor
- Improve text extraction for lexbor
Released
- Fix `setup.py` for Windows
Released
- Added `lexbor` backend
- Fix cloning for `Modest` backend
Released
- Added advanced Selector (the `select` method)
- Improved speed of `strip_tags`
- Added `clone` method for the `HtmlParser` object
- Exposed `detect_encoding`, `decode_errors`, `use_meta_tags`, `raw_html` attributes for `HtmlParser`
- Added `sget` method to the `attrs` property
Released
- Don't throw exception when encoding text as UTF-8 bytes fails (#40).
- Fix Node.attrs.items() causes (#39).
Released
- Build wheels for Apple Silicon
Released
- Fix strip argument is ignored for the root node (#35).
- Fix CSS parser hangs on a bad CSS selector (#36).
Released
- Fix root node property (#32). The `root` property now points to the html tag.
Released
- Fix README for PyPI
Released
- Add wheels for Python 3.9
Released
- Add `raw_value` attribute for `Node` objects (#22)
- Improve node modification operations
Released
- Fix dependency on the source `Node` when inserting to or modifying destination `Node`
Released
- Allow passing `Node` instances to `replace_with`, `insert_before` and `insert_after` methods
- Added `insert_before` and `insert_after` methods
Released
- Set maximum input size to 80MB
- Update modest
Released
- Rebuild PyPI wheels to support Python 3.8 and manylinux2010
Released
- Fix node comparison
Released
- Add optional `include_text` parameter for the `iter` and `traverse` methods
Released