Releases: EmilStenstrom/justhtml
Release v1.15.0
Security
- (Severity: Low) Harden HTML comment serialization against comment-breakout injection. Previously, programmatic Comment(...) nodes or transform-produced comment data containing sequences like --> could serialize into active HTML such as injected <img onerror>.
- (Severity: Low) Harden HTML serialization and the builder against unsafe programmatic element and attribute names. Previously, direct Node(...) usage, transform-produced attrs, or builder.element(...) calls could emit attacker-controlled markup such as injected <img onerror> by including syntax-breaking characters in a tag or attribute name.
- (Severity: Moderate) Harden JustHTML.clean_url_value(...) and clean_url_in_js_string(...) against HTML character reference smuggling such as javascript:..., which could bypass URL scheme validation and become an active javascript: URL after HTML attribute parsing.
- (Severity: Low) Harden URL sanitization against browser backslash normalization. Previously, “relative” URLs such as \\evil.example/x or /\\evil.example/x could survive sanitization and be interpreted by browsers as remote network requests, bypassing relative-only URL rules such as the default img[src] policy.
- (Severity: Low) Harden URL sanitization and clean_url_value(...) against malformed bracketed hosts when allowed_hosts is enabled. Previously, inputs such as https://[evil.example]/x could raise ValueError from Python’s URL parser and crash sanitization instead of being rejected.
- (Severity: Low) Harden to_markdown(html_passthrough=True) for sanitized <textarea> content. Previously, attacker-controlled </textarea> sequences could survive sanitization as text, then break out during Markdown HTML passthrough and turn into active HTML when the Markdown output was reparsed or rendered.
- (Severity: Low) Harden a[ping] sanitization. Previously, ping was treated as a single URL even though browsers interpret it as a space-separated list of URLs, so a custom policy could allow a trusted first endpoint while unintentionally preserving additional attacker-controlled ping URLs.
- (Severity: Low) Harden preserved <style> blocks in custom policies. Previously, JustHTML only neutralized HTML parser breakouts inside allowed <style> elements; resource-loading CSS such as @import, url(...), image-set(...), and legacy binding/filter constructs could still survive unchanged.
- (Severity: Low) Harden preserved <meta http-equiv="refresh"> tags in custom policies. Previously, the content attribute was treated as inert text even though browsers interpret it as a client-side redirect instruction, so refresh targets could survive without any URL policy.
- (Severity: Low) Harden link[imagesrcset] sanitization in custom policies. Previously, imagesrcset was not treated as URL-bearing at all, so <link rel="preload" as="image"> could preserve attacker-controlled remote image candidates without any URL validation.
- (Severity: Low) Harden attributionsrc sanitization in custom policies. Previously, attributionsrc was not treated as URL-bearing at all, so elements such as <img> could preserve attacker-controlled attribution-reporting endpoints and trigger extra browser requests without any URL validation.
- (Severity: Low) Harden security-related attribute transforms against mixed-case attribute names in custom pipelines. Previously, transforms such as DropAttrs(...), DropUrlAttrs(...), AllowStyleAttrs(...), and MergeAttrs(...) could miss or mis-handle OnClick, SrcDoc, Href, Style, Rel, and similar mixed-case variants unless an earlier step had already normalized names to lowercase.
- (Severity: Low) Harden preserved <base href> tags in custom policies. Previously, a kept <base href="..."> could rewrite how later relative URLs resolved in the browser, bypassing per-attribute relative-only URL rules such as img[src].
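To make the character-reference and backslash items above more concrete, here is a minimal standalone sketch of a relative-only URL check. It is not JustHTML's implementation: the function name `is_plain_relative_url` is hypothetical, and the normalization steps are a simplified reading of how browsers pre-process URLs (decode character references, drop tab and newline characters, treat backslashes as slashes).

```python
import html
import re

# Matches a leading URL scheme such as "https:" or "javascript:".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

# C0 control characters and space, which browsers trim from the start of a URL.
_LEADING_JUNK = "".join(chr(c) for c in range(0x21))


def is_plain_relative_url(value: str) -> bool:
    """Hypothetical helper: True only for URLs that stay relative in a browser."""
    # Decode character references the way an HTML attribute parser would,
    # so "&#106;avascript:" is checked as "javascript:".
    decoded = html.unescape(value)
    # Browsers remove tab, CR, and LF anywhere in a URL and trim leading controls.
    cleaned = re.sub(r"[\t\r\n]", "", decoded).lstrip(_LEADING_JUNK)
    # Browsers treat backslashes like forward slashes for web URLs.
    normalized = cleaned.replace("\\", "/")
    if normalized.startswith("//"):
        return False  # network-path reference: resolves to a remote host
    if _SCHEME.match(normalized):
        return False  # carries an explicit scheme, so it is not relative
    return True
```

With this sketch, `images/cat.png` is accepted, while `&#106;avascript:alert(1)`, `\\evil.example/x`, and `/\\evil.example/x` are all rejected.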
Release v1.14.0
Security
- (Severity: Moderate) Harden constructor-time sanitization against mutation XSS in custom policies that preserve foreign namespaces such as MathML or SVG. Previously, crafted markup could sanitize into output that looked safe but became active HTML when reparsed by a browser or downstream parser.
Release v1.13.0
Security
- (Severity: High): Harden fenced code generation in to_markdown() by choosing backtick delimiters longer than any run inside <pre> content, preventing attacker-controlled backticks from breaking out of code blocks and exposing raw HTML to downstream Markdown renderers.
- (Severity: Low): Treat text that starts at the beginning of a rendered Markdown line as text, not block syntax, by escaping line-leading headings, blockquotes, list markers, thematic breaks, setext underlines, and fenced-code delimiters from untrusted HTML content.
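The fence-length rule in the first item can be sketched in a few lines. This is an illustration of the technique only, not the code JustHTML ships; `fence_code` is a hypothetical helper name.

```python
import re


def fence_code(code: str) -> str:
    """Hypothetical helper: wrap code in a backtick fence that the content cannot close."""
    # Find the longest run of consecutive backticks anywhere in the content.
    longest_run = max((len(run) for run in re.findall(r"`+", code)), default=0)
    # Use at least three backticks, and always one more than the longest run,
    # so no run inside the content is long enough to act as a closing fence.
    fence = "`" * max(3, longest_run + 1)
    return f"{fence}\n{code}\n{fence}"
```

Content with no backticks gets the usual three-backtick fence; content containing a run of three backticks gets a four-backtick fence, so the inner run stays literal text.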
Release v1.12.0
Security
- (Severity: High) Markdown output now HTML-escapes text-node content before applying Markdown escaping, preventing attacker-controlled text such as <script> from turning into raw HTML when to_markdown() output is rendered.
- (Severity: Moderate) Sanitization now hardens script and style raw-text content by neutralizing embedded closing-tag sequences and dropping non-text children, preventing sanitized DOM trees from serializing into breakout HTML.
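The escaping order described in the first item (HTML-escape, then Markdown-escape) might look roughly like the sketch below. The helper name `escape_text_for_markdown` and the small set of Markdown characters it escapes are assumptions for illustration; the set JustHTML actually escapes is not stated here.

```python
import html
import re

# A deliberately small, illustrative set of Markdown-significant characters.
_MD_SPECIAL = re.compile(r"([\\`*_\[\]])")


def escape_text_for_markdown(text: str) -> str:
    """Hypothetical helper: make a text node safe as Markdown inline text."""
    # Step 1: HTML-escape so "<script>" cannot pass through as raw HTML.
    html_safe = html.escape(text, quote=False)
    # Step 2: backslash-escape characters that Markdown would treat as syntax.
    return _MD_SPECIAL.sub(r"\\\1", html_safe)
```

For example, `<script>alert(1)</script>` becomes `&lt;script&gt;alert(1)&lt;/script&gt;`, which a Markdown renderer shows as text instead of passing through as a tag.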
Release v1.11.0
Added
- Sanitization: Add SanitizationPolicy.strip_invisible_unicode to strip invisible Unicode used for obfuscation from text and attribute values before other sanitizer checks run.
Changed
- Sanitization: strip_invisible_unicode is enabled by default and covers variation selectors, zero-width/bidi controls, and private-use characters.
Security
- (Severity: Low) Harden sanitization against invisible-Unicode obfuscation in text, attributes, and URL-like values such as disguised javascript: schemes.
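As a rough illustration of what stripping invisible Unicode involves, the sketch below removes format characters (Unicode category Cf, which includes zero-width and bidi controls), private-use characters (category Co), and the two main variation-selector blocks. The function name `strip_invisible` and the exact character set are assumptions; the precise set JustHTML strips may differ, and category Cf also covers a few characters, such as the soft hyphen, that a real policy might treat separately.

```python
import unicodedata


def _is_variation_selector(codepoint: int) -> bool:
    # Variation Selectors (U+FE00..U+FE0F) and the Supplement (U+E0100..U+E01EF).
    return 0xFE00 <= codepoint <= 0xFE0F or 0xE0100 <= codepoint <= 0xE01EF


def strip_invisible(text: str) -> str:
    """Hypothetical helper: drop characters commonly used to disguise text."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # Cf: format controls (zero-width, bidi); Co: private-use characters.
        if category in ("Cf", "Co") or _is_variation_selector(ord(ch)):
            continue
        kept.append(ch)
    return "".join(kept)
```

Running this before a scheme check means `java\u200bscript:alert(1)` is evaluated as `javascript:alert(1)` and can be rejected like any other javascript: URL.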
Release v1.10.0
Security
- (Severity: Low) Harden JustHTML against denial-of-service from attacker-controlled deeply nested HTML. Parsing post-processing, deep cloning, pretty HTML serialization, and Markdown rendering now use iterative traversal instead of recursion, preventing RecursionError crashes on pathological nesting.
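The difference between recursive and iterative traversal can be shown on a toy tree. The `TreeNode` class and helpers below are hypothetical stand-ins, not JustHTML's node types; the point is only that an explicit stack keeps memory use on the heap, so nesting depth is not limited by Python's recursion limit.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """Toy node type for illustration only."""
    name: str
    children: list = field(default_factory=list)


def max_depth(root: TreeNode) -> int:
    """Compute the tree depth with an explicit stack instead of recursion."""
    deepest = 0
    stack = [(root, 1)]  # pairs of (node, depth of that node)
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in node.children:
            stack.append((child, depth + 1))
    return deepest


def build_chain(length: int) -> TreeNode:
    """Build a pathological tree: one node nested inside another, `length` deep."""
    root = TreeNode("div")
    current = root
    for _ in range(length - 1):
        child = TreeNode("div")
        current.children.append(child)
        current = child
    return root
```

A recursive version of `max_depth` would typically raise RecursionError well before 50,000 levels under Python's default recursion limit, while the stack-based version handles that depth without issue.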
Release v1.9.1
Fixed
- Serialization: Preserve literal text inside script and style elements during HTML serialization so round-trips do not turn raw text content like > or & into entity text.
Release v1.9.0
Added
- Builder: Add justhtml.builder with explicit element(), text(), comment(), and doctype() factories for programmatic HTML construction.
- Parser: Allow JustHTML(...) to accept built nodes directly and normalize them through the existing HTML5 parser.
- Docs: Add a dedicated Building HTML guide and expand the API/README documentation around programmatic HTML generation.
Changed
- Sanitization: Preserve doctypes by default in document mode.
- Sanitization: Add <caption> to the default allowed tag set.
- Typing: Normalize SanitizationPolicy.allowed_tags to frozenset[str], improving type safety when composing policies.
Fixed
- Builder & Serialization: Preserve arbitrary doctype names and identifiers across build/serialize/parse round-trips.
- Builder: Reject unsupported namespaces up front; builder namespaces are limited to HTML, SVG, and MathML.
Release v1.8.0
Added
- CLI: Add --strict flag to fail with exit code 2 and print an error message on any parse error.
Release v1.7.0
Added
- Selectors: Add query_one() on JustHTML and Node for retrieving the first match (or None).
Fixed
- Packaging: Include py.typed in wheels for PEP 561 type hinting support.
Changed
- Performance: ~9% faster JustHTML(...).to_html(pretty=False) than 1.6.0 on the web100k justhtml_to_html benchmark (200 files x 3 iterations): 7.244s -> 6.571s (median).
- Performance: Multiple internal speedups in serializer, tokenizer, tree builder, and transforms for lower per-document overhead.
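The "~9%" figure appears to describe the reduction in median wall-clock time. A quick check of the arithmetic using the two medians quoted above (the variable names are illustrative):

```python
before_s = 7.244  # median for 1.6.0, in seconds
after_s = 6.571   # median for 1.7.0, in seconds

# Fraction of the original run time that was saved.
time_reduction = (before_s - after_s) / before_s

# Equivalent throughput ratio (how many times faster the same work completes).
speedup = before_s / after_s

print(f"time reduction: {time_reduction:.1%}")
print(f"speedup factor: {speedup:.3f}x")
```

The time reduction works out to about 9.3%, which matches the quoted figure; expressed as a throughput ratio, the same numbers give a factor of roughly 1.10.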
Docs
- Expand API and selector documentation (including performance notes).