Releases: EmilStenstrom/justhtml

Release v1.15.0

09 Apr 05:46

Security

  • (Severity: Low) Harden HTML comment serialization against comment-breakout injection. Previously, programmatic Comment(...) nodes or transform-produced comment data containing sequences like --> could serialize into active HTML such as injected <img onerror>.
  • (Severity: Low) Harden HTML serialization and the builder against unsafe programmatic element and attribute names. Previously, direct Node(...) usage, transform-produced attrs, or builder.element(...) calls could emit attacker-controlled markup such as injected <img onerror> by including syntax-breaking characters in a tag or attribute name.
  • (Severity: Moderate) Harden JustHTML.clean_url_value(...) and clean_url_in_js_string(...) against HTML character reference smuggling such as javascript&#58..., which could bypass URL scheme validation and become an active javascript: URL after HTML attribute parsing.
  • (Severity: Low) Harden URL sanitization against browser backslash normalization. Previously, “relative” URLs such as \\evil.example/x or /\\evil.example/x could survive sanitization and be interpreted by browsers as remote network requests, bypassing relative-only URL rules such as the default img[src] policy.
  • (Severity: Low) Harden URL sanitization and clean_url_value(...) against malformed bracketed hosts when allowed_hosts is enabled. Previously, inputs such as https://[evil.example]/x could raise ValueError from Python’s URL parser and crash sanitization instead of being rejected.
  • (Severity: Low) Harden to_markdown(html_passthrough=True) for sanitized <textarea> content. Previously, attacker-controlled </textarea> sequences could survive sanitization as text, then break out during Markdown HTML passthrough and turn into active HTML when the Markdown output was reparsed or rendered.
  • (Severity: Low) Harden a[ping] sanitization. Previously, ping was treated as a single URL even though browsers interpret it as a space-separated list of URLs, so a custom policy could allow a trusted first endpoint while unintentionally preserving additional attacker-controlled ping URLs.
  • (Severity: Low) Harden preserved <style> blocks in custom policies. Previously, JustHTML only neutralized HTML parser breakouts inside allowed <style> elements; resource-loading CSS such as @import, url(...), image-set(...), and legacy binding/filter constructs could still survive unchanged.
  • (Severity: Low) Harden preserved <meta http-equiv="refresh"> tags in custom policies. Previously, the content attribute was treated as inert text even though browsers interpret it as a client-side redirect instruction, so refresh targets could survive without being checked against any URL policy.
  • (Severity: Low) Harden link[imagesrcset] sanitization in custom policies. Previously, imagesrcset was not treated as URL-bearing at all, so <link rel="preload" as="image"> could preserve attacker-controlled remote image candidates without any URL validation.
  • (Severity: Low) Harden attributionsrc sanitization in custom policies. Previously, attributionsrc was not treated as URL-bearing at all, so elements such as <img> could preserve attacker-controlled attribution-reporting endpoints and trigger extra browser requests without any URL validation.
  • (Severity: Low) Harden security-related attribute transforms against mixed-case attribute names in custom pipelines. Previously, transforms such as DropAttrs(...), DropUrlAttrs(...), AllowStyleAttrs(...), and MergeAttrs(...) could miss or mis-handle OnClick, SrcDoc, Href, Style, Rel, and similar mixed-case variants unless an earlier step had already normalized names to lowercase.
  • (Severity: Low) Harden preserved <base href> tags in custom policies. Previously, a kept <base href="..."> could rewrite how later relative URLs resolved in the browser, bypassing per-attribute relative-only URL rules such as img[src].
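
Several of the URL items above come down to one rule: validate the URL the browser will actually act on, not the raw attribute text. The following is a minimal, stdlib-only sketch of that idea. It decodes character references, trims controls, treats backslashes as slashes, rejects parser errors, and checks every token of a space-separated ping list. The helpers normalize_url, is_allowed_url, and clean_ping and the allowlists are hypothetical names for illustration; this is not JustHTML's implementation, and it does not reflect the signature of JustHTML.clean_url_value(...).

```python
import html
from urllib.parse import urlsplit

# Hypothetical allowlists for this sketch; not JustHTML's defaults.
ALLOWED_SCHEMES = {"http", "https", "mailto"}
ALLOWED_HOSTS = {"cdn.example"}

# C0 control characters plus space, which URL parsers trim from both ends.
_C0_AND_SPACE = "".join(chr(c) for c in range(0x21))


def normalize_url(raw: str) -> str:
    """Approximate the URL a browser acts on after HTML attribute parsing."""
    # Decode character references first: "javascript&#58;" becomes "javascript:".
    value = html.unescape(raw)
    # Trim leading/trailing controls and spaces; drop tabs and newlines anywhere.
    value = value.strip(_C0_AND_SPACE)
    value = value.replace("\t", "").replace("\n", "").replace("\r", "")
    # Browsers treat "\" like "/" in http(s) URLs, so "\\evil.example/x" is
    # scheme-relative. This sketch conservatively rewrites every backslash.
    return value.replace("\\", "/")


def is_allowed_url(raw: str, *, relative_only: bool = False) -> bool:
    value = normalize_url(raw)
    try:
        parts = urlsplit(value)
        host = parts.hostname
    except ValueError:
        # Malformed input such as "https://[evil.example]/x": reject, don't crash.
        return False
    if not parts.scheme and not parts.netloc:
        return True  # a genuinely relative URL such as "/images/cat.png"
    if relative_only:
        return False
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    # URLs without a host (e.g. mailto:) pass; hosted URLs must be allowlisted.
    return not parts.netloc or host in ALLOWED_HOSTS


def clean_ping(raw: str) -> str:
    """ping holds space-separated URLs, so check every token, not just the first."""
    return " ".join(url for url in raw.split() if is_allowed_url(url))
```

Decoding before checking mirrors the order in which the browser processes the attribute, so a scheme hidden behind &#58; or a tab is seen in its final form before the allowlist is consulted.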

Release v1.14.0

05 Apr 10:44

Security

  • (Severity: Moderate) Harden constructor-time sanitization against mutation XSS in custom policies that preserve foreign namespaces such as MathML or SVG. Previously, crafted markup could sanitize into output that looked safe but became active HTML when reparsed by a browser or downstream parser.

Release v1.13.0

21 Mar 20:46

Security

  • (Severity: High) Harden fenced code generation in to_markdown() by choosing backtick delimiters longer than any backtick run inside <pre> content, preventing attacker-controlled backticks from breaking out of code blocks and exposing raw HTML to downstream Markdown renderers.
  • (Severity: Low) Treat text at the start of a rendered Markdown line as text, not block syntax, by escaping line-leading heading, blockquote, list, thematic-break, setext-underline, and fenced-code markers in untrusted HTML content.
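
Both v1.13.0 fixes rely on CommonMark rules: a fenced block only closes on a backtick run at least as long as its opening fence, and a backslash before ASCII punctuation turns that character into literal text. The following is a minimal sketch of both ideas, using hypothetical helper names (fence_code_block, escape_line_start); JustHTML's own escaping logic is not described in these notes and may differ.

```python
import re

_BACKTICK_RUN = re.compile(r"`+")
# Up to three spaces of indentation, then something that could start a block:
# an ordered-list number ("1." / "1)") or one of # > + - * = _ ~ `.
_LINE_START = re.compile(r"^( {0,3})(?:(\d{1,9})([.)])|([#>+\-*=_~`]))")


def fence_code_block(code: str, info: str = "") -> str:
    """Wrap code in a backtick fence longer than any backtick run inside it."""
    longest = max((len(m.group()) for m in _BACKTICK_RUN.finditer(code)), default=0)
    # A closing fence must be at least as long as the opening one, so a
    # shorter run inside the code can never end the block early.
    fence = "`" * max(3, longest + 1)
    return f"{fence}{info}\n{code}\n{fence}"


def escape_line_start(line: str) -> str:
    """Backslash-escape a character that would otherwise start block syntax."""

    def repl(m: re.Match) -> str:
        indent = m.group(1)
        if m.group(2) is not None:
            # "1. x" -> "1\. x": escape the delimiter, not the digits.
            return f"{indent}{m.group(2)}\\{m.group(3)}"
        return f"{indent}\\{m.group(4)}"

    return _LINE_START.sub(repl, line, count=1)
```

Escaping only the first marker character is enough: "\---" is no longer a thematic break and "\===" is no longer a setext underline, because the backslash becomes part of the line's visible text.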

Release v1.12.0

17 Mar 21:58

Security

  • (Severity: High) Markdown output now HTML-escapes text-node content before applying Markdown escaping, preventing attacker-controlled text such as <script> from turning into raw HTML when to_markdown() output is rendered.
  • (Severity: Moderate) Sanitization now hardens script and style raw-text content by neutralizing embedded closing-tag sequences and dropping non-text children, preventing sanitized DOM trees from serializing into breakout HTML.
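
The ordering in the first v1.12.0 item matters: HTML-escaping must happen before Markdown escaping, otherwise "<script>" passes through as a tag that a Markdown renderer emits verbatim. The second item is sketched with one common neutralization technique, rewriting "</" as "<\/" so raw text cannot close its element; the notes do not say which rewrite JustHTML uses, so treat that choice, and the helper names text_to_markdown and neutralize_raw_text, as assumptions.

```python
import html
import re

# Characters with Markdown meaning inside inline text (simplified set).
_MD_PUNCT = re.compile(r"([\\`*_\[\]])")


def text_to_markdown(text: str) -> str:
    # 1. HTML-escape first, so "<script>" can never reach the output as a tag.
    escaped = html.escape(text, quote=False)
    # 2. Then backslash-escape Markdown punctuation.
    return _MD_PUNCT.sub(r"\\\1", escaped)


def neutralize_raw_text(tag: str, content: str) -> str:
    """Break "</script" / "</style" (any case) so raw text cannot close its element."""
    # In JavaScript and CSS string literals "\/" still means "/", so legitimate
    # string content keeps its meaning while the HTML parser no longer sees an end tag.
    return re.sub(rf"</({re.escape(tag)})", r"<\\/\1", content, flags=re.IGNORECASE)
```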

Release v1.11.0

15 Mar 22:04

Added

  • Sanitization: Add SanitizationPolicy.strip_invisible_unicode to strip invisible Unicode used for obfuscation from text and attribute values before other sanitizer checks run.

Changed

  • Sanitization: strip_invisible_unicode is enabled by default and covers variation selectors, zero-width/bidi controls, and private-use characters.

Security

  • (Severity: Low) Harden sanitization against invisible-Unicode obfuscation in text, attributes, and URL-like values such as disguised javascript: schemes.
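
The categories named in v1.11.0 map roughly onto Unicode data that Python exposes through unicodedata. The following is a minimal sketch, assuming that "invisible" means format characters (general category Cf), private-use characters (Co), and the two variation-selector blocks; the exact set JustHTML strips is not listed here. Note that this approximation also removes characters such as the zero-width joiner used in emoji sequences and the soft hyphen.

```python
import unicodedata


def is_invisible(ch: str) -> bool:
    cp = ord(ch)
    # Variation selectors: VS1-VS16 and the supplementary VS17-VS256 block.
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
        return True
    # Cf (format): zero-width space/joiners, bidi controls, BOM, tag characters.
    # Co (private use): code points with no standard meaning or glyph.
    return unicodedata.category(ch) in ("Cf", "Co")


def strip_invisible(text: str) -> str:
    """Remove invisible characters before any other sanitizer check runs."""
    return "".join(ch for ch in text if not is_invisible(ch))
```

Running this first means a later scheme check sees "javascript:" rather than "java\u200bscript:", which is why the notes stress that stripping happens before other sanitizer checks.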

Release v1.10.0

15 Mar 14:59

Security

  • (Severity: Low) Harden JustHTML against denial-of-service from attacker-controlled deeply nested HTML. Parsing post-processing, deep cloning, pretty HTML serialization, and Markdown rendering now use iterative traversal instead of recursion, preventing RecursionError crashes on pathological nesting.
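
CPython's default recursion limit is 1000 frames (see sys.getrecursionlimit()), so a recursive tree walk raises RecursionError on input nested deeper than that. The usual replacement is an explicit stack. The following sketch shows that technique for serialization, using a hypothetical minimal Node class rather than JustHTML's own node type.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal element node; not JustHTML's Node class."""

    name: str
    children: list["Node"] = field(default_factory=list)


def serialize(root: Node) -> str:
    out: list[str] = []
    # Explicit stack of (node, entering) pairs replaces the call stack.
    stack: list[tuple[Node, bool]] = [(root, True)]
    while stack:
        node, entering = stack.pop()
        if entering:
            out.append(f"<{node.name}>")
            # Schedule the end tag to run after all children are emitted.
            stack.append((node, False))
            # Push children in reverse so they pop in document order.
            for child in reversed(node.children):
                stack.append((child, True))
        else:
            out.append(f"</{node.name}>")
    return "".join(out)
```

Memory use now grows with the heap-allocated stack list instead of Python frames, so nesting depth is bounded by available memory rather than by the recursion limit.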

Release v1.9.1

10 Mar 20:09

Fixed

  • Serialization: Preserve literal text inside script and style elements during HTML serialization so round-trips do not turn raw text content like > or & into entity text.
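
Text inside script and style is raw text: the HTML parser never decodes character references there, so escaping it on output changes the content on the next parse. The following is a minimal sketch of the rule, with a hypothetical serialize_text helper; the HTML specification's serializer applies the same literal treatment to a few more parents (xmp, iframe, noembed, noframes, plaintext), which this sketch leaves out.

```python
import html

# Parents whose text is raw text (the two named in this release).
RAW_TEXT_PARENTS = {"script", "style"}


def serialize_text(text: str, parent_tag: str) -> str:
    if parent_tag.lower() in RAW_TEXT_PARENTS:
        # Emit literally; escaping would turn "a > b" into "a &gt; b" after a round-trip.
        return text
    # Ordinary text: escape &, <, > so it stays text when reparsed.
    return html.escape(text, quote=False)
```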

Release v1.9.0

08 Mar 22:46

Added

  • Builder: Add justhtml.builder with explicit element(), text(), comment(), and doctype() factories for programmatic HTML construction.
  • Parser: Allow JustHTML(...) to accept built nodes directly and normalize them through the existing HTML5 parser.
  • Docs: Add a dedicated Building HTML guide and expand the API/README documentation around programmatic HTML generation.

Changed

  • Sanitization: Preserve doctypes by default in document mode.
  • Sanitization: Add <caption> to the default allowed tag set.
  • Typing: Normalize SanitizationPolicy.allowed_tags to frozenset[str], improving type safety when composing policies.

Fixed

  • Builder & Serialization: Preserve arbitrary doctype names and identifiers across build/serialize/parse round-trips.
  • Builder: Reject unsupported namespaces up front; builder namespaces are limited to HTML, SVG, and MathML.

Release v1.8.0

05 Mar 17:08

Added

  • CLI: Add --strict flag to fail with exit code 2 and print an error message on any parse error.

Release v1.7.0

08 Feb 20:42

Added

  • Selectors: Add query_one() on JustHTML and Node for retrieving the first match (or None).

Fixed

  • Packaging: Include py.typed in wheels for PEP 561 type hinting support.

Changed

  • Performance: ~9% faster JustHTML(...).to_html(pretty=False) than 1.6.0 on the web100k justhtml_to_html benchmark (200 files x 3 iterations): 7.244s -> 6.571s (median).
  • Performance: Multiple internal speedups in serializer, tokenizer, tree builder, and transforms for lower per-document overhead.

Docs

  • Expand API and selector documentation (including performance notes).