Releases · EmilStenstrom/justhtml

09 Apr 05:46

EmilStenstrom

v1.15.0

c816190

Release v1.15.0 (Latest)

Security

(Severity: Low) Harden HTML comment serialization against comment-breakout injection. Previously, programmatic Comment(...) nodes or transform-produced comment data containing sequences like --> could serialize into active HTML such as injected <img onerror>.
(Severity: Low) Harden HTML serialization and the builder against unsafe programmatic element and attribute names. Previously, direct Node(...) usage, transform-produced attrs, or builder.element(...) calls could emit attacker-controlled markup such as injected <img onerror> by including syntax-breaking characters in a tag or attribute name.
(Severity: Moderate) Harden JustHTML.clean_url_value(...) and clean_url_in_js_string(...) against HTML character reference smuggling such as javascript&#58..., which could bypass URL scheme validation and become an active javascript: URL after HTML attribute parsing.
(Severity: Low) Harden URL sanitization against browser backslash normalization. Previously, “relative” URLs such as \\evil.example/x or /\\evil.example/x could survive sanitization and be interpreted by browsers as remote network requests, bypassing relative-only URL rules such as the default img[src] policy.
(Severity: Low) Harden URL sanitization and clean_url_value(...) against malformed bracketed hosts when allowed_hosts is enabled. Previously, inputs such as https://[evil.example]/x could raise ValueError from Python’s URL parser and crash sanitization instead of being rejected.
(Severity: Low) Harden to_markdown(html_passthrough=True) for sanitized <textarea> content. Previously, attacker-controlled </textarea> sequences could survive sanitization as text, then break out during Markdown HTML passthrough and turn into active HTML when the Markdown output was reparsed or rendered.
(Severity: Low) Harden a[ping] sanitization. Previously, ping was treated as a single URL even though browsers interpret it as a space-separated list of URLs, so a custom policy could allow a trusted first endpoint while unintentionally preserving additional attacker-controlled ping URLs.
(Severity: Low) Harden preserved <style> blocks in custom policies. Previously, JustHTML only neutralized HTML parser breakouts inside allowed <style> elements; resource-loading CSS such as @import, url(...), image-set(...), and legacy binding/filter constructs could still survive unchanged.
(Severity: Low) Harden preserved <meta http-equiv="refresh"> tags in custom policies. Previously, the content attribute was treated as inert text even though browsers interpret it as a client-side redirect instruction, so refresh targets could survive without any URL policy.
(Severity: Low) Harden link[imagesrcset] sanitization in custom policies. Previously, imagesrcset was not treated as URL-bearing at all, so <link rel="preload" as="image"> could preserve attacker-controlled remote image candidates without any URL validation.
(Severity: Low) Harden attributionsrc sanitization in custom policies. Previously, attributionsrc was not treated as URL-bearing at all, so elements such as <img> could preserve attacker-controlled attribution-reporting endpoints and trigger extra browser requests without any URL validation.
(Severity: Low) Harden security-related attribute transforms against mixed-case attribute names in custom pipelines. Previously, transforms such as DropAttrs(...), DropUrlAttrs(...), AllowStyleAttrs(...), and MergeAttrs(...) could miss or mishandle OnClick, SrcDoc, Href, Style, Rel, and similar mixed-case variants unless an earlier step had already normalized names to lowercase.
(Severity: Low) Harden preserved <base href> tags in custom policies. Previously, a kept <base href="..."> could rewrite how later relative URLs resolved in the browser, bypassing per-attribute relative-only URL rules such as img[src].
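Several of the URL fixes above come down to checking a value in the form the browser will actually act on, not the form it arrives in. The sketch below illustrates that idea with the standard library only; it is not JustHTML's implementation, the helper names (`browser_view`, `scheme_allowed`, `is_relative_only`, `host_allowed`, `filter_ping`) are hypothetical, and the backslash rule assumes the page is served over http(s), where browsers treat `\` like `/`.

```python
import html
import re
from urllib.parse import urlsplit

# Hypothetical helpers illustrating the v1.15.0 URL hardening ideas;
# this is a sketch, not JustHTML's actual code.
_C0_AND_SPACE = "".join(chr(c) for c in range(0x21))
_TAB_NEWLINE = re.compile(r"[\t\n\r]")
_SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")
_ASCII_WS = re.compile(r"[\t\n\f\r ]+")


def browser_view(value: str) -> str:
    """Approximate the string a browser's URL parser ends up with."""
    # 1. Decode character references, so "javascript&#58;..." gains its colon.
    value = html.unescape(value)
    # 2. Drop ASCII tab/newline anywhere; trim C0 controls and spaces at the ends.
    value = _TAB_NEWLINE.sub("", value).strip(_C0_AND_SPACE)
    # 3. Assumption: http(s) context, where "\" is treated like "/".
    return value.replace("\\", "/")


def scheme_allowed(value: str, allowed=frozenset({"http", "https", "mailto"})) -> bool:
    match = _SCHEME.match(browser_view(value))
    # No scheme means a relative URL; otherwise the scheme must be allow-listed.
    return match is None or match.group(1).lower() in allowed


def is_relative_only(value: str) -> bool:
    view = browser_view(value)
    # "//host/..." (which "\\host" and "/\host" become) is a remote request.
    return _SCHEME.match(view) is None and not view.startswith("//")


def _host_of(view: str):
    try:
        return urlsplit(view).hostname
    except ValueError:
        # Malformed input such as "https://[evil.example]/x": fail closed.
        return None


def host_allowed(value: str, allowed_hosts: set) -> bool:
    host = _host_of(browser_view(value))
    return host is not None and host in allowed_hosts


def filter_ping(value: str, allowed_hosts: set) -> str:
    # Decode first, then split: ping is a space-separated list of URLs,
    # and every entry must pass, not just the first one.
    decoded = html.unescape(value)
    kept = []
    for url in _ASCII_WS.split(decoded.strip()):
        host = _host_of(url.replace("\\", "/"))
        if url and host is not None and host in allowed_hosts:
            kept.append(url)
    return " ".join(kept)
```

The ordering is the point of the design: decoding before the scheme check catches `javascript&#58;alert(1)`, and decoding before splitting stops an encoded space (`&#32;`) from hiding a second `ping` target inside what looks like one URL.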

05 Apr 10:44

EmilStenstrom

v1.14.0

27f5642

Release v1.14.0

Security

(Severity: Moderate) Harden constructor-time sanitization against mutation XSS in custom policies that preserve foreign namespaces such as MathML or SVG. Previously, crafted markup could sanitize into output that looked safe but became active HTML when reparsed by a browser or downstream parser.

21 Mar 20:46

EmilStenstrom

v1.13.0

044fb58

Release v1.13.0

Security

(Severity: High) Harden fenced code generation in to_markdown() by choosing backtick delimiters longer than any run inside <pre> content, preventing attacker-controlled backticks from breaking out of code blocks and exposing raw HTML to downstream Markdown renderers.
(Severity: Low) Treat text that starts at the beginning of a rendered Markdown line as text, not block syntax, by escaping line-leading headings, blockquotes, list markers, thematic breaks, setext underlines, and fenced-code delimiters from untrusted HTML content.
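Both v1.13.0 fixes can be illustrated in a few lines. CommonMark only closes a backtick fence with a run at least as long as the opener, so a fence one backtick longer than the longest run in the content cannot be closed early. Line-leading block syntax can be disarmed with a CommonMark backslash escape. This is a hedged sketch with hypothetical function names (`fence_for`, `fenced_code`, `escape_line_starts`), not the code `to_markdown()` actually uses; it deliberately over-escapes (e.g. `#hashtag` becomes `\#hashtag`), which renders identically.

```python
import re


def fence_for(content: str) -> str:
    # The fence must be longer than any backtick run in the content (minimum 3).
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(3, longest + 1)


def fenced_code(content: str) -> str:
    fence = fence_for(content)
    body = content if content.endswith("\n") else content + "\n"
    return f"{fence}\n{body}{fence}"


# Characters that can open a heading, blockquote, bullet list, thematic break,
# setext underline, or code fence when they start a line (after <= 3 spaces).
_LEADING_MARKER = re.compile(r"^( {0,3})([#>+\-*=_`~])", re.MULTILINE)
# "1." / "1)" at line start would open an ordered list; escape the delimiter.
_LEADING_ORDERED = re.compile(r"^( {0,3})(\d{1,9})([.)])", re.MULTILINE)


def escape_line_starts(text: str) -> str:
    text = _LEADING_MARKER.sub(r"\1\\\2", text)
    return _LEADING_ORDERED.sub(r"\1\2\\\3", text)
```

For example, `fenced_code("x\n````\ny")` is wrapped in five backticks, so the four-backtick line inside stays content.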

17 Mar 21:58

EmilStenstrom

v1.12.0

c518565

Release v1.12.0

Security

(Severity: High) Markdown output now HTML-escapes text-node content before applying Markdown escaping, preventing attacker-controlled text such as <script> from turning into raw HTML when to_markdown() output is rendered.
(Severity: Moderate) Sanitization now hardens script and style raw-text content by neutralizing embedded closing-tag sequences and dropping non-text children, preventing sanitized DOM trees from serializing into breakout HTML.
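A minimal sketch of the two v1.12.0 ideas, under stated assumptions: text is HTML-escaped first and then given Markdown escaping (here a deliberately small character set), and closing-tag sequences inside raw-text content are neutralized by inserting a backslash (`</script` becomes `<\/script`). The function names are hypothetical, and whether JustHTML uses exactly this rewrite is an assumption; it is a common choice because `\/` still means `/` inside a JavaScript string literal.

```python
import html
import re


def text_to_markdown(text: str) -> str:
    # Step 1: HTML-escape, so "<script>" stays text even when the Markdown
    # renderer passes raw HTML through.
    escaped = html.escape(text, quote=False)
    # Step 2: backslash-escape a small set of Markdown inline syntax characters.
    return re.sub(r"([\\`*_\[\]])", r"\\\1", escaped)


def neutralize_closing_tags(raw: str, tag: str) -> str:
    # Inside a raw-text element, "</script" (any case) ends the element.
    # "<\/script" is no longer a closing-tag sequence for the HTML tokenizer.
    return re.sub(rf"</({re.escape(tag)})", r"<\\/\1", raw, flags=re.IGNORECASE)
```

The sanitizer change also drops non-text children of script and style; a real implementation would do that before serializing, which this sketch leaves out.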

15 Mar 22:04

EmilStenstrom

v1.11.0

1604105

Release v1.11.0

Added

Sanitization: Add SanitizationPolicy.strip_invisible_unicode, which removes invisible Unicode characters commonly used for obfuscation from text and attribute values before the other sanitizer checks run.

Changed

Sanitization: strip_invisible_unicode is enabled by default and covers variation selectors, zero-width/bidi controls, and private-use characters.

Security

(Severity: Low) Harden sanitization against invisible-Unicode obfuscation in text, attributes, and URL-like values such as disguised javascript: schemes.
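The note names three categories (variation selectors, zero-width/bidi controls, private-use characters) but not the exact code points. The sketch below uses an assumed, illustrative selection for each category; `strip_invisible` is a hypothetical name, not JustHTML's API. Stripping before the scheme check is what lets a disguised `java\u200bscript:` be recognized as `javascript:`.

```python
import unicodedata

# Assumed ranges for illustration; JustHTML's exact set may differ.
_INVISIBLE_RANGES = (
    (0xFE00, 0xFE0F),    # variation selectors
    (0xE0100, 0xE01EF),  # variation selectors supplement
    (0x200B, 0x200F),    # ZWSP, ZWNJ, ZWJ, LRM, RLM
    (0x202A, 0x202E),    # bidi embeddings and overrides
    (0x2060, 0x2064),    # word joiner and invisible math operators
    (0x2066, 0x2069),    # bidi isolates
    (0xFEFF, 0xFEFF),    # zero-width no-break space (BOM)
)


def is_invisible(ch: str) -> bool:
    if unicodedata.category(ch) == "Co":  # private-use characters
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _INVISIBLE_RANGES)


def strip_invisible(text: str) -> str:
    return "".join(ch for ch in text if not is_invisible(ch))
```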

15 Mar 14:59

EmilStenstrom

v1.10.0

5095a05

Release v1.10.0

Security

(Severity: Low) Harden JustHTML against denial-of-service from attacker-controlled deeply nested HTML. Parsing post-processing, deep cloning, pretty HTML serialization, and Markdown rendering now use iterative traversal instead of recursion, preventing RecursionError crashes on pathological nesting.
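The v1.10.0 change replaces recursion with explicit stacks. A minimal sketch of that pattern for serialization, using a hypothetical two-field `Node` class rather than JustHTML's own node type: each node is pushed once on entry and once more to emit its closing tag after its children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical minimal node, not justhtml's Node
    name: str
    children: list = field(default_factory=list)


def serialize(root: Node) -> str:
    out = []
    # (node, entering) pairs on a heap-allocated list replace the call stack.
    stack = [(root, True)]
    while stack:
        node, entering = stack.pop()
        if entering:
            out.append(f"<{node.name}>")
            stack.append((node, False))  # close tag after all children
            for child in reversed(node.children):  # keep document order
                stack.append((child, True))
        else:
            out.append(f"</{node.name}>")
    return "".join(out)
```

Because the work list lives on the heap, nesting depth is bounded by memory rather than by `sys.getrecursionlimit()`.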

10 Mar 20:09

EmilStenstrom

v1.9.1

a866b60

Release v1.9.1

Fixed

Serialization: Preserve literal text inside script and style elements during HTML serialization, so round-trips no longer turn raw text content such as > or & into entity references.
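The reason for the v1.9.1 fix: the HTML parser does not decode entities inside script and style, so escaping their text on output changes the content on the next parse. A hedged sketch of the rule, with a hypothetical `serialize_text` helper; the HTML serialization algorithm applies the same treatment to a few more raw-text parents (such as xmp, iframe, noembed, noframes) that this sketch omits. Writing this text verbatim is also why embedded `</script` sequences have to be neutralized, as described in v1.12.0.

```python
import html

# Subset of the elements whose text children are serialized verbatim.
RAW_TEXT_ELEMENTS = {"script", "style"}


def serialize_text(text: str, parent_tag: str) -> str:
    if parent_tag.lower() in RAW_TEXT_ELEMENTS:
        # Raw text: entities are not decoded on parse, so write it as-is.
        return text
    # Normal text: escape &, <, > so it round-trips through the parser.
    return html.escape(text, quote=False)
```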

08 Mar 22:46

EmilStenstrom

v1.9.0

66897e3

Release v1.9.0

Added

Builder: Add justhtml.builder with explicit element(), text(), comment(), and doctype() factories for programmatic HTML construction.
Parser: Allow JustHTML(...) to accept built nodes directly and normalize them through the existing HTML5 parser.
Docs: Add a dedicated Building HTML guide and expand the API/README documentation around programmatic HTML generation.

Changed

Sanitization: Preserve doctypes by default in document mode.
Sanitization: Add <caption> to the default allowed tag set.
Typing: Normalize SanitizationPolicy.allowed_tags to frozenset[str], improving type safety when composing policies.

Fixed

Builder & Serialization: Preserve arbitrary doctype names and identifiers across build/serialize/parse round-trips.
Builder: Reject unsupported namespaces up front; builder namespaces are limited to HTML, SVG, and MathML.

05 Mar 17:08

EmilStenstrom

v1.8.0

fd60b6e

Release v1.8.0

Added

CLI: Add --strict flag to fail with exit code 2 and print an error message on any parse error.

08 Feb 20:42

EmilStenstrom

v1.7.0

c330ff1

Release v1.7.0

Added

Selectors: Add query_one() on JustHTML and Node for retrieving the first match (or None).

Fixed

Packaging: Include py.typed in wheels for PEP 561 type hinting support.

Changed

Performance: ~9% faster JustHTML(...).to_html(pretty=False) than 1.6.0 on the web100k justhtml_to_html benchmark (200 files x 3 iterations): 7.244s -> 6.571s (median).
Performance: Multiple internal speedups in serializer, tokenizer, tree builder, and transforms for lower per-document overhead.
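A quick check of the benchmark figures quoted above: the ~9% corresponds to the reduction in median wall time, and the same two numbers amount to roughly a 1.10× throughput gain.

```python
# Median wall times from the v1.7.0 benchmark note (seconds).
before, after = 7.244, 6.571

time_reduction = (before - after) / before  # fraction of wall time saved
speedup = before / after                    # throughput ratio

print(f"{time_reduction:.1%} less time, {speedup:.2f}x throughput")
# prints: 9.3% less time, 1.10x throughput
```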

Docs

Expand API and selector documentation (including performance notes).