Commit 7c32b7e

Canonicalize enforced attrs in Cleaner
Prevent duplicate enforced attributes in cleaned preserve-case documents by replacing case-variant source attrs during the enforced-attr merge. Fixes #2476
1 parent e640ca8 commit 7c32b7e

3 files changed: +29 -1 lines changed
CHANGES.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 * In `NodeTraversor`, removing or replacing the current node during `head()` no longer re-visits the replacement node, preventing loops. Traversal now continues correctly from nodes that occupy the original position after mutation, and will not advance past the original root subtree. Also, clarified in the documentation which inserted nodes are visited during the current traversal. [#2472](https://github.com/jhy/jsoup/issues/2472)
 * Parsing during charset sniffing no longer fails if an advisory `available()` call throws `IOException`, as seen on JDK 8 `HttpURLConnection`. [#2474](https://github.com/jhy/jsoup/issues/2474)
 * `Cleaner` no longer makes relative URL attributes in the input document absolute when cleaning or validating a `Document`. URL normalization now applies only to the cleaned output, and `Safelist.isSafeAttribute()` is side effect free. [#2475](https://github.com/jhy/jsoup/issues/2475)
+* `Cleaner` no longer duplicates enforced attributes when the input `Document` preserves attribute case. A case-variant source attribute is now replaced by the enforced attribute in the cleaned output. [#2476](https://github.com/jhy/jsoup/issues/2476)
 
 ## 1.22.1 (2026-Jan-01)
 

src/main/java/org/jsoup/safety/Cleaner.java

Lines changed: 5 additions & 1 deletion
@@ -215,7 +215,11 @@ private ElementMeta createSafeElement(Element sourceEl) {
             }
         }
 
-        destAttrs.addAll(enforcedAttrs);
+        // apply enforced attributes case-insensitively, so a preserved-case source attr is canonicalized to the enforced key
+        for (Attribute enforcedAttr : enforcedAttrs) {
+            destAttrs.removeIgnoreCase(enforcedAttr.getKey());
+            destAttrs.put(enforcedAttr.getKey(), enforcedAttr.getValue());
+        }
         dest.attributes().addAll(destAttrs); // re-attach, if removed in clear
         return new ElementMeta(dest, numDiscarded);
     }
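
The effect of the remove-then-put loop above can be illustrated with a minimal, self-contained sketch. It is a model only: attributes are represented as an insertion-ordered `Map<String, String>` rather than jsoup's `Attributes` class, and the names `EnforcedAttrMergeSketch`, `removeIgnoreCase`, and `mergeEnforced` are hypothetical helpers written for this illustration, not part of jsoup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only: a LinkedHashMap stands in for jsoup's Attributes.
public class EnforcedAttrMergeSketch {

    // Hypothetical stand-in for a case-insensitive remove: drops every key that equals `key` ignoring case.
    public static void removeIgnoreCase(Map<String, String> attrs, String key) {
        attrs.keySet().removeIf(k -> k.equalsIgnoreCase(key));
    }

    // Copies the kept source attributes, then applies each enforced attribute,
    // first removing any case variant so only the enforced key remains.
    public static Map<String, String> mergeEnforced(Map<String, String> sourceAttrs,
                                                    Map<String, String> enforcedAttrs) {
        Map<String, String> dest = new LinkedHashMap<>(sourceAttrs);
        for (Map.Entry<String, String> enforced : enforcedAttrs.entrySet()) {
            removeIgnoreCase(dest, enforced.getKey());
            dest.put(enforced.getKey(), enforced.getValue());
        }
        return dest;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("href", "http://external.com/");
        source.put("REL", "nofollow"); // preserved-case source attribute

        Map<String, String> enforced = new LinkedHashMap<>();
        enforced.put("rel", "nofollow");

        // A case-sensitive merge keeps both keys in this model.
        Map<String, String> naive = new LinkedHashMap<>(source);
        naive.putAll(enforced);
        System.out.println(naive.keySet()); // prints [href, REL, rel]

        // The remove-then-put merge leaves only the enforced key.
        System.out.println(mergeEnforced(source, enforced).keySet()); // prints [href, rel]
    }
}
```

In this model the source map is left untouched and only the merged copy is canonicalized, which mirrors the intent stated in the commit message that the replacement happens in the cleaned output.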

src/test/java/org/jsoup/safety/CleanerTest.java

Lines changed: 23 additions & 0 deletions
@@ -579,6 +579,29 @@ void cleansCaseSensitiveElements(boolean preserveCase) {
         assertEquals("<a href=\"http://external.com/\" rel=\"nofollow\">One</a> <a href=\"/relative/\">Two</a> <a href=\"../other/\">Three</a> <a href=\"http://example.com/bar\" rel=\"nofollow\">Four</a>", clean4);
     }
 
+    @Test void canonicalizesEnforcedAttributes() {
+        Document customDirty = Jsoup.parse("<a REL='external'>One</a>", "",
+            Parser.htmlParser().settings(ParseSettings.preserveCase));
+        Cleaner customCleaner = new Cleaner(Safelist.none()
+            .addTags("a")
+            .addEnforcedAttribute("a", "rel", "external"));
+        assertEquals("<a rel=\"external\">One</a>", customCleaner.clean(customDirty).body().html());
+    }
+
+    @Test void canonicalizesNofollowEnforcedAttribute() {
+        Document dirty = Jsoup.parse("<a href='http://external.com/' REL='nofollow'>One</a>", "",
+            Parser.htmlParser().settings(ParseSettings.preserveCase));
+        Cleaner cleaner = new Cleaner(Safelist.basic());
+        assertEquals("<a href=\"http://external.com/\" rel=\"nofollow\">One</a>", cleaner.clean(dirty).body().html());
+    }
+
+    @Test void preservesMatchingSourceNofollowWhenEnforcementSuppressed() {
+        Document dirty = Jsoup.parse("<a href='http://example.com/foo' REL='nofollow'>One</a>", "http://example.com/",
+            Parser.htmlParser().settings(ParseSettings.preserveCase));
+        Cleaner cleaner = new Cleaner(Safelist.basic());
+        assertEquals("<a href=\"http://example.com/foo\" REL=\"nofollow\">One</a>", cleaner.clean(dirty).body().html());
+    }
+
     @Test void discardsSvgScriptData() {
         // https://github.com/jhy/jsoup/issues/2320
         Safelist svgOk = Safelist.none().addTags("svg");
