Skip to content

Commit a96388d

Browse files
committed
Cover more NodeTraversor mutation cases and tighten recovery
I found some additional mutation cases around root detachment, unwrap(), and count-changing sibling edits. Tightened traversal recovery so it stays within the original root boundary, continues correctly from the former sibling position, and add regression tests with full branch coverage. Related to #2742
1 parent b6fabf8 commit a96388d

File tree

4 files changed

+201
-18
lines changed

4 files changed

+201
-18
lines changed

CHANGES.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44

55
### Bug Fixes
66
* Android (R8/ProGuard): added a rule to ignore the optional `re2j` dependency when not present. [#2459](https://github.com/jhy/jsoup/issues/2459)
7-
* In `NodeTraversor`, removing or replacing the current node during `head()` no longer re-visits the replacement node, preventing loops. Also clarified in documentation the which inserted nodes are visited during the current traversal. [#2472](https://github.com/jhy/jsoup/issues/2472)
7+
* In `NodeTraversor`, removing or replacing the current node during `head()` no longer re-visits the replacement node, preventing loops. Traversal now continues correctly from nodes that occupy the original position after mutation, and will not advance past the original root subtree. Also, clarified in the documentation which inserted nodes are visited during the current traversal. [#2472](https://github.com/jhy/jsoup/issues/2472)
88

99
## 1.22.1 (2026-Jan-01)
1010

src/main/java/org/jsoup/select/NodeTraversor.java

Lines changed: 24 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
order. The {@link NodeVisitor#head(Node, int)} and {@link NodeVisitor#tail(Node, int)} methods will be called for
1111
each node.
1212
<p>During the <code>head()</code> visit, DOM structural changes around the node currently being visited are
13-
supported, including e.g. {@link Node#replaceWith(Node)} and {@link Node#remove()}. See
13+
supported, including {@link Node#replaceWith(Node)} and {@link Node#remove()}. See
1414
{@link NodeVisitor#head(Node, int) head()} for the traversal contract after mutation. Other non-structural node
1515
changes are also supported.</p>
1616
<p>DOM structural changes to the current node are not supported during the <code>tail()</code> visit.</p>
@@ -31,35 +31,40 @@ public static void traverse(NodeVisitor visitor, Node root) {
3131
Validate.notNull(visitor);
3232
Validate.notNull(root);
3333
Node node = root;
34+
final Node rootNext = root.nextSibling(); // don't traverse siblings beyond the original root
3435
int depth = 0;
3536
byte state = VisitHead;
3637

37-
while (node != null) {
38+
while (true) {
3839
if (state == VisitHead) {
3940
// snapshot the current cursor position so we can recover if head() structurally changes it:
40-
Node parent = node.parentNode();
41-
Node nextSib = node.nextSibling();
42-
int sibIndex = parent != null ? node.siblingIndex() : 0;
43-
int childCount = parent != null ? parent.childNodeSize() : 0;
41+
Node parent = node.parentNode();
42+
Node next = node.nextSibling();
43+
int sibIndex = parent != null ? node.siblingIndex() : 0;
4444

4545
visitor.head(node, depth);
4646

4747
// any structural changes?
48-
if (parent != null && !node.hasParent()) { // node was removed from parent; try to recover by sibling index
49-
if (parent.childNodeSize() == childCount) { // current slot is still occupied
50-
node = parent.childNode(sibIndex);
51-
state = AfterHead; // continue from that slot without re-heading it
52-
} else if (nextSib != null) { // removed; resume from the original next
53-
node = nextSib;
54-
} else { // removed last child; tail the parent next
48+
if (parent != null && node.parentNode() != parent) { // removed / replaced / moved
49+
Node occupant = sibIndex < parent.childNodeSize() ? parent.childNode(sibIndex) : null;
50+
// ^^ the node now at this node's former position
51+
Node boundary = depth == 0 ? rootNext : next; // don't advance beyond this node when resuming
52+
if (occupant != null && occupant != boundary) {
53+
node = occupant;
54+
state = AfterHead; // continue from that slot without re-heading it
55+
} else if (depth == 0) { // root detached or replaced
56+
break;
57+
} else if (next != null && next.parentNode() == parent) {
58+
node = next; // old slot is empty or shifted to the original next, visit
59+
} else { // removed last child; tail the parent next
5560
node = parent;
5661
depth--;
5762
state = VisitTail;
5863
}
5964
} else {
6065
state = AfterHead;
6166
}
62-
continue; // next loop handles the updated node/state
67+
continue; // next loop handles the updated node/state
6368
}
6469

6570
if (state == AfterHead && node.childNodeSize() > 0) { // descend into current children
@@ -71,10 +76,12 @@ public static void traverse(NodeVisitor visitor, Node root) {
7176

7277
visitor.tail(node, depth);
7378

74-
if (node == root) break; // done
75-
7679
Node next = node.nextSibling();
77-
if (next != null) { // traverse siblings
80+
if (depth == 0) {
81+
if (next == null || next == rootNext) break; // done with the original root range
82+
node = next;
83+
state = VisitHead;
84+
} else if (next != null) { // traverse siblings
7885
node = next;
7986
state = VisitHead;
8087
} else { // no siblings left, ascend

src/main/java/org/jsoup/select/NodeVisitor.java

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,8 @@ <p>The node may be modified (for example via {@link Node#attr(String)}), removed
4040
<li>If the current node was detached and no node occupies its former sibling position, the current node is not
4141
passed to {@code tail()}, and traversal resumes at the node that originally followed it.</li>
4242
</ul>
43+
<p>Traversal never advances outside the original root subtree. If the traversal root is detached during
44+
{@code head()}, traversal stops at the original root boundary.</p>
4345
4446
@param node the node being visited.
4547
@param depth the depth of the node, relative to the root node. E.g., the root node has depth 0, and a child node

src/test/java/org/jsoup/select/TraversorTest.java

Lines changed: 174 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -372,6 +372,180 @@ void visitsChildrenInsertedInHead() {
372372
assertEquals("<div><p><span>child</span></p></div>", TextUtil.stripNewlines(doc.body().html()));
373373
}
374374

375+
@Test
376+
void removingTraversalRootInHeadDoesNotEscapeOriginalSubtree() {
377+
Document doc = Jsoup.parseBodyFragment("<div>a</div><p>b</p>");
378+
Element root = doc.expectFirst("div");
379+
StringBuilder headOrder = new StringBuilder();
380+
StringBuilder tailOrder = new StringBuilder();
381+
382+
NodeTraversor.traverse(new NodeVisitor() {
383+
@Override public void head(Node node, int depth) {
384+
trackSeen(node, headOrder);
385+
if (node == root)
386+
node.remove();
387+
}
388+
389+
@Override public void tail(Node node, int depth) {
390+
trackSeen(node, tailOrder);
391+
}
392+
}, root);
393+
394+
assertEquals("div;", headOrder.toString());
395+
assertEquals("", tailOrder.toString());
396+
assertEquals("<p>b</p>", TextUtil.stripNewlines(doc.body().html()));
397+
}
398+
399+
@Test
400+
void removingOnlyTraversalRootInHeadStopsWhenOriginalSlotIsEmpty() {
401+
Document doc = Jsoup.parseBodyFragment("<div>a</div>");
402+
Element root = doc.expectFirst("div");
403+
StringBuilder headOrder = new StringBuilder();
404+
StringBuilder tailOrder = new StringBuilder();
405+
406+
NodeTraversor.traverse(new NodeVisitor() {
407+
@Override public void head(Node node, int depth) {
408+
trackSeen(node, headOrder);
409+
if (node == root)
410+
node.remove();
411+
}
412+
413+
@Override public void tail(Node node, int depth) {
414+
trackSeen(node, tailOrder);
415+
}
416+
}, root);
417+
418+
assertEquals("div;", headOrder.toString());
419+
assertEquals("", tailOrder.toString());
420+
assertEquals("", TextUtil.stripNewlines(doc.body().html()));
421+
}
422+
423+
@Test
424+
void replacingTraversalRootInHeadStaysWithinReplacementSubtree() {
425+
Document doc = Jsoup.parseBodyFragment("<div><span>x</span></div><p>y</p>");
426+
Element root = doc.expectFirst("div");
427+
StringBuilder headOrder = new StringBuilder();
428+
StringBuilder tailOrder = new StringBuilder();
429+
430+
NodeTraversor.traverse(new NodeVisitor() {
431+
@Override public void head(Node node, int depth) {
432+
trackSeen(node, headOrder);
433+
if (node == root) {
434+
Element replacement = new Element("section").insertChildren(0, node.childNodes());
435+
node.replaceWith(replacement);
436+
}
437+
}
438+
439+
@Override public void tail(Node node, int depth) {
440+
trackSeen(node, tailOrder);
441+
}
442+
}, root);
443+
444+
assertEquals("div;span;x;", headOrder.toString());
445+
assertEquals("x;span;section;", tailOrder.toString());
446+
assertEquals("<section><span>x</span></section><p>y</p>", TextUtil.stripNewlines(doc.body().html()));
447+
}
448+
449+
@Test
450+
void unwrappingTraversalRootInHeadVisitsExposedTopLevelNodesUntilOriginalBoundary() {
451+
Document doc = Jsoup.parseBodyFragment("<div><b>x</b><i>y</i></div><p>z</p>");
452+
Element root = doc.expectFirst("div");
453+
StringBuilder headOrder = new StringBuilder();
454+
StringBuilder tailOrder = new StringBuilder();
455+
456+
NodeTraversor.traverse(new NodeVisitor() {
457+
@Override public void head(Node node, int depth) {
458+
trackSeen(node, headOrder);
459+
if (node == root)
460+
node.unwrap();
461+
}
462+
463+
@Override public void tail(Node node, int depth) {
464+
trackSeen(node, tailOrder);
465+
}
466+
}, root);
467+
468+
assertEquals("div;x;i;y;", headOrder.toString());
469+
assertEquals("x;b;y;i;", tailOrder.toString());
470+
assertEquals("<b>x</b><i>y</i><p>z</p>", TextUtil.stripNewlines(doc.body().html()));
471+
}
472+
473+
@Test
474+
void unwrapInHeadContinuesFromExposedChildren() {
475+
Document doc = Jsoup.parseBodyFragment("<div><span><b>x</b><i>y</i></span><u>z</u></div>");
476+
Element root = doc.expectFirst("div");
477+
StringBuilder headOrder = new StringBuilder();
478+
StringBuilder tailOrder = new StringBuilder();
479+
480+
NodeTraversor.traverse(new NodeVisitor() {
481+
@Override public void head(Node node, int depth) {
482+
trackSeen(node, headOrder);
483+
if (node instanceof Element && node.nameIs("span"))
484+
node.unwrap();
485+
}
486+
487+
@Override public void tail(Node node, int depth) {
488+
trackSeen(node, tailOrder);
489+
}
490+
}, root);
491+
492+
assertEquals("div;span;x;i;y;u;z;", headOrder.toString());
493+
assertEquals("x;b;y;i;z;u;div;", tailOrder.toString());
494+
assertEquals("<div><b>x</b><i>y</i><u>z</u></div>", TextUtil.stripNewlines(doc.body().html()));
495+
}
496+
497+
@Test
498+
void removingCurrentAndOriginalNextInHeadTailsParent() {
499+
Document doc = Jsoup.parseBodyFragment("<div>a<em>Two</em></div>");
500+
Element root = doc.expectFirst("div");
501+
StringBuilder headOrder = new StringBuilder();
502+
StringBuilder tailOrder = new StringBuilder();
503+
504+
NodeTraversor.traverse(new NodeVisitor() {
505+
@Override public void head(Node node, int depth) {
506+
trackSeen(node, headOrder);
507+
if (node instanceof TextNode && ((TextNode) node).text().equals("a")) {
508+
node.nextSibling().remove();
509+
node.remove();
510+
}
511+
}
512+
513+
@Override public void tail(Node node, int depth) {
514+
trackSeen(node, tailOrder);
515+
}
516+
}, root);
517+
518+
assertEquals("div;a;", headOrder.toString());
519+
assertEquals("div;", tailOrder.toString());
520+
assertEquals("<div></div>", TextUtil.stripNewlines(doc.body().html()));
521+
}
522+
523+
@Test
524+
void beforeAfterRemoveInHeadTailsCurrentSlotAndHeadsFutureSiblings() {
525+
Document doc = Jsoup.parse("<div><p>One</p>a<em>Two</em></div>");
526+
StringBuilder headOrder = new StringBuilder();
527+
StringBuilder tailOrder = new StringBuilder();
528+
529+
NodeTraversor.traverse(new NodeVisitor() {
530+
@Override public void head(Node node, int depth) {
531+
trackSeen(node, headOrder);
532+
if (node instanceof TextNode && ((TextNode) node).text().equals("a")) {
533+
node.before(new TextNode("b"));
534+
node.after(new TextNode("c"));
535+
node.remove();
536+
}
537+
}
538+
539+
@Override public void tail(Node node, int depth) {
540+
trackSeen(node, tailOrder);
541+
}
542+
}, doc);
543+
544+
assertEquals("#root;html;head;body;div;p;One;a;c;em;Two;", headOrder.toString());
545+
assertEquals("head;One;p;b;c;Two;em;div;body;html;#root;", tailOrder.toString());
546+
assertEquals("<div><p>One</p>bc<em>Two</em></div>", TextUtil.stripNewlines(doc.body().html()));
547+
}
548+
375549
@Test void elementFunctionalTraverse() {
376550
Document doc = Jsoup.parse("<div><p>1<p>2<p>3");
377551
Element body = doc.body();

0 commit comments

Comments
 (0)