[singlehtml] add docname to section anchor to make them unique #13739

gastmaier · 2025-07-20T14:10:55Z

Purpose

Follow-up to #13717 that inverts its logic: instead of patching the toctree to yield "#id1" rather than "#document-path/to#id1", prefix the section id with the docname, which solves non-unique ids in singlehtml.
This also allows removing post-Sphinx transforms such as the one linked here.

Top level overview of current behavior

ID collisions are resolved per document (#already-used -> #id1, #already-used -> #id2).
There is no ID collision resolution at the singlehtml step.

Approach taken

Based on the LaTeX builder solution.
The sphinx/writers/latex.py#hypertarget[withdoc=True] method prefixes the docutils id with the docname.
In my implementation I edit node['ids'][0] directly so that I don't have to override the whole visit_section method, but I can override it instead if modifying the tree is not wanted.

On the format #document-test/extra#id1

It is compatible with HTML anchors and with CSS and JavaScript selectors, but requires escaping:

#document-test\/extra\#test {color: #f00;}

document.querySelector('#document-test\\/extra\\#test')
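As a sketch of where the backslashes in the two snippets above come from: CSS requires `/` and `#` to be escaped inside an id selector, and a JavaScript string literal then needs each of those backslashes doubled. In a browser, `CSS.escape()` performs the CSS step; the helper names below are hypothetical, and the simplified escaper only covers the characters that occur in `document-<doc>#<id>` ids (it assumes no leading digit, control characters or quotes).

```python
def css_escape_id(element_id: str) -> str:
    """Backslash-escape ASCII characters not allowed in a bare CSS identifier.

    Simplified: assumes the id does not start with a digit (true for the
    "document-..." ids) and contains no control characters or quotes.
    """
    escaped = []
    for ch in element_id:
        # Letters, digits, '-' and '_' (and any non-ASCII char) may appear as-is.
        if ch.isascii() and not (ch.isalnum() or ch in '-_'):
            escaped.append('\\' + ch)
        else:
            escaped.append(ch)
    return ''.join(escaped)


def css_selector(element_id: str) -> str:
    """Return an id selector usable in a stylesheet."""
    return '#' + css_escape_id(element_id)


def js_query_selector(element_id: str) -> str:
    """Return a querySelector call; each CSS backslash is doubled in the JS literal."""
    selector = css_selector(element_id).replace('\\', '\\\\')
    return f"document.querySelector('{selector}')"


print(css_selector('document-test/extra#test'))       # #document-test\/extra\#test
print(js_query_selector('document-test/extra#test'))  # document.querySelector('#document-test\\/extra\\#test')
```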

Tests

The following tests are relevant:

tests/test_builders/test_build_html_tocdepth.py
test_build_html_numfig.py

References

jayaddison · 2025-08-02T20:34:29Z

Hi @gastmaier - I'm a former semi-regular volunteer contributor here, although I have been less active recently. Thanks for the pull request; and sorry that I did not notice the toctree constructor problem, as you mention in #13717.

I am reading both #13717 and this PR #13739 to try to understand the different approaches and reasons for them.

Also: do you have a test case that we could add under tests/roots that demonstrates the problem? I suppose it would need to include a table of contents of some kind and have a corresponding singlehtml test case.

gastmaier · 2025-08-03T16:53:29Z

Hi @jayaddison, maybe extend tests/test_builders/test_build_html_tocdepth.py to check for duplicated ids?
As is, it already checks that the ids are as expected (e.g., the PR changes things in 3 locations to keep the test passing), but not for duplicated ids, so I can add that assertion.

jayaddison · 2025-08-03T20:07:05Z

@gastmaier that sounds perfect, yep! (I'd forgotten about those tests)

akhilsmokie7-cloud

Align with purpose

gastmaier · 2025-08-30T17:34:38Z

Hi @jayaddison and @akhilsmokie7-cloud, I rebased and added the test to check for duplicated ids.
This PR relies on changing the ids during the write step.
Initially I didn't really like that approach, but I recently stumbled upon the fact that the HTML build also changes the image src paths during the write step (original/path/to/image.png -> _images/image_<counter>.png), so I am now more comfortable with it.

I added the test at the bottom of the file; checking out fe728f4 makes it fail with:

FAILED tests/test_builders/test_build_html_tocdepth.py::test_unique_ids_singlehtml - AssertionError: assert 16 == 15

as expected, since in f5457f1
I purposely added a section called FooBar to both foo and bar, forcing the same id in both pages, which is a problem only for singlehtml output.

On "the HTML build also changes the image src paths during the write step", this is what I am referring to:
https://github.com/sphinx-doc/sphinx/blob/master/sphinx/writers/html5.py#L754-L755

CI note:
Failing test

FAILED tests/test_directives/test_directive_only.py::test_sectioning - AssertionError: Section out of place: '1.6.2. Subsection'
assert '1.6.1.1.' == '1.6.2.'

is due to 2e51b787680cefdfe56b3438d809e6476600a47e

Thanks,

jayaddison · 2025-09-02T09:34:21Z

sphinx/builders/singlehtml.py

@@ -110,7 +110,7 @@ def assemble_toc_secnumbers(self) -> dict[str, dict[str, tuple[int, ...]]]:
        new_secnumbers: dict[str, tuple[int, ...]] = {}
        for docname, secnums in self.env.toc_secnumbers.items():
            for id, secnum in secnums.items():
-                alias = f'{docname}/{id}'
+                alias = f'{docname}{id}'


What kind of values are possible for the docname and id?

(also: I guess people shouldn't have written hyperlinks or saved bookmarks with the assumption that these aliases are stable? but, even so - if we change the format, I guess we would break those?)

@gastmaier in fact: I'm not sure where these / separator characters appear. What does this code relate to?

For singlehtml, at the assemble-toctree step, the href is formed from the docname and the refid, giving #document-path/to/#id1.
Trying to avoid the refid conflict in singlehtml mode by patching the toctree (as in #13717) didn't work, because the content body still had the non-unique ids.

My PR changes the toctree href format from
#document-path/to/#id1 to #document-path/to#id1 (removes the trailing slash)
and the content ids from
#id1 to #document-path/to#id1 (adds the doc prefix to make them unique).
The new template is therefore:
#document-{doc}#{id}
i.e. the docname and refid joined directly, without the slash.
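A minimal sketch of building and parsing the `document-{doc}#{id}` template, with hypothetical helper names. It assumes node ids never contain `#` (docutils normalizes generated ids), so splitting on the last `#` recovers the (docname, id) pair unambiguously, even though docnames contain `/`.

```python
PREFIX = 'document-'


def make_anchor(docname: str, node_id: str) -> str:
    """Build an id of the form document-<docname>#<id> (no slash before '#')."""
    return f'{PREFIX}{docname}#{node_id}'


def split_anchor(anchor: str) -> tuple[str, str]:
    """Recover (docname, id) from a document-<docname>#<id> anchor.

    Assumes node ids never contain '#', so splitting on the last '#' is
    unambiguous even if a docname happened to contain one.
    """
    if not anchor.startswith(PREFIX):
        raise ValueError(f'not a singlehtml anchor: {anchor!r}')
    docname, sep, node_id = anchor[len(PREFIX):].rpartition('#')
    if not sep:
        raise ValueError(f'missing "#" separator: {anchor!r}')
    return docname, node_id


print(make_anchor('path/to', 'id1'))         # document-path/to#id1
print(split_anchor('document-path/to#id1'))  # ('path/to', 'id1')
```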

These are valid HTML anchors, but they require escaping when used in
CSS:

#document-test\/extra\#test {color: #f00;}

and JavaScript:

document.querySelector('#document-test\\/extra\\#test')

Here is a singlehtml build with the patch:
singlehtml.zip

jayaddison · 2025-09-02T09:37:10Z

sphinx/writers/html5.py

@@ -497,6 +498,15 @@ def depart_term(self, node: Element) -> None:

            self.body.append('</dt>')

+    def visit_section(self, node: section) -> None:
+        if self.builder.name == 'singlehtml' and node['ids']:


We don't seem to use many @property methods in the Sphinx writers, but maybe this singlehtml condition is getting to the point where it makes sense (this is the third potential callsite, I think?).
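A minimal sketch of what such a property could look like. `TranslatorSketch`, `is_singlehtml` and `section_ids` are hypothetical names, not Sphinx's API, and the builder is faked with `SimpleNamespace` so the example runs standalone.

```python
from types import SimpleNamespace


class TranslatorSketch:
    """Hypothetical stand-in for an HTML5 translator; not Sphinx's actual class."""

    def __init__(self, builder, docname: str) -> None:
        self.builder = builder
        self.docname = docname

    @property
    def is_singlehtml(self) -> bool:
        # One place for the builder check instead of repeating it at every callsite.
        return self.builder.name == 'singlehtml'

    def section_ids(self, ids: list[str]) -> list[str]:
        # Example callsite: prefix section ids only when building singlehtml.
        if self.is_singlehtml and ids:
            return [f'document-{self.docname}#{i}' for i in ids]
        return ids


single = TranslatorSketch(SimpleNamespace(name='singlehtml'), 'foo')
plain = TranslatorSketch(SimpleNamespace(name='html'), 'foo')
print(single.section_ids(['foobar']))  # ['document-foo#foobar']
print(plain.section_ids(['foobar']))   # ['foobar']
```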

jayaddison · 2025-09-02T09:51:51Z

Maybe pedantic of me to mention, but: running the test code without the fix in place does confirm that the test case fails (duplication of foobar-b1 alias).

(I attempted that to reassure myself and to learn slightly more about how the fix works)

gastmaier · 2025-09-02T10:32:44Z

Moving this back to draft: I spotted more links in the body that use the non-doc-prefixed anchor.
In particular, explicit targets such as

.. _explicit-ref:

are not being prefixed, but the links to them are correct (document-path/to#explicit-ref).

I will give it another try, this time traversing the pickled doctree to patch all ids early on, instead of patching at node visit time.

Sample of the new approach:
doc.tar.gz

To assert unique ids in singlehtml builder. Signed-off-by: Jorge Marques <[email protected]>

Since singlehtml aggregates all documents into a single HTML page during the write step, and ids must be unique for anchors to work, add a test that collects all ids in the page and checks that they are unique by comparing the length of the list with the length of the corresponding set.
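A standalone sketch of that check, using only the Python standard library. It is not the PR's `test_unique_ids_singlehtml` (which runs against the actual build output); the helper names and sample HTML are hypothetical. Rather than only comparing `len(ids)` with `len(set(ids))`, it reports which ids are duplicated, which makes a failure easier to read.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag.
        for name, value in attrs:
            if name == 'id' and value is not None:
                self.ids.append(value)


def duplicated_ids(page: str) -> list[str]:
    parser = IdCollector()
    parser.feed(page)
    parser.close()
    # len(ids) == len(set(ids)) exactly when this list is empty.
    return sorted(i for i, n in Counter(parser.ids).items() if n > 1)


before = '<section id="foobar"></section><section id="foobar"></section>'
after = ('<section id="document-foo#foobar"></section>'
         '<section id="document-bar#foobar"></section>')
print(duplicated_ids(before))  # ['foobar']
print(duplicated_ids(after))   # []
```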

gastmaier · 2025-09-03T08:42:58Z

Applied a full traversal to patch all ids and refids early on, instead of patching at node visit time.

This approach avoids overriding a large number of docutils writer methods, e.g. the starttag method for the sneaky explicit-target <span id="<id>">.

The procedure is to patch the doctree (prefix_ids_with_docname) at the end of assemble_doctree, and before the other singlehtml patches (assemble_toc_secnumbers and assemble_toc_fignumbers), which have also been adjusted to match the existing document-<doc>#<id> format instead of the previous, looser <doc>/<id> format.

Since the call stack is a little hidden, here is a summary:

@builders/singlehtml
write_documents
  - assemble_doctree:
    - inline_all_toctrees
    - resolve_references
      -  apply_post_transforms
    - prefix_ids_with_docname (new)
  - assemble_toc_secnumbers
  - assemble_toc_fignumbers
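The prefix_ids_with_docname step can be sketched as one recursive walk over the merged tree. This is a hedged approximation, not the PR's code: `Node` is a hypothetical stand-in for a docutils Element (so the example runs without Sphinx), and the docname is read from a marker attribute rather than from `env.path2doc(doc['source'])`. A real implementation would use docutils' own tree traversal instead of Python recursion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a docutils Element: attributes plus children."""

    attributes: dict = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


def prefix_ids_with_docname(node: Node, docname: str | None = None) -> None:
    # Assumption: a node carrying 'docname' marks where an inlined document starts.
    docname = node.attributes.get('docname', docname)
    if docname is not None:
        if 'refid' in node.attributes:
            node.attributes['refid'] = f"document-{docname}#{node.attributes['refid']}"
        if node.attributes.get('ids'):
            node.attributes['ids'] = [f'document-{docname}#{i}' for i in node.attributes['ids']]
    for child in node.children:
        prefix_ids_with_docname(child, docname)


# Two inlined documents that both define a "foobar" section.
tree = Node({'docname': 'index'}, [
    Node({'docname': 'foo'}, [Node({'ids': ['foobar']}), Node({'refid': 'foobar'})]),
    Node({'docname': 'bar'}, [Node({'ids': ['foobar']})]),
])
prefix_ids_with_docname(tree)
foo, bar = tree.children
print(foo.children[0].attributes['ids'])  # ['document-foo#foobar']
print(bar.children[0].attributes['ids'])  # ['document-bar#foobar']
```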

Use doc path to make ids unique. Compensates for the loss of the pathname in the href.

Format as document-<docname>#<id> to match other parts.

jayaddison · 2025-09-03T09:32:42Z

sphinx/builders/singlehtml.py

+            if 'refid' in node or 'ids' in node:
+                docname = env.path2doc(doc['source'])
+            if 'refid' in node:
+                node['refid'] = 'document-' + docname + '#' + node['refid']
+            if 'ids' in node:
+                node['ids'] = ['document-' + docname + '#' + id for id in node['ids']]


I'll plan to do this within the next 24h or so, but I'll ask in case it is something you could do quickly: could you print out two columns of text with the before and after values for these node attributes when building a non-trivial project (easiest/safest choice: Sphinx itself)?

e.g.

            before          after
refids      [sample/#foo]   [document-sample#foo]
node_id     sample/#foo     document-sample#foo

The reason I ask: I'd like to inspect the places where the results differ, and in particular how the code changes achieve uniqueness of the results.

(I'm also wondering whether docutils -- which produces the node objects, if I understand correctly - could help us and allow us to fix this in a more central location; and I hope that viewing the comparison columns may also help to understand whether that is realistic or whether this is some Sphinx-specific quirk)

Nope, scratch that - I think that docutils is unaware of the notion of docnames, so whatever is going on here must, I think, be part of Sphinx itself.

So docutils provides the resolved tree to the builder, with each doc being a separate document.
Sphinx guarantees that ids are unique per doc, and the filesystem guarantees that each docname is unique (you cannot have two identical paths).
But the singlehtml builder flattens everything into the root document (index), losing the docname information, which causes non-unique ids after flattening.
This fix recovers the docname and patches it into the id itself.
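The uniqueness argument above can be checked with a small sketch using hypothetical ids: ids that are unique within each document collide once concatenated, while `document-<docname>#<id>` strings cannot collide because docnames are unique and (assuming ids never contain `#`) the prefixed string determines the (docname, id) pair.

```python
from collections import Counter

# Hypothetical ids per document: unique within each document (as Sphinx
# guarantees), but 'foobar' and 'id1' recur across documents.
IDS_PER_DOC = {
    'index': ['welcome', 'id1'],
    'foo': ['foo', 'foobar', 'id1'],
    'bar': ['bar', 'foobar', 'id1'],
}


def flatten(ids_per_doc: dict[str, list[str]], prefix: bool) -> list[str]:
    """Concatenate all ids, as when singlehtml merges documents into one page."""
    return [f'document-{doc}#{i}' if prefix else i
            for doc, ids in ids_per_doc.items() for i in ids]


def duplicates(ids: list[str]) -> list[str]:
    return sorted(i for i, n in Counter(ids).items() if n > 1)


print(duplicates(flatten(IDS_PER_DOC, prefix=False)))  # ['foobar', 'id1']
print(duplicates(flatten(IDS_PER_DOC, prefix=True)))   # []
```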

The Sphinx documentation itself (build attached below) has conflicts: there are many duplicated id1 ids.

singlehtml.zip

The requested table (attached because it is too long):

ids.md

Thanks very much @gastmaier - that makes the problem and fix nice and clear.

I'm reading the comparison file at the moment - in particular I'm interested to find whether any of the before elements included a / delimiter -- I haven't found any so far. If there are none, then that would completely resolve my concern about breaking any existing hyperlinks containing that character.

Do you have any thoughts about whether we should always include the complete document path prefix? Or whether, for example, it could be omitted for unambiguous/unique IDs?

I would patch all ids for now; it makes sense to me to store the lost docname information in the id itself, and it is easier to debug.

For the toctree, before this PR, hrefs were already generated in the format document-<doc>#<id>, so this would need to be assessed as well. That's what #13717 tried to fix, only to uncover the collision issue.

There are so many visit_* methods that would need patching to handle every corner case that normalizing to a single format early on (after SphinxPostTransform, before the other singlehtml patches) seems to be the only reliable approach.

The LaTeX writer does patch in the visit_* methods via the sphinx/writers/latex.py#hypertarget[withdoc=True] method, but I don't see that working for HTML: it would be considerably more convoluted, since each visit method would need an `if self.builder.name == 'singlehtml'` check.

gastmaier force-pushed the toctree-singlehtml2 branch 3 times, most recently from e6b65fb to 5117057 July 20, 2025 14:23

gastmaier marked this pull request as ready for review July 21, 2025 07:39

AA-Turner added the sprint label Jul 21, 2025

jayaddison mentioned this pull request Jul 27, 2025

[singlehtml] toctree no filename with anchor #13717

Closed

akhilsmokie7-cloud reviewed Aug 4, 2025

gastmaier force-pushed the toctree-singlehtml2 branch 2 times, most recently from c50ba56 to 910de47 August 30, 2025 17:32

gastmaier force-pushed the toctree-singlehtml2 branch from 910de47 to 3a92a34 September 2, 2025 08:56

jayaddison reviewed Sep 2, 2025

gastmaier marked this pull request as draft September 2, 2025 10:32

gastmaier added 2 commits September 2, 2025 23:56

Add section with same title for test-tocdepth/[foo/bar]

22e02cd

gastmaier force-pushed the toctree-singlehtml2 branch from 3a92a34 to 9edcc87 September 2, 2025 22:05

Update AUTHORS.rst and CHANGES.rst

18ca1b6

gastmaier force-pushed the toctree-singlehtml2 branch from 9edcc87 to 82fae9f September 3, 2025 08:40

gastmaier added 2 commits September 3, 2025 11:15

[singlehtml]: Append docname to refid and ids

8741bb7

[singlehtml] Reformat fignum and secnum tuple

bfa9f06

gastmaier force-pushed the toctree-singlehtml2 branch from 82fae9f to bfa9f06 September 3, 2025 09:16

gastmaier marked this pull request as ready for review September 3, 2025 09:16

jayaddison reviewed Sep 3, 2025
