-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
[singlehtml] add docname to section anchor to make them unique #13739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
e6b65fb
to
5117057
Compare
Hi @gastmaier - I'm a former semi-regular volunteer contributor here, although I have been less active recently. Thanks for the pull request; and sorry that I did not notice the toctree constructor problem, as you mention in #13717. I am reading both #13717 and this PR #13739 to try to understand the different approaches and reasons for them. Also: do you have a test case that we could add under |
Hi @jayaddison maybe extending tests/test_builders/test_build_html_tocdepth.py |
@gastmaier that sounds perfect, yep! (I'd forgotten about those tests) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align with purpose
c50ba56
to
910de47
Compare
Hi, @jayaddison and @akhilsmokie7-cloud I rebased and added the test to check for duplicated ids. I added the test to the bottom, checking out fe728f4 will fail at
as expected, since at f5457f1 On, "the html build also changes the images src path during the write step", this is what I am talking about CI note:
is due to 2e51b787680cefdfe56b3438d809e6476600a47e Thanks, |
910de47
to
3a92a34
Compare
sphinx/builders/singlehtml.py
Outdated
@@ -110,7 +110,7 @@ def assemble_toc_secnumbers(self) -> dict[str, dict[str, tuple[int, ...]]]: | |||
new_secnumbers: dict[str, tuple[int, ...]] = {} | |||
for docname, secnums in self.env.toc_secnumbers.items(): | |||
for id, secnum in secnums.items(): | |||
alias = f'{docname}/{id}' | |||
alias = f'{docname}{id}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of values are possible for the docname
and id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also: I guess people shouldn't have written hyperlinks or saved bookmarks with the assumption that these aliases are stable? but, even so - if we change the format, I guess we would break those?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gastmaier in fact: I'm not sure where these /
separator characters appear. What does this code relate to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For singlehtml and at the assemble toctree step, the href is a tuple of docname and refid.
#document-path/to/#id1
to try to avoid the refid confliction in singlehtml mode problem, which didn't work because it would patch toctree, but the content body still had the non-unique ids.
My pr changes the toctree href format from
#document-path/to/#id1
to #document-path/to#id1
(removes end slash)
and for content ids from
#d1
to #document-path/to#id1
(adds doc prefix to make unique)
the new template is therefore:
#document-{doc}#{id}
direct tuple of docname and refid, without the slash.
These are valid HTML anchors, but do require escaping when manipulating with:
css
#document-test\/extra\#test {color: #f00;}
and javascript
document.querySelector('#document-test\\/extra#test')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
singlehtml.zip
here is a singlehtml build with the patch
sphinx/writers/html5.py
Outdated
@@ -497,6 +498,15 @@ def depart_term(self, node: Element) -> None: | |||
|
|||
self.body.append('</dt>') | |||
|
|||
def visit_section(self, node: section) -> None: | |||
if self.builder.name == 'singlehtml' and node['ids']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to use many @property
methods in the Sphinx writers, but maybe this singlehtml
condition is getting to the point where it makes sense (this is the third potential callsite, I think?).
Maybe pedantic of me to mention, but: running the test code without the fix in place does confirm that the test case fails (duplication of |
(I attempted that to reassure myself and to learn slightly more about how the fix works) |
Drafting again, I spotted more links using the non-doc-prefixed anchor in the body.
are not being prefixed. but their links to it are correct ( I will give yet another try, but this time transversing the pickled to patch all ids early on, instead of patching at the nodes visit. Sample of new new approach: |
To assert unique ids in singlehtml builder. Signed-off-by: Jorge Marques <[email protected]>
Since the singlehtml aggregates all doc files into a single html page during the write step, and the ids must be unique for proper link anchoring, add test that collects all ids in the page and checks if all ids are unique, by asserting the length of the list against it as a set.
3a92a34
to
9edcc87
Compare
9edcc87
to
82fae9f
Compare
Applied the ruthless traverse to patch all (ref)?ids early on, instead of patching at the nodes visit. This approach avoids mass overwrite of every docutils method under the sun, e.g. the starttag method for the sneaky explicit ref The procedure is to patch doctree ( Since the call stack is a little hidden, here is a summary
|
Use doc path to make ids unique. Compensates for the loss of the pathname in the href.
Format as document-<docname>#<id> to match other parts.
82fae9f
to
bfa9f06
Compare
if 'refid' in node or 'ids' in node: | ||
docname = env.path2doc(doc['source']) | ||
if 'refid' in node: | ||
node['refid'] = 'document-' + docname + '#' + node['refid'] | ||
if 'ids' in node: | ||
node['ids'] = ['document-' + docname + '#' + id for id in node['ids']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll plan to do this within the next 24h or so, but I'll ask in case it is something you could do quickly: could you print out two columns of text with the before
and after
values for these node attributes when building a non-trivial project (easiest/safest choice: Sphinx itself)?
e.g.
refids
before after
[sample/#foo] [document-sample#foo]
node_id
sample/#foo document-sample#foo
The reason I ask: I'd like to inspect the places where the results differ, and in particular how the code changes achieve uniqueness of the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm also wondering whether docutils
-- which produces the node
objects, if I understand correctly - could help us and allow us to fix this in a more central location; and I hope that viewing the comparison columns may also help to understand whether that is realistic or whether this is some Sphinx-specific quirk)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm also wondering whether
docutils
-- which produces thenode
objects, if I understand correctly - could help us and allow us to fix this in a more central location; and I hope that viewing the comparison columns may also help to understand whether that is realistic or whether this is some Sphinx-specific quirk)
Nope, scratch that - I think that docutils
is unaware of the notion of docnames
, so whatever is going on here must, I think, be part of Sphinx itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So docutils provides the solved tree to the builder, with each doc being a document.
Sphinx guarantees the ids are unique per doc, the filesystem guarantees the docname is unique (you cannot have two identical paths)
But the builder singlehtml flattens all into the root doc index
, loosing the information of the docname, causing non-unique ids after flatting it.
This fix recovers the docname and patches into the id itself.
The sphinx documentation itself, attached below, has conflicts, there are many duplicated id1.
The table requested (attached because it is too long):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much @gastmaier - that makes the problem and fix nice and clear.
I'm reading the comparison file at the moment - in particular I'm interested to find whether any of the before
elements included a /
delimiter -- I haven't found any so far. If there are none, then that would completely resolve my concern about breaking any existing hyperlinks containing that character.
Do you have any thoughts about whether we should always include the complete document path prefix? Or whether, for example, it could be omitted for unambiguous/unique IDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would patch all at the moment, it makes sense to me to store the lost docname information in the id itself, and it is clearer to debug.
For the toctree, before the pr, it would already generate in the format document-<doc>#<id>
, so this would need to be assessed as well. That's what #13717 tried to fix, only to uncover the collision issue.
And there are so many visit_*
elements that needs to be patched to handle every corner case, that uniforming into a single format early on (after SphinxPostTransform, before other singlehtml patches) seems to be the only reliable approach.
The latex builder does patch at the visit_*
elements with the sphinx/writers/latex.py#hypertarget[withdoc=True]
method, but I don't see that working with html since it is straight up more convoluted since each visit would require some if builder.name is 'singlehtml'
.
Purpose
Follow up to #13717, inverting the logic, instead of patching the toctree to yield "#id1" instead of "#document-path/to#id1", have the section id to be docname preffixed, solving non-unique ids in singlehtml.
Allows to remove post Sphinx transforms like in here
Top level overview of current behavior
Approach taken
Based on the LaTeX builder solution.
sphinx/writers/latex.py#hypertarget[withdoc=True]
method suffixes docutils id with the docname.In my implementation I edit ids['0'] directly to not have to overwrite the whole
visit_section
method, but I understand if requested to not modify the tree and instead overwrite.On the format #document-test/extra#id1
It is compatible with HTML anchoring, CSS and JavaScript selectors, but require escaping:
Tests
The following tests are relevant:
References