Description
Describe the bug
As noted in #11961, we have some duplicate search results that appear in the frontend at the moment.
Some of this is due to lack of JavaScript de-duplication of results, but it looks to me like we also have duplication of entries in the alltitles part of the search index data that is built by the Python-side HTML/search-index builder code.
Correction 2024-02-08: it turns out that these entries are not in fact duplicates. By coincidence there is a table-of-contents (toctree) with a caption of Get started that emits an empty ID (because it has no name option configured). I think we can improve the situation when a name is configured - and #11966 is intended to do that.
How to Reproduce
After self-building the Sphinx documentation from e976059 or a nearby commit, serve the contents of the built documentation via python -m http.server -b 127.0.0.1 and then run a query for get started.
You should see two near-duplicate results at the top of the page; both link to the index.html page, one without an anchor, and one with the anchor/fragment #get-started.
Open the developer tools and inspect the contents of Search._index.alltitles["Get started"] in a browser JavaScript console.
Observe that the title heading Get started has two entries, both for the same file ID of 30. This seems to be a bug, and can contribute towards the presence of duplicate titles entries processed by the client.
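The same inspection can be done outside the browser. The following is a minimal sketch, not part of Sphinx: it assumes the built `searchindex.js` is a `Search.setIndex(...)` call wrapping a JSON object, and that each `alltitles` value is a list of `[file ID, anchor]` pairs. The function names and the sample data are hypothetical, and the representation of a missing anchor (shown here as `null`) is also an assumption.

```python
import json
from collections import defaultdict


def parse_search_index(text):
    """Extract the JSON payload from a ``Search.setIndex(...)`` wrapper.

    Assumes the payload is everything between the first "(" and the last ")".
    """
    start = text.index("(") + 1
    end = text.rindex(")")
    return json.loads(text[start:end])


def repeated_title_entries(alltitles):
    """Return titles that have more than one entry for the same file ID.

    The result maps each such title to {file ID: [anchors...]}.
    """
    repeated = {}
    for title, entries in alltitles.items():
        anchors_by_doc = defaultdict(list)
        for doc_id, anchor in entries:
            anchors_by_doc[doc_id].append(anchor)
        dupes = {
            doc_id: anchors
            for doc_id, anchors in anchors_by_doc.items()
            if len(anchors) > 1
        }
        if dupes:
            repeated[title] = dupes
    return repeated


# Hypothetical sample data shaped like the entries described above;
# a real run would read the text from the built searchindex.js instead.
sample = (
    'Search.setIndex({"alltitles": {'
    '"Get started": [[30, null], [30, "get-started"]], '
    '"Changelog": [[4, null]]}})'
)
index = parse_search_index(sample)
print(repeated_title_entries(index["alltitles"]))
```

Grouping by file ID rather than by exact `(file ID, anchor)` pair is deliberate: it surfaces the case described here, where two entries point at the same page but carry different anchors.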
Environment Information
Platform: linux; (Linux-6.6.13-amd64-x86_64-with-glibc2.37)
Python version: 3.11.7 (main, Dec 8 2023, 14:22:46) [GCC 13.2.0])
Python implementation: CPython
Sphinx version: 7.2.6
Docutils version: 0.19
Jinja2 version: 3.1.3
Pygments version: 2.17.2
Sphinx extensions
N/A
Additional context
Relates-to / discovered during investigation of #11961.

