
Commit 91c5cd3

jayaddison, wlach, and picnixz authored
Improve relevance scoring in HTML search results (#12441)
Co-authored-by: Will Lachance <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
1 parent e7beb8b commit 91c5cd3


8 files changed (+132 / -3 lines)


CHANGES.rst

Lines changed: 4 additions & 0 deletions
@@ -112,6 +112,10 @@ Bugs fixed
 * #12425: Use Docutils' SVG processing in the HTML builder
   and remove Sphinx's custom logic.
   Patch by Tunç Başar Köse.
+* #12391: Adjust scoring of matches during HTML search so that document main
+  titles tend to rank higher than subsection titles. In addition, boost matches
+  on the name of programming domain objects relative to title/subtitle matches.
+  Patch by James Addison and Will Lachance.

 Testing
 -------

sphinx/themes/basic/static/searchtools.js

Lines changed: 3 additions & 2 deletions
@@ -328,13 +328,14 @@ const Search = {
     for (const [title, foundTitles] of Object.entries(allTitles)) {
       if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) {
         for (const [file, id] of foundTitles) {
-          let score = Math.round(100 * queryLower.length / title.length)
+          const score = Math.round(Scorer.title * queryLower.length / title.length);
+          const boost = titles[file] === title ? 1 : 0; // add a boost for document titles
           normalResults.push([
             docNames[file],
             titles[file] !== title ? `${titles[file]} > ${title}` : title,
             id !== null ? "#" + id : "",
             null,
-            score,
+            score + boost,
             filenames[file],
           ]);
         }
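The patched title loop can be sketched as a standalone function to show how the two terms combine. This is a minimal sketch, not Sphinx's code: `titleScore` and its `isDocTitle` flag are hypothetical names (`isDocTitle` stands in for the `titles[file] === title` check), and `Scorer.title` is assumed to be 15, which is consistent with the updated expectation of 16 in tests/js/searchtools.js for an exact match on the document title "Main Page".

```javascript
// Hypothetical stand-in for searchtools.js's Scorer object; only the
// `title` weight matters here, and its value of 15 is an assumption.
const Scorer = { title: 15 };

// Score one title match the way the patched loop does.
// `isDocTitle` mirrors `titles[file] === title`: true when the matched
// title is the document's main title rather than a subsection heading.
function titleScore(query, title, isDocTitle) {
  const queryLower = query.toLowerCase().trim();
  // Same eligibility test as the patched code: the title must contain the
  // query, and the query must cover at least half of the title's length.
  const eligible =
    title.toLowerCase().trim().includes(queryLower) &&
    queryLower.length >= title.length / 2;
  if (!eligible) return null; // not a title match

  // Scale the weight by how much of the title the query covers...
  const score = Math.round(Scorer.title * queryLower.length / title.length);
  // ...then add one point when this is the document's main title.
  const boost = isDocTitle ? 1 : 0;
  return score + boost;
}

console.log(titleScore("main page", "Main Page", true)); // 16
console.log(titleScore("relevance", "Relevance", true)); // 16 (main title)
console.log(titleScore("relevance", "Relevance", false)); // 15 (subsection heading)
console.log(titleScore("page", "Main Page", true)); // null (query covers under half the title)
```

Under the old formula an exact title match always scored 100, whether it was a page title or a subsection heading. With the assumed weight of 15, an exact match scores 16 on a main title and 15 on a subsection heading, so the +1 boost is what separates the two, and title matches no longer dwarf the other, much smaller default `Scorer` weights.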

tests/js/fixtures/titles/searchindex.js

Lines changed: 1 addition & 0 deletions
(Generated file; diff not rendered.)

tests/js/roots/titles/conf.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+import os
+import sys
+
+sys.path.insert(0, os.path.abspath('.'))
+
+extensions = ['sphinx.ext.autodoc']

tests/js/roots/titles/index.rst

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+Main Page
+=========
+
+This is the main page of the ``titles`` test project.
+
+In particular, this test project is intended to demonstrate how Sphinx
+can handle scoring of query matches against document titles and subsection
+heading titles relative to other document matches such as terms found within
+document text and object names extracted from code.
+
+Relevance
+---------
+
+In the context of search engines, we can say that a document is **relevant**
+to a user's query when it contains information that seems likely to help them
+find an answer to a question they're asking, or to improve their knowledge of
+the subject area they're researching.
+
+.. automodule:: relevance
+   :members:

tests/js/roots/titles/relevance.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+class Example:
+    """Example class"""
+    num_attribute = 5
+    text_attribute = "string"
+
+    relevance = "testing"
+    """attribute docstring"""

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+Relevance
+=========
+
+In some domains, it can be straightforward to determine whether a search result
+is relevant to the user's query.
+
+For example, if we are in a software programming language domain, and a user
+has issued a query for the term ``printf``, then we could consider a document
+in the corpus that describes a built-in language function with the same name
+as (highly) relevant. A document that only happens to mention the ``printf``
+function name as part of some example code that appears on the page would
+also be relevant, but likely less relevant than the one that describes the
+function itself in detail.

tests/js/searchtools.js

Lines changed: 78 additions & 1 deletion
@@ -7,6 +7,23 @@ describe('Basic html theme search', function() {
     return req.responseText;
   }

+  function checkRanking(expectedRanking, results) {
+    let [nextExpected, ...remainingItems] = expectedRanking;
+
+    for (result of results.reverse()) {
+      if (!nextExpected) break;
+
+      let [expectedPage, expectedTitle, expectedTarget] = nextExpected;
+      let [page, title, target] = result;
+
+      if (page == expectedPage && title == expectedTitle && target == expectedTarget) {
+        [nextExpected, ...remainingItems] = remainingItems;
+      }
+    }
+
+    expect(remainingItems.length).toEqual(0);
+  }
+
   describe('terms search', function() {

     it('should find "C++" when in index', function() {
@@ -76,7 +93,7 @@ describe('Basic html theme search', function() {
         'Main Page',
         '',
         null,
-        100,
+        16,
         'index.rst'
       ]
     ];
@@ -85,6 +102,66 @@ describe('Basic html theme search', function() {

   });

+  describe('search result ranking', function() {
+
+    /*
+     * These tests should not prescribe precise expected ordering of search
+     * results; instead each test case should describe a single relevance rule
+     * that helps users to locate relevant information efficiently.
+     *
+     * If you think that one of the rules seems to be poorly-defined or is
+     * limiting the potential for search algorithm improvements, please check
+     * for existing discussion/bugreports related to it on GitHub[1] before
+     * creating one yourself. Suggestions for possible improvements are also
+     * welcome.
+     *
+     * [1] - https://github.com/sphinx-doc/sphinx.git/
+     */
+
+    it('should score a code module match above a page-title match', function() {
+      eval(loadFixture("titles/searchindex.js"));
+
+      expectedRanking = [
+        ['index', 'relevance', '#module-relevance'], /* py:module documentation */
+        ['relevance', 'Relevance', ''], /* main title */
+      ];
+
+      searchParameters = Search._parseQuery('relevance');
+      results = Search._performSearch(...searchParameters);
+
+      checkRanking(expectedRanking, results);
+    });
+
+    it('should score a main-title match above an object member match', function() {
+      eval(loadFixture("titles/searchindex.js"));
+
+      expectedRanking = [
+        ['relevance', 'Relevance', ''], /* main title */
+        ['index', 'relevance.Example.relevance', '#module-relevance'], /* py:class attribute */
+      ];
+
+      searchParameters = Search._parseQuery('relevance');
+      results = Search._performSearch(...searchParameters);
+
+      checkRanking(expectedRanking, results);
+    });
+
+    it('should score a main-title match above a subheading-title match', function() {
+      eval(loadFixture("titles/searchindex.js"));
+
+      expectedRanking = [
+        ['relevance', 'Relevance', ''], /* main title */
+        ['index', 'Main Page > Relevance', '#relevance'], /* subsection heading title */
+      ];
+
+      searchParameters = Search._parseQuery('relevance');
+      results = Search._performSearch(...searchParameters);
+
+      checkRanking(expectedRanking, results);
+    });
+
+  });
+
 });

 describe("htmlToText", function() {
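`checkRanking` treats `expectedRanking` as an ordered subsequence of the search results read highest score first; it walks `results.reverse()`, which suggests `_performSearch` returns the lowest-scoring result first. As written, its final assertion only inspects `remainingItems`, and that list becomes empty as soon as every expected entry except the last has matched, so an unmatched final entry does not fail the test. The following is a minimal, framework-free sketch of the same check that tracks an index instead and requires every entry, including the last, to match. `isRankedInOrder` and the sample results are hypothetical; the rows follow the `[docname, title, anchor, descr, score, filename]` shape pushed onto `normalResults` in searchtools.js.

```javascript
// Returns true when every entry of `expectedRanking` appears in `results`
// in the given order (other results may be interleaved between them).
// `results` is assumed to be ordered lowest score first.
function isRankedInOrder(expectedRanking, results) {
  // Copy before reversing so the caller's array is left untouched.
  const highestFirst = [...results].reverse();
  let next = 0; // index of the next expected entry still to be found

  for (const [page, title, target] of highestFirst) {
    if (next === expectedRanking.length) break; // everything matched
    const [expectedPage, expectedTitle, expectedTarget] = expectedRanking[next];
    if (page === expectedPage && title === expectedTitle && target === expectedTarget) {
      next += 1;
    }
  }
  // Unlike a check on the leftover list alone, this also fails when the
  // final expected entry was never matched.
  return next === expectedRanking.length;
}

// Hypothetical results, lowest score first:
// [docname, title, anchor, descr, score, filename]
const sampleResults = [
  ["index", "Main Page > Relevance", "#relevance", null, 15, "index.rst"],
  ["relevance", "Relevance", "", null, 16, "relevance.rst"],
];

const mainTitle = ["relevance", "Relevance", ""];
const subsection = ["index", "Main Page > Relevance", "#relevance"];

console.log(isRankedInOrder([mainTitle, subsection], sampleResults)); // true
console.log(isRankedInOrder([subsection, mainTitle], sampleResults)); // false
console.log(isRankedInOrder([mainTitle, ["index", "Missing", ""]], sampleResults)); // false
```

The third call is the case the helper in the diff would let through: after `mainTitle` matches, `remainingItems` is already empty, so `expect(remainingItems.length).toEqual(0)` passes even though the second entry never appears.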
