Improve relevance scoring for titles and object-name matches in search results #12441
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -328,13 +328,14 @@ const Search = { | ||
| for (const [title, foundTitles] of Object.entries(allTitles)) { | ||
| if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { | ||
| for (const [file, id] of foundTitles) { | ||
| - let score = Math.round(100 * queryLower.length / title.length) | ||
| + let score = Math.round(Scorer.title * queryLower.length / title.length); | ||
| + let boost = titles[file] === title ? 1 : 0; // add a boost for document titles | ||

| Suggested change |
|---|
| - let score = Math.round(Scorer.title * queryLower.length / title.length); |
| - let boost = titles[file] === title ? 1 : 0; // add a boost for document titles |
| + const score = Math.round(Scorer.title * queryLower.length / title.length); |
| + const boost = titles[file] === title ? 1 : 0; // add a small boost for document titles |
I think using a const is better as well. On second thought, though, I'm wondering whether a +1 is sufficient.
Previously a title and subsection title with the same text would have equal scores, leaving their relative ranking undefined.
Any positive value here should have the effect of elevating the main-document titles above same-named subsection titles in the search results.
A single-integer increment is used because ideally we don't want the main document titles to move up in the rankings 'too much' and overtake other matches. That is possible, though, especially given that some scores are fractional. So I have the opposite worry: that +1 might be too much.
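To make the tie-breaking effect concrete, here is a minimal sketch of the scoring expression from the diff. The `Scorer.title` value of 15, the `titleScore` function name, and the `isDocumentTitle` parameter are assumptions for illustration, and the sketch assumes the boost is simply added to the score.

```javascript
// Assumed weight for illustration; the real value is defined by Sphinx's Scorer object.
const Scorer = { title: 15 };

// Hypothetical helper mirroring the expression in the diff:
// a length-proportional title score plus a boost of 1 for document titles.
function titleScore(query, title, isDocumentTitle) {
  const queryLower = query.toLowerCase().trim();
  const score = Math.round(Scorer.title * queryLower.length / title.length);
  const boost = isDocumentTitle ? 1 : 0;
  return score + boost;
}

// A document title and a subsection heading with identical text.
const docTitle = titleScore("relevance", "Relevance", true);
const sectionTitle = titleScore("relevance", "Relevance", false);

console.log(docTitle, sectionTitle); // prints "16 15"
```

Without the boost both entries would score 15, so their order would depend on whatever the sort happens to do with ties.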
(A good way to figure these out could be to develop counterexamples and add test cases for them.)
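One possible counterexample of this kind, sketched under stated assumptions: the title weight of 15 and the competing fractional score of 12.5 are made-up values, and `boostedTitleScore`, `rank`, and `ranking` are hypothetical names rather than part of Sphinx. It checks whether a partial document-title match overtakes another match once the boost is applied.

```javascript
// Made-up values for illustration only.
const TITLE_WEIGHT = 15;
const OTHER_MATCH_SCORE = 12.5; // a fractional score from some other kind of match

// Same shape as the title scoring expression, with the boost as a parameter.
function boostedTitleScore(query, title, boost) {
  const q = query.toLowerCase().trim();
  return Math.round(TITLE_WEIGHT * q.length / title.length) + boost;
}

// Order entries with the highest score first and return their names.
function rank(entries) {
  return [...entries].sort((a, b) => b.score - a.score).map((e) => e.name);
}

// Rank a partial document-title match against the other match for a given boost.
function ranking(boost) {
  return rank([
    { name: "document title", score: boostedTitleScore("relevan", "Relevance", boost) },
    { name: "other match", score: OTHER_MATCH_SCORE },
  ]);
}

console.log(ranking(0)); // [ 'other match', 'document title' ]
console.log(ranking(1)); // [ 'document title', 'other match' ]
```

With these numbers the unboosted title scores 12 and stays below 12.5, while the boosted title scores 13 and moves ahead. Whether real Sphinx scores ever land in such a gap is exactly what a test case would need to establish.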
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| import os | ||
| import sys | ||
| | ||
| sys.path.insert(0, os.path.abspath('.')) | ||
| | ||
| extensions = ['sphinx.ext.autodoc'] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| Main Page | ||
| ========= | ||
| | ||
| This is the main page of the ``titles`` test project. | ||
| | ||
| In particular, this test project is intended to demonstrate how Sphinx | ||
| can handle scoring of query matches against document titles and subsection | ||
| heading titles relative to other document matches such as terms found within | ||
| document text and object names extracted from code. | ||
| | ||
| Relevance | ||
| --------- | ||
| | ||
| In the context of search engines, we can say that a document is **relevant** | ||
| to a user's query when it contains information that seems likely to help them | ||
| find an answer to a question they're asking, or to improve their knowledge of | ||
| the subject area they're researching. | ||
| | ||
| .. automodule:: relevance | ||
| :members: | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| class Example: | ||
| """Example class""" | ||
| num_attribute = 5 | ||
| text_attribute = "string" | ||
jayaddison marked this conversation as resolved.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| Relevance | ||
| ========= | ||
| | ||
| In some domains, it can be straightforward to determine whether a search result | ||
| is relevant to the user's query. | ||
| | ||
| For example, if we are in a software programming language domain, and a user | ||
| has issued a query for the term ``printf``, then we could consider a document | ||
| in the corpus that describes a built-in language function with the same name | ||
| as (highly) relevant. A document that only happens to mention the ``printf`` | ||
| function name as part of some example code that appears on the page would | ||
| also be relevant, but likely less relevant than the one that describes the | ||
| function itself in detail. | ||