[DOCS] Fix reranking IA, move retrievers to search api overview #112949
[[re-ranking-overview]]
= Re-ranking

Many search systems are built on two-stage retrieval pipelines.

The first stage uses cheap, fast algorithms to find a subset of possible matches.

The second stage uses a more powerful model, often machine learning-based, to reorder the documents.
This second step is called re-ranking.
This approach keeps computational costs manageable, because the resource-intensive model is only applied to a smaller set of pre-filtered results.
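
The following Python sketch illustrates the two-stage idea under simplifying assumptions. `term_overlap` is a hypothetical stand-in for a cheap first-stage scorer, and `slow_model` is a hypothetical stand-in for an expensive machine learning re-ranker; neither is an {es} API. The only point is that the expensive scorer runs on the top `k` candidates instead of on the whole corpus.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def two_stage_search(query: str, docs: List[str], cheap_score: Scorer,
                     expensive_score: Scorer, k: int) -> List[Tuple[str, float]]:
    """Score all docs cheaply, keep the top k, then re-rank only those k."""
    # Stage 1: the cheap scorer runs over every document in the corpus.
    # sorted() is stable, so ties keep their original corpus order.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:k]
    # Stage 2: the expensive scorer runs only on the k candidates.
    reranked = [(d, expensive_score(query, d)) for d in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


def term_overlap(query: str, doc: str) -> float:
    """Hypothetical cheap scorer: number of distinct query terms in the doc."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


calls: List[str] = []  # records which docs reached the expensive stage


def slow_model(query: str, doc: str) -> float:
    """Hypothetical expensive re-ranker: term overlap divided by doc length."""
    calls.append(doc)
    return term_overlap(query, doc) / len(doc.split())


DOCS = ["red running shoes for trail", "red shoes", "blue hat", "green scarf"]

if __name__ == "__main__":
    print(two_stage_search("red shoes", DOCS, term_overlap, slow_model, k=2))
```

With `k=2`, only the two shoe documents reach `slow_model`, which moves the shorter "red shoes" ahead of the longer document that tied with it in the first stage.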

{es} supports various ranking and re-ranking techniques to optimize search relevance and performance.

[float]
[[re-ranking-two-stage-pipeline]]
== Two-stage retrieval pipelines

Learn about retrieval pipelines and how re-ranking can be applied to your existing search experience.

[float]
[[re-ranking-first-stage-pipeline]]
=== First stage: initial retrieval

[float]
[[re-ranking-ranking-overview-bm25]]
==== Full-text search: BM25 scoring

{es} ranks documents based on term frequency and inverse document frequency, adjusted for document length.
BM25 is the default statistical scoring algorithm in {es} and works out of the box.
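
To make these three ingredients concrete, here is a minimal Python sketch of the classic Okapi BM25 formula, assuming a naive lowercase whitespace tokenizer. `k1 = 1.2` and `b = 0.75` match the {es} defaults, but Lucene's implementation differs in some details, so these numbers will not match the scores {es} returns. Treat it as an illustration of the formula, not a reimplementation.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Classic Okapi BM25 score of each doc for the query (whitespace tokenizer)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs  # average document length
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):  # each distinct query term once
            if term not in tf:
                continue
            # Inverse document frequency: rare terms weigh more (always positive here).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b scales the document-length penalty.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = ["the quick brown fox", "the lazy dog", "quick quick fox jumps"]
    print(bm25_scores("quick fox", corpus))
```

For the query "quick fox", the document that mentions "quick" twice outscores the one that mentions it once, but saturation through `k1` makes the second occurrence worth less than the first.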

[float]
[[re-ranking-ranking-overview-vector]]
==== Vector search: similarity scoring

Vector search transforms data into dense or sparse vector embeddings that capture semantic meaning, and computes similarity scores between query vectors and document vectors.
Store vectors using `semantic_text` fields for automatic vectorization, or `dense_vector` and `sparse_vector` fields when you need more control over the underlying embedding model.

Query vectors with `semantic` queries, or with `knn` and `sparse_vector` queries, to compute similarity scores.
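
The following Python sketch shows both scoring ideas in miniature, assuming tiny hand-written vectors in place of real model embeddings: exact (brute-force) kNN over dense vectors using cosine similarity, and a dot product over sparse token-weight maps. It is illustrative only. Cosine is just one of the similarity options for `dense_vector` fields, `knn` search in {es} uses an approximate (HNSW) index rather than scoring every vector, and the tokens and weights below are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]],
              k: int) -> List[Tuple[str, float]]:
    """Brute-force kNN: score every document vector and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def sparse_dot(query_weights: Dict[str, float], doc_weights: Dict[str, float]) -> float:
    """Sparse vectors as token -> weight maps; only tokens present in both contribute."""
    return sum(w * doc_weights[tok] for tok, w in query_weights.items() if tok in doc_weights)


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(exact_knn([1.0, 0.0], docs, k=2))
    print(sparse_dot({"shoe": 1.5, "red": 0.5}, {"shoe": 2.0, "blue": 1.0}))
```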

Refer to <<semantic-search,semantic search>> for more information.

[float]
[[re-ranking-ranking-overview-hybrid]]
==== Hybrid techniques

Hybrid search techniques combine results from full-text and vector search pipelines.

Review comment: I am not sure how much we have here, but we've been getting more and more questions and feedback from people who want more information on linear boosting as well. I wonder if a sentence here and a link to somewhere else (if we have it) would be worth doing.

Reply: In future we want to have fuller docs sections for:
I think we'll want to revisit that when the time comes. The goal here is to put LTR and semantic reranking in a smarter IA, first and foremost, not to be the exhaustive source of truth for all these topics.

Reply: Sure sure, push back on my scope creep 😀

{es} can combine the results of lexical (BM25) and vector search using the <<rrf,Reciprocal Rank Fusion (RRF)>> algorithm, which merges result lists based on each document's rank rather than its raw score.
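
RRF scores each document as the sum, over all result lists, of `1 / (rank_constant + rank)`, so a document ranked highly by both retrievers rises to the top without BM25 and vector scores ever being compared directly. A minimal Python sketch, using 60 as the rank constant (the {es} default for `rank_constant`) and hypothetical document IDs:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(result_lists: List[List[str]],
                           rank_constant: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked lists of doc IDs; ranks are 1-based within each list."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Only the rank matters; each retriever's raw score is ignored.
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    bm25_hits = ["doc1", "doc2", "doc3"]  # hypothetical lexical results
    knn_hits = ["doc3", "doc1", "doc4"]   # hypothetical vector results
    for doc_id, score in reciprocal_rank_fusion([bm25_hits, knn_hits]):
        print(doc_id, round(score, 5))
```

Here `doc1` edges out `doc3` because its ranks (first and second) give a slightly larger reciprocal sum than `doc3`'s (third and first), and both finish ahead of documents that appear in only one list.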

[float]
[[re-ranking-overview-second-stage]]
=== Second stage: re-ranking

When you use one of the following re-ranking approaches, the first-stage retrieval mechanisms generate a set of candidates.
These candidates are then passed to the re-ranker, which performs the more computationally expensive re-ranking.

[float]
[[re-ranking-overview-semantic]]
==== Semantic re-ranking

<<semantic-reranking>> uses machine learning models to reorder search results based on their semantic similarity to a query.
Models can be hosted directly in your {es} cluster, or you can use <<inference-apis,inference endpoints>> to call models provided by third-party services.
Semantic re-ranking enables out-of-the-box semantic search capabilities on existing full-text search indices.

[float]
[[re-ranking-overview-ltr]]
==== Learning to Rank (LTR)

<<learning-to-rank>> is for advanced users.
With LTR, you train a machine learning model to build a ranking function for your search experience that updates over time.
LTR is best suited for cases where you have ample training data and need highly customized relevance tuning.
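
As a rough illustration of "training a ranking function", the Python sketch below fits a pointwise linear model to hypothetical feature vectors (for example, a BM25 score and a recency signal) and graded relevance labels using batch gradient descent, then ranks candidates by the learned score. This is a deliberately simplified, swapped-in technique: it is not how {es} LTR models are built or deployed, and production LTR typically relies on richer models and ranking-aware training objectives.

```python
from typing import Dict, List, Sequence


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear score w.x + bias; the bias is stored as the last weight."""
    return sum(w * xi for w, xi in zip(weights, x)) + weights[-1]


def train_pointwise(features: List[Sequence[float]], labels: List[float],
                    lr: float = 0.1, epochs: int = 1000) -> List[float]:
    """Batch gradient descent on half the mean squared error between score and label."""
    weights = [0.0] * (len(features[0]) + 1)  # one weight per feature, plus bias
    n = len(features)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(features, labels):
            err = predict(weights, x) - y
            for i, xi in enumerate(x):
                grads[i] += err * xi
            grads[-1] += err  # gradient for the bias term
        weights = [w - lr * g / n for w, g in zip(weights, grads)]
    return weights


def rank(weights: Sequence[float], candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate doc IDs by the learned score, highest first."""
    return sorted(candidates, key=lambda doc_id: predict(weights, candidates[doc_id]),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical training data: [bm25_feature, recency_feature] -> relevance grade.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    y = [3.0, 1.0, 4.0, 0.0]
    w = train_pointwise(X, y)
    print([round(v, 3) for v in w])
    print(rank(w, {"doc_a": [0.0, 1.0], "doc_b": [1.0, 0.0], "doc_c": [1.0, 1.0]}))
```

Because the toy labels are exactly `3 * bm25_feature + 1 * recency_feature`, the learned weights converge to roughly `[3, 1, 0]`, and `doc_c`, which has both signals, ranks first.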

include::semantic-reranking.asciidoc[]
include::learning-to-rank.asciidoc[]

This file was deleted.

Review comment: I'm noticing some inconsistency: sometimes we use `Reranking` and other times we use `Re-ranking`. I slightly prefer `reranking` but am fine with either, though we should do a pass for consistency.

Reply: Realized that in the course of this PR and tried to align on the hyphenated version; will need to update the semantic re-ranking page too.

Reply: Should be good after e815f94.

Reply: FWIW I also prefer `reranking`, but the hyphenated version seems more au courant :)