Text similarity reranker chunks and scores snippets #133576
…s in text similarity reranker
Pinging @elastic/search-relevance (Team:Search - Relevance)
Hi @kderusso, I've created a changelog YAML for you.
It's a very elegant solution @kderusso, I like where this is going.
I left some comments about the API: since we adopt chunking explicitly, I think we need to remove the confusion with snippets.
The API could look like this:
{
  "text_similarity_reranker": {
    "inference_id": "my_rerank_inference_id",
    "field": "semantic_text_field",
    "chunk_scorer": {
      "top": 2,
      "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 128,
        "sentence_overlap": 0
      }
    }
  }
}
We can also say that:
{
  "text_similarity_reranker": {
    "inference_id": "",
    "field": "semantic_text_field",
    "chunk_scorer": {
      "top": 2,
      "chunking_settings": {
        "max_chunk_size": 128
      }
    }
  }
}
is valid, with good defaults for all fields, as long as chunk_scorer is provided.
So something like:
{
  "text_similarity_reranker": {
    "inference_id": "",
    "field": "semantic_text_field",
    "chunk_scorer": {
      "top": 2
    }
  }
}
is valid too.
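One way to read the "good defaults" idea above, as a minimal sketch rather than the PR's actual parsing code: chunking is opt-in through `chunk_scorer`, and any field omitted inside it falls back to a default. The function name `resolve_chunk_scorer` and the default values are assumptions for illustration (the chunking defaults are simply copied from the first example request, and the default `top` of 1 is a guess); the real implementation may choose differently.

```python
from typing import Optional

# Hypothetical defaults for illustration only; the values are borrowed from
# the first example request above and may not match the real implementation.
DEFAULT_TOP = 1
DEFAULT_CHUNKING_SETTINGS = {
    "strategy": "sentence",
    "max_chunk_size": 128,
    "sentence_overlap": 0,
}


def resolve_chunk_scorer(reranker: dict) -> Optional[dict]:
    """Return the chunk_scorer config with defaults filled in.

    Returns None when chunk_scorer is absent, meaning no chunking is requested.
    """
    if "chunk_scorer" not in reranker:
        return None
    scorer = reranker["chunk_scorer"] or {}
    # Values supplied in the request override the defaults key by key.
    settings = {**DEFAULT_CHUNKING_SETTINGS, **scorer.get("chunking_settings", {})}
    return {"top": scorer.get("top", DEFAULT_TOP), "chunking_settings": settings}
```

Under these assumptions, the last request above (only `"top": 2`) would resolve to sentence chunking with `max_chunk_size` 128 and no overlap, while a request without `chunk_scorer` would skip chunking entirely.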
Great work @kderusso, I left some nits about the API.
We should also remove the feature flag!
LGTM
@elasticmachine update branch
@elasticmachine update branch
@elasticmachine run elasticsearch-ci/pr-upgrade-part-2
This reverts commit d0813b2.
… the test failures then :table-flip:
elastic/elasticsearch#133576 introduced the concept of a `chunk_rescorer` in the `text_similarity_reranker` retriever. This PR adds the chunk rescorer to Dev Tools AutoComplete.
https://github.com/user-attachments/assets/9688a4ce-2cd5-40ad-ba8e-e8e69abed26e
Co-authored-by: kibanamachine <[email protected]>
This is a followup to #129369, but instead of using the highlighter to create snippets we chunk and score field content directly.
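The "chunk and score field content directly" idea can be sketched roughly as follows. This is an illustration only, not the PR's code: the fixed-size word chunker and the term-count score are simple stand-ins for the real chunking strategies and chunk scorer, and the names `chunk_by_words`, `score_chunk`, and `top_chunks` are hypothetical.

```python
import re
from collections import Counter
from typing import List


def chunk_by_words(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def score_chunk(chunk: str, query: str) -> int:
    """Stand-in relevance score: total occurrences of distinct query terms in the chunk."""
    term_counts = Counter(re.findall(r"\w+", chunk.lower()))
    query_terms = set(re.findall(r"\w+", query.lower()))
    return sum(term_counts[term] for term in query_terms)


def top_chunks(text: str, query: str, chunk_size: int, top: int) -> List[str]:
    """Chunk the field content, score each chunk against the query, keep the best ones."""
    chunks = chunk_by_words(text, chunk_size)
    # sorted() is stable, so chunks with equal scores keep document order.
    ranked = sorted(chunks, key=lambda c: score_chunk(c, query), reverse=True)
    return ranked[:top]
```

In this sketch, only the chunks returned by `top_chunks` would be sent to the rerank model, instead of the full field value or highlighter output.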
Allows users to configure the chunk sizes we use to generate snippets in the following ways:
- `chunking_settings` in the request
- `chunk_size` in the request, which we use to construct default chunking settings with that size

Example API calls:
NOTE: