[ML] Add internal action to return the Rerank window size #132169
# Conflicts:
#   server/src/main/java/org/elasticsearch/inference/InferenceService.java
#   x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/sagemaker/SageMakerServiceTests.java

Pinging @elastic/ml-core (Team:ML)
@Override
public int rerankerWindowSize(String modelId) {
    // TODO rerank chunking should use the same value
    return RerankingInferenceService.CONSERVATIVE_DEFAULT_WINDOW_SIZE;
}
Is this an accurate value for the elastic reranker? I believe it has a 512 max token count, which is ~683 words assuming 0.75 tokens/word for English text.
Correct. Also, when I tested snippet extraction using the highlighter, the sweet spot was around 2560 characters. I worry this might be too low.
At 0.75 words per token, 512 tokens equates to roughly 384 words. The conservative default of 250 is low, but it definitely avoids truncation. 300 words should be OK, if not higher.
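The token-to-word arithmetic in this thread can be sketched as follows. The class and method names are illustrative only, not part of the PR; the 1 token ≈ 0.75 words ratio is the English-text heuristic the reviewers apply, and it reproduces the 384-word and 6144-word figures quoted in the comments:

```java
// Sketch of the token-to-word arithmetic used in this review thread.
public class WindowSizeEstimate {
    // Heuristic for English text: 1 token is roughly 0.75 words.
    static final double WORDS_PER_TOKEN = 0.75;

    // Largest whole word count expected to fit within a model's token limit.
    static int maxWordsForTokenLimit(int maxTokens) {
        return (int) Math.floor(maxTokens * WORDS_PER_TOKEN);
    }

    public static void main(String[] args) {
        // Elastic reranker: 512-token limit -> ~384 words, so the
        // conservative 250-word default leaves clear headroom.
        System.out.println(maxWordsForTokenLimit(512));
        // Alibaba mGTE: 8192-token limit -> ~6144 words.
        System.out.println(maxWordsForTokenLimit(8192));
    }
}
```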
Thanks for adding this API so quickly! I have some questions/concerns about the defaults.
// Alibaba's mGTE models support long context windows of up to 8192 tokens.
// Using 1 token = 0.75 words, this translates to approximately 6144 words.
// https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base
return 5000;
Why do we set this so much lower than the actual token size? Is it a safety concern?
I picked low values that definitely wouldn't truncate, but yes, there is probably room to safely increase this value. 6000 is too close to the approximate 6144-word limit; how about 5500?
Ultimately we may want to make this option configurable, but for the best user experience we need something that works out of the box. Also, as a next step, this setting should be exposed as part of the endpoint configurations so users can see what the rerank chunk sizes are.
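A per-model tailoring of the window size, as discussed above, could look roughly like this sketch. The class name and the substring match on the model ID are assumptions for illustration; only the `rerankerWindowSize` method name, the 250-word conservative default, and the 5500-word proposal come from the PR discussion:

```java
// Hypothetical service showing how rerankerWindowSize could be tailored
// per model instead of returning one conservative constant.
public class ExampleRerankService {
    static final int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 250;

    public int rerankerWindowSize(String modelId) {
        // Illustrative match; a real service would know its own model IDs.
        if (modelId != null && modelId.contains("gte-multilingual")) {
            // 8192-token limit (~6144 words), kept with headroom per the thread.
            return 5500;
        }
        return CONSERVATIVE_DEFAULT_WINDOW_SIZE;
    }
}
```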
The internal action is given an inference ID and returns the max number of words for a rerank request. Initially either 250 or 500 words is returned, but the logic can be enhanced and tailored for each inference service. A new `RerankingInferenceService` interface is defined to expose the window size; all services that support rerank must implement this interface. To check that this is the case, all inference service unit tests now extend `InferenceServiceTestCase`, and there is a check that if the service supports the RERANK task type then it must also implement `RerankingInferenceService`.

Summarising the window sizes implemented in this PR
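The interface the description refers to might look roughly like this sketch. The names `RerankingInferenceService`, `rerankerWindowSize`, and `CONSERVATIVE_DEFAULT_WINDOW_SIZE` appear in the PR, but the exact signatures and the default-method shape here are assumptions, not the actual Elasticsearch code:

```java
// Illustrative sketch: services supporting the RERANK task type implement
// this to report the maximum words per rerank request.
interface RerankingInferenceService {
    // Conservative enough that no supported reranker truncates its input.
    int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 250;

    // Maximum number of words to send in one rerank request for the model.
    default int rerankerWindowSize(String modelId) {
        return CONSERVATIVE_DEFAULT_WINDOW_SIZE;
    }
}
```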