
Conversation


@davidkyle davidkyle commented Jul 30, 2025

The internal action is given an inference ID and returns the maximum number of words for a rerank request. Initially either 250 or 500 words is returned, but the logic can be enhanced and tailored to each inference service.

A new RerankingInferenceService interface is defined to expose the window size; all services that support rerank must implement this interface. To enforce this, all inference service unit tests now extend InferenceServiceTestCase, which checks that if a service supports the RERANK task type then it must also implement RerankingInferenceService.
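A minimal sketch of what such an interface might look like. Only the `rerankerWindowSize` method and the `CONSERVATIVE_DEFAULT_WINDOW_SIZE` constant are named in this PR; the constant's value of 300 is taken from the table below, and everything else here is illustrative, not the actual Elasticsearch source:

```java
// Illustrative sketch, not Elasticsearch code: the method name and constant
// name appear in this PR, the value 300 follows the summary table.
public interface RerankingInferenceService {
    // Conservative fallback for services whose model context window is unknown.
    int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 300;

    // Maximum number of words to include in a single rerank request
    // for the given model.
    int rerankerWindowSize(String modelId);
}
```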

Summarising the window sizes implemented in this PR:

| Service | Model | Context Window Size (tokens) | Context Window Size (words, 0.75 words per token) | Rerank Window Size (words) |
|---|---|---|---|---|
| Alibaba | mGTE models | 8192 | 6144 | 5500 |
| Azure AI Studio | - | unknown | - | 300 |
| Cohere | any | 4096 | 3072 | 2800 |
| Custom Service | - | unknown | - | 300 |
| Elasticsearch | rerank | 512 | 384 | 300 |
| Google Vertex AI | -003 models | 512 | 384 | 300 |
| Google Vertex AI | -004 models | 1024 | 768 | 600 |
| Hugging Face | - | unknown | - | 300 |
| Jina AI | any | 8000 | 6000 | 5500 |
| SageMaker | - | unknown | - | 300 |
| Voyage AI | rerank-lite-1 | 4000 | 3000 | 2800 |
| Voyage AI | any other | 8000 | 6000 | 5500 |
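The token-to-word conversion behind the table's middle column can be sketched with a hypothetical helper (not code from this PR): approximate English text at 0.75 words per token, then choose a rerank window comfortably below that budget to avoid truncation.

```java
// Hypothetical helper illustrating the conversion used in the table above;
// the rerank window itself is then picked with a safety margin below this.
public final class RerankWindowMath {
    // Approximate English text at 0.75 words per token.
    static int approxWords(int contextWindowTokens) {
        return (int) (contextWindowTokens * 0.75);
    }

    public static void main(String[] args) {
        System.out.println(approxWords(8192)); // Alibaba mGTE: 6144 words
        System.out.println(approxWords(512));  // Elasticsearch rerank: 384 words
        System.out.println(approxWords(4096)); // Cohere: 3072 words
    }
}
```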

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Jul 30, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

```java
@Override
public int rerankerWindowSize(String modelId) {
    // TODO rerank chunking should use the same value
    return RerankingInferenceService.CONSERVATIVE_DEFAULT_WINDOW_SIZE;
}
```
Member

Is this an accurate value for the elastic reranker? I believe it has a 512 max token count, which is ~683 words assuming 0.75 tokens/word for English text.

Member

Correct. Also, when I tested snippet extraction using the highlighter, the sweet spot was around 2560 characters. I worry this might be too low.

Member Author

At 0.75 words per token, 512 tokens equates to roughly 384 words. The conservative default of 250 is low, but it definitely avoids truncation. 300 words should be OK, if not higher.

@kderusso kderusso left a comment


Thanks for adding this API so quickly! I have some questions/concerns about the defaults.

```java
// Alibaba's mGTE models support long context windows of up to 8192 tokens.
// Using 1 token = 0.75 words, this translates to approximately 6144 words.
// https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base
return 5000;
```
Member

Why do we set this so much lower than the actual token size? Is it a safety concern?

Member Author

I picked low values that definitely wouldn't truncate, but yes, there is probably room to safely increase this value. 6000 is too close to the approximate 6144-word limit; how about 5500?

Ultimately we may want to make this option configurable, but for the best user experience something that works out of the box is required. As a next step, this setting should be exposed as part of the endpoint configuration so users can see what the rerank chunk sizes are.


@davidkyle davidkyle enabled auto-merge (squash) August 26, 2025 09:46
@davidkyle davidkyle merged commit 0b70308 into elastic:main Aug 26, 2025
33 checks passed

Labels

:ml/Chunking :ml Machine learning >refactoring Team:ML Meta label for the ML team v9.2.0
