Removing elastic reranker chunking feature flag and allow return_documents to be false #136045

dan-rubinstein · 2025-10-06T17:40:39Z

This change removes the feature flag for the Elastic reranker chunking configuration settings. Removing the flag enables the settings in serverless, and the backport also removes the now-unnecessary code from the 9.2 branch.

This change also lets users set return_documents to false in the task settings so that document data is not returned. There is an existing bug in ElasticsearchInternalService: documents are still returned if return_documents is set to false when creating the inference endpoint rather than when calling the perform inference API. I've created a separate issue for that, both to keep this PR small and to check that nothing breaks if we change the behavior.
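
To make the intended behavior concrete, here is a minimal sketch of what honoring return_documents=false could look like. The RankedDoc record and the applyReturnDocuments helper are hypothetical stand-ins invented for illustration; they are not the code in this PR or in ElasticsearchInternalService.

```java
import java.util.List;

class ReturnDocumentsSketch {
    // Hypothetical stand-in for a reranked result: input index, score, optional text.
    record RankedDoc(int index, float relevanceScore, String text) {}

    // Assumed semantics: when returnDocuments is false, drop the document text
    // and leave the index and relevance score untouched.
    static List<RankedDoc> applyReturnDocuments(List<RankedDoc> docs, boolean returnDocuments) {
        if (returnDocuments) {
            return docs;
        }
        return docs.stream()
            .map(d -> new RankedDoc(d.index(), d.relevanceScore(), null))
            .toList();
    }

    public static void main(String[] args) {
        List<RankedDoc> docs = List.of(
            new RankedDoc(1, 0.9f, "second doc"),
            new RankedDoc(0, 0.4f, "first doc")
        );
        // With returnDocuments=false every text component is null.
        System.out.println(applyReturnDocuments(docs, false));
    }
}
```

The ordering and scores are deliberately left alone in this sketch, so only the text field differs between the two settings.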

elasticsearchmachine · 2025-10-07T19:11:30Z

Pinging @elastic/ml-core (Team:ML)

...a/org/elasticsearch/xpack/inference/services/elasticsearch/ElasticsearchInternalService.java

DonalEvans · 2025-10-07T21:35:18Z

...ence/src/test/java/org/elasticsearch/xpack/inference/chunking/RerankRequestChunkerTests.java

            assertEquals(1, rankedDocResults.getRankedDocs().size());
-        }, e -> fail("Expected successful parsing but got failure: " + e)));
+            if (returnDocuments) {
+                assertNotNull(rankedDocResults.getRankedDocs().get(0).text());


Rather than just asserting that the text is non-null in this case, would it be better to assert that it matches the expected text:

assertThat(rankedDocResults.getRankedDocs().get(0).text(), is(inputs.getFirst()));

This is related to the comment below. We can discuss it in the comments there.

DonalEvans · 2025-10-07T21:37:41Z

...ence/src/test/java/org/elasticsearch/xpack/inference/chunking/RerankRequestChunkerTests.java

+            if (returnDocuments) {
+                assertNotNull(rankedDocResults.getRankedDocs().get(0).text());
+            } else {
+                assertNull(rankedDocResults.getRankedDocs().get(0).text());
+            }


These asserts are redundant, since we already assert that the actual results match the expected results on line 165.

I'll remove these assertions.

When constructing the expected RankedDoc on line 163, we're setting the text to be the expected value (either the first input or null) based on whether returnDocuments is true, so we would be able to tell if the document string was returned when doing the comparison:

new RankedDocsResults.RankedDoc(0, max(relevanceScore1, relevanceScore2), returnDocuments ? inputs.get(0) : null)
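
The redundancy argument above can be sketched as follows. This assumes the result type has value-based equality over all of its fields; the RankedDoc record here is a hypothetical stand-in for RankedDocsResults.RankedDoc, not the real class.

```java
import java.util.List;

class ExpectedResultsSketch {
    // Hypothetical stand-in; a record gets equals() over all components for free.
    record RankedDoc(int index, float relevanceScore, String text) {}

    // Build the expected doc as described in the review: the text is the first
    // input when returnDocuments is true, otherwise null.
    static RankedDoc expected(List<String> inputs, float score, boolean returnDocuments) {
        return new RankedDoc(0, score, returnDocuments ? inputs.get(0) : null);
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("some document text");
        RankedDoc actualWithText = new RankedDoc(0, 0.7f, "some document text");
        RankedDoc actualWithoutText = new RankedDoc(0, 0.7f, null);

        // Equality already covers the text component in both modes.
        System.out.println(expected(inputs, 0.7f, true).equals(actualWithText));
        System.out.println(expected(inputs, 0.7f, false).equals(actualWithoutText));
        // A mismatch in the text alone is enough to make the comparison fail.
        System.out.println(expected(inputs, 0.7f, false).equals(actualWithText));
    }
}
```

Under that assumption, a single comparison of expected against actual catches both a missing text and an unexpectedly present one, which is why separate null checks add nothing.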

DonalEvans · 2025-10-07T21:55:59Z

...ence/src/test/java/org/elasticsearch/xpack/inference/chunking/RerankRequestChunkerTests.java

+            if (returnDocuments) {
+                rankedDocResults.getRankedDocs().forEach(r -> { assertNotNull(r.text()); });
+            } else {
+                rankedDocResults.getRankedDocs().forEach(r -> { assertNull(r.text()); });
+            }


I wonder if it might be possible to assert on the actual value of the text in these tests instead of just whether or not it's null. If we used fixed values for the relevance scores in the ranked docs instead of random ones, then the order of the results would be deterministic and we could know which doc was expected to have which text.

It would also be good to have the inputs list elements not be identical, since if they're the same, then we have no way of making sure that the text is correct when comparing between the two documents. With the test as it is, if there was some weird bug which caused the text from one result to be copied to another result, we would have no way of spotting that.

The purpose of randomizing the values of the relevance score is to make sure we always properly sort the values at the end and that we always take the highest chunk score per document. The underlying unit being tested shouldn't care whether the scores are in a specific order so I figured randomizing the values would help us test this.

As for the input strings, I think we can make two changes to make the tests more robust:

1. Make the input strings different per document to avoid accidentally generating 2 identical ones when passing in 2 documents.

2. Modify the assertions to specifically check that the correct document string was returned.

Let me know if this makes sense to you.

As long as we can assert that the results contain the correct string, then using random relevance scores is fine, I think.

Ideally, I think that a unit test should test one thing at a time, since that helps pinpoint the specific area that has a bug in the event of a test failure. For example, a test that the list returned by rankedDocResults.getRankedDocs() is sorted by relevance score regardless of the returnDocuments value would use non-random relevance scores for the inputs. The expected output is then fixed, and we don't have to effectively reimplement the sorting logic in the test in order to validate the output. This is just my personal philosophy when it comes to unit testing though, so not something that needs to be changed in this PR.
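
The approach this thread converges on (random scores, distinct inputs, assertions on the actual text) can be sketched roughly like this. The merge method is a hypothetical simplification of the chunk-merging step, assuming one list of chunk scores per document; it is not the RerankRequestChunker implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

class DistinctInputsSketch {
    // Hypothetical stand-in for a reranked result.
    record RankedDoc(int index, float relevanceScore, String text) {}

    // Assumed merge step: keep the highest chunk score per document,
    // then sort the documents by score in descending order.
    static List<RankedDoc> merge(List<String> inputs, List<List<Float>> chunkScores, boolean returnDocuments) {
        List<RankedDoc> merged = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            float best = Float.NEGATIVE_INFINITY;
            for (float score : chunkScores.get(i)) {
                best = Math.max(best, score);
            }
            merged.add(new RankedDoc(i, best, returnDocuments ? inputs.get(i) : null));
        }
        merged.sort((a, b) -> Float.compare(b.relevanceScore(), a.relevanceScore()));
        return merged;
    }

    public static void main(String[] args) {
        Random random = new Random();
        // Distinct inputs, so a text copied onto the wrong result is detectable.
        List<String> inputs = List.of("document-0", "document-1");
        List<List<Float>> chunkScores = List.of(
            List.of(random.nextFloat(), random.nextFloat()),
            List.of(random.nextFloat(), random.nextFloat())
        );
        for (boolean returnDocuments : new boolean[] { true, false }) {
            for (RankedDoc doc : merge(inputs, chunkScores, returnDocuments)) {
                // The index ties each result back to its input, so the check
                // holds whatever order the random scores produce.
                String expectedText = returnDocuments ? inputs.get(doc.index()) : null;
                if (!Objects.equals(expectedText, doc.text())) {
                    throw new AssertionError("wrong text for document " + doc.index());
                }
            }
        }
        System.out.println("texts match their source documents");
    }
}
```

Looking up the expected text by index keeps the scores random without the test having to predict the sort order.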

davidkyle

LGTM if @DonalEvans's suggestions can be incorporated

davidkyle

LGTM

dan-rubinstein · 2025-10-08T16:57:10Z

@elasticmachine merge upstream

dan-rubinstein · 2025-10-08T17:42:23Z

@elasticmachine merge upstream

…ments to be false (elastic#136045)

* Removing elastic reranker chunking feature flag
* Allow return_documents to be set to false
* Updating unit tests to verify returned document strings

---------

Co-authored-by: Elastic Machine <[email protected]>

elasticsearchmachine · 2025-10-08T18:51:31Z

💚 Backport successful

Status	Branch	Result
✅	9.2

…ments to be false (#136045) (#136222)

* Removing elastic reranker chunking feature flag
* Allow return_documents to be set to false
* Updating unit tests to verify returned document strings

---------

Co-authored-by: Elastic Machine <[email protected]>

Removing elastic reranker chunking feature flag

f58e554

dan-rubinstein added the >non-issue, :ml (Machine learning), backport, Team:ML (Meta label for the ML team), v9.2.0, and v9.3.0 labels on Oct 6, 2025

Allow return_documents to be set to false

5f08296

dan-rubinstein changed the title ~~Removing elastic reranker chunking feature flag~~ Removing elastic reranker chunking feature flag and allow return_documents to be false Oct 7, 2025

dan-rubinstein added the auto-backport label (Automatically create backport pull requests when merged) and removed the backport label on Oct 7, 2025

DonalEvans reviewed Oct 7, 2025

davidkyle approved these changes Oct 8, 2025

Updating unit tests to verify returned document strings

a647e69

dan-rubinstein requested a review from davidkyle October 8, 2025 16:27

DonalEvans approved these changes Oct 8, 2025

davidkyle approved these changes Oct 8, 2025

Merge branch 'main' into remove-reranker-chunking-ff

7416998

Merge branch 'main' into remove-reranker-chunking-ff

c4f9cd7

dan-rubinstein merged commit abb7c29 into elastic:main Oct 8, 2025
34 checks passed

dan-rubinstein mentioned this pull request Oct 8, 2025

[9.2] Removing elastic reranker chunking feature flag and allow return_documents to be false (#136045) #136222

Merged
