[ES|QL] Add CHUNK function #134320
Conversation
🔍 Preview links for changed docs

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative: all 9.x versions are documented on the same page and published off of the main branch, instead of on separate pages for each minor version. We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:
✅ At the page level, to indicate which products/deployments the content applies to (mandatory).

What NOT to do:
❌ Don't remove or replace information that applies to an older version.
Hi @kderusso, I've created a changelog YAML for you.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
carlosdelest left a comment:
This looks really good! 💯
A couple of minor issues on validation and testing.
I think it would be worth adding a VerifierTests case to ensure we catch nulls in the numeric params, something like the sketch below. I believe CSV tests cover all the other testing I can think of 👍
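A hypothetical illustration of the query shape such a verifier test would reject at analysis time (the index and field names here are made up):

    FROM books
    | EVAL chunks = CHUNK(description, null)

The point is that a null size argument should produce a verifier error during analysis rather than surfacing at evaluation time.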
        testCase.requiredCapabilities.contains(EsqlCapabilities.Cap.MULTI_MATCH_FUNCTION.capabilityName())
    );
    assumeFalse(
        "CSV tests cannot currently handle CHUNK function",
carlosdelest:
Why is this needed? Are we using something specific from Lucene in the chunker that makes this not work?
kderusso:
I can run the CSV tests via the integration spec test, but CsvTests returns null for all chunked input. Perhaps this is because we're using the chunker? I haven't been able to debug the actual CSV test implementation.
carlosdelest left a comment:
LGTM 👍
Minor comments; I think we can simplify the options handling during evaluation.
    );
    assumeFalse(
        "CSV tests cannot currently handle CHUNK function",
        testCase.requiredCapabilities.contains(EsqlCapabilities.Cap.CHUNK_FUNCTION.capabilityName())
carlosdelest:
I'm still confused about why this doesn't work 🤔. We can do that in a follow-up, and we have the EsqlSpecIT tests, but it would be nice to be able to run CSV tests on this.
kderusso:
There's a good chance that we won't do it, as a follow-up may also entail performing a Lucene query. TBD.
Commits:
* Add new function to chunk strings
* Refactor CHUNK function to support multiple values
* Default to returning all chunks
* [CI] Auto commit changes from spotless
* Handle warnings
* Loosen export restrictions to try to get compile error working
* Remove inference dependencies
* Fix compilation errors
* Remove more inference deps
* Fix compile errors from merge
* Fix existing tests
* Exclude from CSV tests
* Add more tests
* Cleanup
* [CI] Auto commit changes from spotless
* Cleanup
* Update docs/changelog/134320.yaml
* PR feedback
* Remove null field constraint
* [CI] Auto commit changes from spotless
* PR feedback: Refactor to use an options map
* Cleanup
* Regenerate docs
* Add test on a concatenated field
* Add multivalued field test
* Don't hardcode strings
* [CI] Auto commit changes from spotless
* PR feedback

Co-authored-by: elasticsearchmachine <[email protected]>
Adds a new function, CHUNK, that takes text from a field and returns chunks based on the requested chunking strategy. For this PR, we're inputting a size, which corresponds to the default number of words in a sentence-based chunking strategy. Future planned PRs will add support for explicit chunking settings or an inference ID on top of these defaults. Future optimizations could also include supporting a max chunk size of LIMIT and optimizations for semantic text fields.
Examples of how to call this function:
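A minimal sketch of what a call might look like, assuming the signature described above (a text field plus a word-count size); the index and field names are hypothetical:

    FROM books
    | EVAL chunks = CHUNK(description, 20)
    | KEEP title, chunks

Since the function defaults to returning all chunks as a multivalued result, MV_EXPAND can fan the chunks out to one row each:

    FROM books
    | EVAL chunks = CHUNK(description, 20)
    | MV_EXPAND chunks
    | KEEP title, chunks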