dan-rubinstein
Member

This change includes two fixes for the recursive chunking strategy:

  1. Adding logic to merge adjacent chunks back together (up to the maximum chunk size) after splitting a document on a given separator. This reduces the number of chunks generated when splitting a document produces many small chunks.
  2. Renaming SeparatorSet to SeparatorGroup so that the naming is clearer for users.
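The merge step in item 1 can be illustrated with a minimal sketch. This is not the code from this pull request: the function names `word_count` and `split_and_merge` are hypothetical, chunk size is assumed to be measured in whitespace-separated words, and only a single separator is handled (a full recursive strategy would go on to split oversized pieces with the next separator).

```python
def word_count(text: str) -> int:
    # Assumption for this sketch: chunk size is counted in whitespace-separated words.
    return len(text.split())


def split_and_merge(text: str, separator: str, max_chunk_size: int) -> list[str]:
    """Split text on separator, then greedily merge adjacent pieces
    while the merged chunk stays within max_chunk_size words."""
    # Drop empty or whitespace-only pieces produced by the split.
    pieces = [p for p in text.split(separator) if p.strip()]

    merged: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if current and word_count(candidate) > max_chunk_size:
            # Adding this piece would exceed the limit: close the current chunk.
            merged.append(current)
            current = piece
        else:
            current = candidate
    if current:
        merged.append(current)
    # Note: a single piece larger than max_chunk_size is returned unchanged here;
    # a recursive strategy would split it further with the next separator.
    return merged


if __name__ == "__main__":
    doc = "a b\n\nc d\n\ne f g h\n\ni"
    print(split_and_merge(doc, "\n\n", 4))
    # prints ['a b\n\nc d', 'e f g h', 'i']
```

Without the merge step, the same input would produce four chunks; with it, the two short leading paragraphs are combined into one chunk that still fits within the limit.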

@dan-rubinstein added the >bug, :ml (Machine learning), Team:ML (Meta label for the ML team), auto-backport (Automatically create backport pull requests when merged), v8.19.0, v9.1.0, and v9.2.0 labels on Jul 11, 2025
@dan-rubinstein marked this pull request as ready for review on July 11, 2025 14:39
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @dan-rubinstein, I've created a changelog YAML for you.

@davidkyle added the >non-issue label and removed the >bug label on Jul 11, 2025
Member

@davidkyle left a comment

LGTM

@dan-rubinstein
Member Author

@elasticmachine merge upstream

@elasticsearchmachine
Collaborator

💚 Backport successful

Branch
8.19
9.1

dan-rubinstein added a commit to dan-rubinstein/elasticsearch that referenced this pull request Jul 14, 2025
…to SeparatorGroup (elastic#131103)

* Adding merging logic to recursive chunking and renaming SeparatorSet to SeparatorGroup

* Update docs/changelog/131103.yaml

* Delete docs/changelog/131103.yaml

---------

Co-authored-by: Elastic Machine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Jul 14, 2025
…to SeparatorGroup (#131103) (#131228)

* Adding merging logic to recursive chunking and renaming SeparatorSet to SeparatorGroup

* Update docs/changelog/131103.yaml

* Delete docs/changelog/131103.yaml

---------

Co-authored-by: Elastic Machine <[email protected]>
dan-rubinstein added a commit that referenced this pull request Jul 14, 2025
…torSet to SeparatorGroup (#131103) (#131227)

* Adding merging logic to recursive chunking and renaming SeparatorSet to SeparatorGroup (#131103)

* Adding merging logic to recursive chunking and renaming SeparatorSet to SeparatorGroup

* Update docs/changelog/131103.yaml

* Delete docs/changelog/131103.yaml

---------

Co-authored-by: Elastic Machine <[email protected]>

* Removing getFirst calls

---------

Co-authored-by: Elastic Machine <[email protected]>
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
…to SeparatorGroup (elastic#131103)

* Adding merging logic to recursive chunking and renaming SeparatorSet to SeparatorGroup

* Update docs/changelog/131103.yaml

* Delete docs/changelog/131103.yaml

---------

Co-authored-by: Elastic Machine <[email protected]>

Labels

auto-backport (Automatically create backport pull requests when merged), :ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v8.19.0, v9.1.0, v9.2.0

4 participants