Support configurable chunking in semantic_text fields #121041
Merged: kderusso merged 99 commits into elastic:main from kderusso:kderusso/support-configurable-chunking on Apr 3, 2025
Commits
9f4e2ad test (kderusso)
a561017 Revert "test" (kderusso)
acb14a2 Refactor InferenceService to allow passing in chunking settings (kderusso)
933c74d Add chunking config to inference field metadata and store in semantic… (kderusso)
6b11f01 Fix test compilation errors (kderusso)
7c4ba49 Hacking around trying to get ingest to work (kderusso)
edbfde3 Debugging (kderusso)
c255745 [CI] Auto commit changes from spotless
889875f POC works and update TODO to fix this (kderusso)
c87753e [CI] Auto commit changes from spotless
c2c9a52 Refactor chunking settings from model settings to field inference req… (kderusso)
c9bfa32 A bit of cleanup (kderusso)
122aeee Revert a bunch of changes to try to narrow down what broke CI (kderusso)
b42f262 test (kderusso)
75404b5 Revert "test" (kderusso)
0dcadfa Merge from main (kderusso)
a54804e Fix InferenceFieldMetadataTest (kderusso)
675819c [CI] Auto commit changes from spotless
ccef5cc Add chunking settings back in (kderusso)
70ac065 Update builder to use new map (kderusso)
4ac2460 Merge from main (kderusso)
20d596c Fix compilation errors after merge (kderusso)
178a9db Debugging tests (kderusso)
7f1e99d debugging (kderusso)
f74c803 Merge branch 'main' into kderusso/support-configurable-chunking (kderusso)
65acf8f Cleanup (kderusso)
51d9aae Add yaml test (kderusso)
7aaaee8 Update tests (kderusso)
d306eae Add chunking to test inference service (kderusso)
7cf7589 Trying to get tests to work (kderusso)
89e040d Shard bulk inference test never specifies chunking settings (kderusso)
233defd Fix test (kderusso)
0b2ebf6 Always process batches in order (kderusso)
a807ab5 Fix chunking in test inference service and yaml tests (kderusso)
dc48c28 [CI] Auto commit changes from spotless
3ed5631 Refactor - remove convenience method with default chunking settings (kderusso)
c95789d Fix ShardBulkInferenceActionFilterTests (kderusso)
75031e1 Fix ElasticsearchInternalServiceTests (kderusso)
d051cd2 Fix SemanticTextFieldMapperTests (kderusso)
8913177 [CI] Auto commit changes from spotless
2ab5aec Fix test data to fit within bounds (kderusso)
92d70dc Add additional yaml test cases (kderusso)
1106671 Playing with xcontent parsing (kderusso)
11ab0f9 Merge from main (kderusso)
fb2cc28 A little cleanup (kderusso)
1745dc9 Update docs/changelog/121041.yaml (kderusso)
a2cdc42 Merge from main (kderusso)
525eed2 Fix failures introduced by merge (kderusso)
45ab0eb [CI] Auto commit changes from spotless
42c8449 Address PR feedback (kderusso)
71edec2 [CI] Auto commit changes from spotless
6a37449 Fix predicate in updated test (kderusso)
c076f92 Better handling of null/empty ChunkingSettings (kderusso)
2ef235c Update parsing settings (kderusso)
f739639 Merge from main (kderusso)
311d840 Fix errors post merge (kderusso)
e3e15d2 PR feedback (kderusso)
4224159 [CI] Auto commit changes from spotless
ad090d7 PR feedback and fix Xcontent parsing for SemanticTextField (kderusso)
a8ef9a1 Remove chunking settings check to use what's passed in from sender se… (kderusso)
db06fd9 Fix some tests (kderusso)
b5f2929 Cleanup (kderusso)
fccbd61 Test failure whack-a-mole (kderusso)
aab16e7 Cleanup (kderusso)
1d54066 Merge from main (kderusso)
023c227 Refactor to handle memory optimized bulk shard inference actions - th… (kderusso)
b26b9a2 [CI] Auto commit changes from spotless
1c84cc2 Minor cleanup (kderusso)
165c19e A bit more cleanup (kderusso)
735982a Spotless (kderusso)
b2839fc Revert change (kderusso)
ecc9bb3 Update chunking setting update logic (kderusso)
4c992d8 Go back to serializing maps (kderusso)
c807199 Revert change to model settings - source still errors on missing mode… (kderusso)
c152281 Fix updating chunking settings (kderusso)
561f583 Merge main into kderusso/support-configurable-chunking (kderusso)
9907a64 Look up model if null (kderusso)
688c637 Fix test (kderusso)
7693ef6 Merge main into kderusso/support-configurable-chunking (kderusso)
fa9247a Work around https://github.com/elastic/elasticsearch/issues/125723 in… (kderusso)
1d8931c Add BWC tests (kderusso)
ab7752e Add chunking_settings to docs (kderusso)
d2ef735 Merge main into kderusso/support-configurable-chunking (kderusso)
cd4d32b Refactor/rename (kderusso)
5db7ed4 Address minor PR feedback (kderusso)
e477639 Add test case for null update (kderusso)
7d85fd3 PR feedback - adjust refactor of chunked inputs (kderusso)
845a732 Refactored AbstractTestInferenceService to return offsets instead of … (kderusso)
2ed86d4 [CI] Auto commit changes from spotless
0a54972 Fix tests where chunk output was of size 3 (kderusso)
8cf287b Update mappings per PR feedback (kderusso)
a9c7512 PR Feedback (kderusso)
65b8893 Merge main into branch (kderusso)
546e333 Fix problems related to merge (kderusso)
612d2fb PR optimization (kderusso)
9fb17a6 Fix test (kderusso)
bf8c460 Delete extra file (kderusso)
2b5589d Merge main into branch (kderusso)
1e3fc22 Merge from main (kderusso)
@@ -0,0 +1,5 @@
pr: 121041
summary: Support configurable chunking in `semantic_text` fields
area: Relevance
type: enhancement
issues: []
160 changes: 119 additions & 41 deletions
docs/reference/elasticsearch/mapping-reference/semantic-text.md
Large diffs are not rendered by default.
25 changes: 25 additions & 0 deletions
server/src/main/java/org/elasticsearch/inference/ChunkInferenceInput.java
@@ -0,0 +1,25 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the "Elastic License
 * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
 * Public License v 1"; you may not use this file except in compliance with, at
 * your election, the "Elastic License 2.0", the "GNU Affero General Public
 * License v3.0 only", or the "Server Side Public License, v 1".
 */

package org.elasticsearch.inference;

import org.elasticsearch.core.Nullable;

import java.util.List;

public record ChunkInferenceInput(String input, @Nullable ChunkingSettings chunkingSettings) {

    public ChunkInferenceInput(String input) {
        this(input, null);
    }

    public static List<String> inputs(List<ChunkInferenceInput> chunkInferenceInputs) {
        return chunkInferenceInputs.stream().map(ChunkInferenceInput::input).toList();
    }
}
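The record above pairs each input string with optional, per-input chunking settings. The following is a minimal standalone sketch of how a caller might build a mixed batch and recover the raw strings with `inputs(...)`. It is not code from this PR: `SketchChunkingSettings` is a hypothetical stand-in for `org.elasticsearch.inference.ChunkingSettings` so the example compiles without Elasticsearch on the classpath, the `"sentence"` / `250` values are illustrative only, and the reading that a `null` setting means "no per-input override" is an assumption based on the `@Nullable` annotation.

```java
import java.util.List;

// Entry point declared first so the file can be run directly with the single-file source launcher.
class ChunkInferenceInputSketch {
    public static void main(String[] args) {
        // Illustrative values only; not taken from the PR.
        SketchChunkingSettings perField = new SketchChunkingSettings("sentence", 250);

        List<ChunkInferenceInput> batch = List.of(
            new ChunkInferenceInput("first document"),           // one-arg constructor leaves settings null
            new ChunkInferenceInput("second document", perField) // carries its own settings
        );

        for (ChunkInferenceInput item : batch) {
            String settings = item.chunkingSettings() == null
                ? "<no per-input settings>"
                : item.chunkingSettings().toString();
            System.out.println(item.input() + " -> " + settings);
        }

        // inputs(...) strips the settings and returns only the text, in the same order.
        System.out.println(ChunkInferenceInput.inputs(batch));
    }
}

// Hypothetical stand-in for org.elasticsearch.inference.ChunkingSettings (assumption, not the real type).
record SketchChunkingSettings(String strategy, int maxChunkSize) {}

// Local copy of the record's shape from the diff, using the stand-in settings type.
record ChunkInferenceInput(String input, SketchChunkingSettings chunkingSettings) {

    ChunkInferenceInput(String input) {
        this(input, null);
    }

    static List<String> inputs(List<ChunkInferenceInput> chunkInferenceInputs) {
        return chunkInferenceInputs.stream().map(ChunkInferenceInput::input).toList();
    }
}
```

Keeping the settings on each input rather than on the batch lets a single request mix inputs with different settings, which appears to be the motivation for the record, although the PR text shown here does not state it explicitly.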