
Commit 358ada7

Add none chunking strategy to disable automatic chunking for inference endpoints (#129150)
This introduces a `none` chunking strategy that disables automatic chunking when using an inference endpoint. It enables users to provide pre-chunked input directly to a `semantic_text` field without any additional splitting. The chunking strategy can be configured either on the inference endpoint or directly in the `semantic_text` field definition.

**Example:**

```json
PUT test-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "none" <1>
        }
      }
    }
  }
}
```

<1> Disables automatic chunking on `my_semantic_field`.

```json
PUT test-index/_doc/1
{
  "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
  ...
}
```

<1> Pre-chunked input provided as an array of strings. Each array element represents a single chunk that will be sent directly to the inference service without further processing.
1 parent 63da93d commit 358ada7

File tree

18 files changed: +396 −15 lines changed


docs/changelog/129150.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 129150
+summary: Add `none` chunking strategy to disable automatic chunking for inference
+  endpoints
+area: Machine Learning
+type: feature
+issues: []

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 50 additions & 4 deletions
@@ -117,15 +117,16 @@ If specified, these will override the chunking settings set in the {{infer-cap}}
 endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents
 until they are reindexed.
+To completely disable chunking, use the `none` chunking strategy.
 
 **Valid values for `chunking_settings`**:
 
 `type`
-: Indicates the type of chunking strategy to use. Valid values are `word` or
+: Indicates the type of chunking strategy to use. Valid values are `none`, `word` or
 `sentence`. Required.
 
 `max_chunk_size`
-: The maximum number of works in a chunk. Required.
+: The maximum number of words in a chunk. Required for `word` and `sentence` strategies.
 
 `overlap`
 : The number of overlapping words allowed in chunks. This cannot be defined as
@@ -136,6 +137,12 @@ until they are reindexed.
 : The number of overlapping sentences allowed in chunks. Valid values are `0`
 or `1`. Required for `sentence` type chunking settings
 
+::::{warning}
+If the input exceeds the maximum token limit of the underlying model, some services (such as OpenAI) may return an
+error. In contrast, the `elastic` and `elasticsearch` services will automatically truncate the input to fit within the
+model's limit.
+::::
+
 ## {{infer-cap}} endpoint validation [infer-endpoint-validation]
 
 The `inference_id` will not be validated when the mapping is created, but when
@@ -166,10 +173,49 @@ For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.
 
+You can pre-chunk the input by sending it to Elasticsearch as an array of strings.
+Example:
+
+```console
+PUT test-index
+{
+  "mappings": {
+    "properties": {
+      "my_semantic_field": {
+        "type": "semantic_text",
+        "chunking_settings": {
+          "strategy": "none" <1>
+        }
+      }
+    }
+  }
+}
+```
+
+1. Disable chunking on `my_semantic_field`.
+
+```console
+PUT test-index/_doc/1
+{
+  "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
+  ...
+}
+```
+
+1. The text is pre-chunked and provided as an array of strings.
+   Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
+
+**Important considerations**:
+
+* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
+* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
+* If a chunk exceeds the model's token limit, the behavior depends on the service:
+  * Some services (such as OpenAI) will return an error.
+  * Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.
+
 Refer
 to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
-to learn more about semantic search using `semantic_text` and the `semantic`
-query.
+to learn more about semantic search using `semantic_text`.
 
 ## Extracting Relevant Fragments from Semantic Text [semantic-text-highlighting]
 
server/src/main/java/org/elasticsearch/TransportVersions.java

Lines changed: 2 additions & 0 deletions
@@ -194,6 +194,7 @@ static TransportVersion def(int id) {
     public static final TransportVersion SEARCH_SOURCE_EXCLUDE_VECTORS_PARAM_8_19 = def(8_841_0_46);
     public static final TransportVersion ML_INFERENCE_MISTRAL_CHAT_COMPLETION_ADDED_8_19 = def(8_841_0_47);
     public static final TransportVersion ML_INFERENCE_ELASTIC_RERANK_ADDED_8_19 = def(8_841_0_48);
+    public static final TransportVersion NONE_CHUNKING_STRATEGY_8_19 = def(8_841_0_49);
     public static final TransportVersion V_9_0_0 = def(9_000_0_09);
     public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_1 = def(9_000_0_10);
     public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_2 = def(9_000_0_11);
@@ -294,6 +295,7 @@ static TransportVersion def(int id) {
     public static final TransportVersion ML_INFERENCE_ELASTIC_RERANK = def(9_094_0_00);
     public static final TransportVersion SEARCH_LOAD_PER_INDEX_STATS = def(9_095_0_00);
     public static final TransportVersion HEAP_USAGE_IN_CLUSTER_INFO = def(9_096_0_00);
+    public static final TransportVersion NONE_CHUNKING_STRATEGY = def(9_097_0_00);
 
     /*
      * STOP! READ THIS FIRST! No, really,

server/src/main/java/org/elasticsearch/inference/ChunkingStrategy.java

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
 
 public enum ChunkingStrategy {
     WORD("word"),
-    SENTENCE("sentence");
+    SENTENCE("sentence"),
+    NONE("none");
 
     private final String chunkingStrategy;
 
x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/AbstractTestInferenceService.java

Lines changed: 9 additions & 1 deletion
@@ -25,6 +25,7 @@
 import org.elasticsearch.inference.TaskSettings;
 import org.elasticsearch.inference.TaskType;
 import org.elasticsearch.xcontent.XContentBuilder;
+import org.elasticsearch.xpack.inference.chunking.NoopChunker;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunker;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;
 
@@ -126,7 +127,14 @@ protected List<ChunkedInput> chunkInputs(ChunkInferenceInput input) {
         }
 
         List<ChunkedInput> chunkedInputs = new ArrayList<>();
-        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
+        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.NONE) {
+            var offsets = NoopChunker.INSTANCE.chunk(input.input(), chunkingSettings);
+            List<ChunkedInput> ret = new ArrayList<>();
+            for (var offset : offsets) {
+                ret.add(new ChunkedInput(inputText.substring(offset.start(), offset.end()), offset.start(), offset.end()));
+            }
+            return ret;
+        } else if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
             WordBoundaryChunker chunker = new WordBoundaryChunker();
             WordBoundaryChunkingSettings wordBoundaryChunkingSettings = (WordBoundaryChunkingSettings) chunkingSettings;
             List<WordBoundaryChunker.ChunkOffset> offsets = chunker.chunk(

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java

Lines changed: 4 additions & 0 deletions
@@ -26,6 +26,7 @@
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingByteResults;
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingFloatResults;
 import org.elasticsearch.xpack.inference.action.task.StreamingTaskManager;
+import org.elasticsearch.xpack.inference.chunking.NoneChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.SentenceBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.common.amazon.AwsSecretSettings;
@@ -553,6 +554,9 @@ private static void addInternalNamedWriteables(List<NamedWriteableRegistry.Entry
     }
 
     private static void addChunkingSettingsNamedWriteables(List<NamedWriteableRegistry.Entry> namedWriteables) {
+        namedWriteables.add(
+            new NamedWriteableRegistry.Entry(ChunkingSettings.class, NoneChunkingSettings.NAME, in -> NoneChunkingSettings.INSTANCE)
+        );
         namedWriteables.add(
             new NamedWriteableRegistry.Entry(ChunkingSettings.class, WordBoundaryChunkingSettings.NAME, WordBoundaryChunkingSettings::new)
         );

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkerBuilder.java

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ public static Chunker fromChunkingStrategy(ChunkingStrategy chunkingStrategy) {
         }
 
         return switch (chunkingStrategy) {
+            case NONE -> NoopChunker.INSTANCE;
             case WORD -> new WordBoundaryChunker();
             case SENTENCE -> new SentenceBoundaryChunker();
         };
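For orientation, here is a minimal, hypothetical sketch (not part of the commit) of how the new `case` is expected to resolve, assuming the public members shown in this diff are available on the classpath; `ChunkerBuilderSketch` is an illustrative class name, not Elasticsearch code:

```java
import org.elasticsearch.inference.ChunkingStrategy;
import org.elasticsearch.xpack.inference.chunking.ChunkerBuilder;
import org.elasticsearch.xpack.inference.chunking.NoopChunker;

public class ChunkerBuilderSketch {
    public static void main(String[] args) {
        // NONE maps to the shared NoopChunker singleton; the word and sentence
        // strategies construct a fresh boundary chunker on each call instead.
        var chunker = ChunkerBuilder.fromChunkingStrategy(ChunkingStrategy.NONE);
        System.out.println(chunker == NoopChunker.INSTANCE); // true
    }
}
```

Returning the shared singleton here is cheap because `NoopChunker` holds no state, unlike the word and sentence boundary chunkers.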

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkingSettingsBuilder.java

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ public static ChunkingSettings fromMap(Map<String, Object> settings, boolean ret
             settings.get(ChunkingSettingsOptions.STRATEGY.toString()).toString()
         );
         return switch (chunkingStrategy) {
+            case NONE -> NoneChunkingSettings.INSTANCE;
             case WORD -> WordBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
             case SENTENCE -> SentenceBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
         };
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/NoneChunkingSettings.java

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

package org.elasticsearch.xpack.inference.chunking;

import org.elasticsearch.TransportVersion;
import org.elasticsearch.TransportVersions;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.ValidationException;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.inference.ChunkingSettings;
import org.elasticsearch.inference.ChunkingStrategy;
import org.elasticsearch.xcontent.XContentBuilder;

import java.io.IOException;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class NoneChunkingSettings implements ChunkingSettings {
    public static final String NAME = "NoneChunkingSettings";
    public static NoneChunkingSettings INSTANCE = new NoneChunkingSettings();

    private static final ChunkingStrategy STRATEGY = ChunkingStrategy.NONE;
    private static final Set<String> VALID_KEYS = Set.of(ChunkingSettingsOptions.STRATEGY.toString());

    private NoneChunkingSettings() {}

    @Override
    public ChunkingStrategy getChunkingStrategy() {
        return STRATEGY;
    }

    @Override
    public String getWriteableName() {
        return NAME;
    }

    @Override
    public TransportVersion getMinimalSupportedVersion() {
        throw new IllegalStateException("not used");
    }

    @Override
    public boolean supportsVersion(TransportVersion version) {
        return version.isPatchFrom(TransportVersions.NONE_CHUNKING_STRATEGY_8_19)
            || version.onOrAfter(TransportVersions.NONE_CHUNKING_STRATEGY);
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {}

    @Override
    public Map<String, Object> asMap() {
        return Map.of(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY.toString().toLowerCase(Locale.ROOT));
    }

    public static NoneChunkingSettings fromMap(Map<String, Object> map) {
        ValidationException validationException = new ValidationException();

        var invalidSettings = map.keySet().stream().filter(key -> VALID_KEYS.contains(key) == false).toArray();
        if (invalidSettings.length > 0) {
            validationException.addValidationError(
                Strings.format(
                    "When chunking is disabled (none), settings can not have the following: %s",
                    Arrays.toString(invalidSettings)
                )
            );
        }

        if (validationException.validationErrors().isEmpty() == false) {
            throw validationException;
        }

        return NoneChunkingSettings.INSTANCE;
    }

    @Override
    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
        builder.startObject();
        {
            builder.field(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY);
        }
        builder.endObject();
        return builder;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return true;
    }

    @Override
    public int hashCode() {
        return Objects.hash(getClass());
    }

    @Override
    public String toString() {
        return Strings.toString(this);
    }
}
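A hedged usage sketch of the `fromMap` validation above (not part of the commit). It assumes `ChunkingSettingsOptions.STRATEGY.toString()` yields the `strategy` key used in the docs, and `NoneChunkingSettingsSketch` is an illustrative class name:

```java
import org.elasticsearch.common.ValidationException;
import org.elasticsearch.xpack.inference.chunking.NoneChunkingSettings;

import java.util.HashMap;
import java.util.Map;

public class NoneChunkingSettingsSketch {
    public static void main(String[] args) {
        // A map containing only the strategy key is accepted and resolves to the shared INSTANCE.
        var settings = NoneChunkingSettings.fromMap(new HashMap<>(Map.of("strategy", "none")));
        System.out.println(settings == NoneChunkingSettings.INSTANCE); // true

        // Any extra key (e.g. max_chunk_size) fails validation, since disabling
        // chunking leaves nothing else to configure.
        try {
            NoneChunkingSettings.fromMap(new HashMap<>(Map.of("strategy", "none", "max_chunk_size", 100)));
        } catch (ValidationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```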
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/NoopChunker.java

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

package org.elasticsearch.xpack.inference.chunking;

import org.elasticsearch.common.Strings;
import org.elasticsearch.inference.ChunkingSettings;
import org.elasticsearch.xpack.inference.services.openai.embeddings.OpenAiEmbeddingsModel;

import java.util.List;

/**
 * A {@link Chunker} implementation that returns the input unchanged (no chunking is performed).
 *
 * <p><b>WARNING</b>If the input exceeds the maximum token limit, some services (such as {@link OpenAiEmbeddingsModel})
 * may return an error.
 * </p>
 */
public class NoopChunker implements Chunker {
    public static final NoopChunker INSTANCE = new NoopChunker();

    private NoopChunker() {}

    @Override
    public List<ChunkOffset> chunk(String input, ChunkingSettings chunkingSettings) {
        if (chunkingSettings instanceof NoneChunkingSettings) {
            return List.of(new ChunkOffset(0, input.length()));
        } else {
            throw new IllegalArgumentException(
                Strings.format("NoopChunker can't use ChunkingSettings with strategy [%s]", chunkingSettings.getChunkingStrategy())
            );
        }
    }
}
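To illustrate the behaviour above, a hypothetical sketch (not part of the commit; `NoopChunkerSketch` is an illustrative name) showing that the whole input comes back as a single offset:

```java
import org.elasticsearch.xpack.inference.chunking.NoneChunkingSettings;
import org.elasticsearch.xpack.inference.chunking.NoopChunker;

public class NoopChunkerSketch {
    public static void main(String[] args) {
        String input = "my first chunk my second chunk";

        // Exactly one offset is returned, spanning [0, input.length()).
        var offsets = NoopChunker.INSTANCE.chunk(input, NoneChunkingSettings.INSTANCE);
        System.out.println(offsets.size());         // 1
        System.out.println(offsets.get(0).start()); // 0
        System.out.println(offsets.get(0).end());   // 30 (the input length)

        // Passing any other ChunkingSettings implementation (word/sentence) would
        // throw an IllegalArgumentException, per the guard clause above.
    }
}
```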
