Commit 0b4a1d3

Add none chunking strategy to disable automatic chunking for inference endpoints
This introduces a `none` chunking strategy that disables automatic chunking when using an inference endpoint. It enables users to provide pre-chunked input directly to a `semantic_text` field without any additional splitting.

The chunking strategy can be configured either on the inference endpoint or directly in the `semantic_text` field definition.

**Example:**

```json
PUT test-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "none" <1>
        }
      }
    }
  }
}
```
<1> Disables automatic chunking on `my_semantic_field`.

```json
PUT test-index/_doc/1
{
  "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
  ...
}
```
<1> Pre-chunked input provided as an array of strings. Each array element represents a single chunk that will be sent directly to the inference service without further processing.

1 parent 6e67fac commit 0b4a1d3

17 files changed: +370 -15 lines changed

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 50 additions & 4 deletions
@@ -117,15 +117,16 @@ If specified, these will override the chunking settings set in the {{infer-cap}}
 endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents
 until they are reindexed.
+To completely disable chunking, use the `none` chunking strategy.
 
 **Valid values for `chunking_settings`**:
 
 `type`
-: Indicates the type of chunking strategy to use. Valid values are `word` or
+: Indicates the type of chunking strategy to use. Valid values are `none`, `word` or
 `sentence`. Required.
 
 `max_chunk_size`
-: The maximum number of works in a chunk. Required.
+: The maximum number of words in a chunk. Required for `word` and `sentence` strategies.
 
 `overlap`
 : The number of overlapping words allowed in chunks. This cannot be defined as
@@ -136,6 +137,12 @@ until they are reindexed.
 : The number of overlapping sentences allowed in chunks. Valid values are `0`
 or `1`. Required for `sentence` type chunking settings
 
+::::{warning}
+When using the `none` chunking strategy, if the input exceeds the maximum token limit of the underlying model,
+some services (such as OpenAI) may return an error. In contrast, the elastic and elasticsearch services
+will automatically truncate the input to fit within the model's limit.
+::::
+
 ## {{infer-cap}} endpoint validation [infer-endpoint-validation]
 
 The `inference_id` will not be validated when the mapping is created, but when
@@ -166,10 +173,49 @@ For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.
 
+You can pre-chunk the input by sending it to Elasticsearch as an array of strings.
+Example:
+
+```console
+PUT test-index
+{
+  "mappings": {
+    "properties": {
+      "my_semantic_field": {
+        "type": "semantic_text",
+        "chunking_settings": {
+          "strategy": "none" <1>
+        }
+      }
+    }
+  }
+}
+```
+
+1. Disable chunking on `my_semantic_field`.
+
+```console
+PUT test-index/_doc/1
+{
+  "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
+  ...
+}
+```
+
+1. The text is pre-chunked and provided as an array of strings.
+   Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
+
+**Important considerations**:
+
+* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
+* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
+* If a chunk exceeds the model's token limit, the behavior depends on the service:
+    * Some services (such as OpenAI) will return an error.
+    * Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.
+
 Refer
 to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
-to learn more about semantic search using `semantic_text` and the `semantic`
-query.
+to learn more about semantic search using `semantic_text`.
 
 ## Extracting Relevant Fragments from Semantic Text [semantic-text-highlighting]
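
Because the `none` strategy forwards each array element to the inference service unchanged, sizing the chunks becomes the caller's responsibility. The following is a minimal client-side sketch of one way to do that, not something taken from this commit: `PreChunker`, `chunkByWords`, and the word budget are hypothetical names, and a whitespace word count is only a rough stand-in for a model's real token limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Elasticsearch. It splits text into chunks of at most
// maxWords whitespace-separated words. Word count is an assumed proxy for tokens.
final class PreChunker {
    private PreChunker() {}

    static List<String> chunkByWords(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to index
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            // Re-join the slice with single spaces to form one chunk.
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }
}
```

The returned list would then be serialized as the array value of `my_semantic_field` in the index request. A real client would likely count tokens with the model's own tokenizer rather than counting words.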

server/src/main/java/org/elasticsearch/TransportVersions.java

Lines changed: 1 addition & 0 deletions
@@ -288,6 +288,7 @@ static TransportVersion def(int id) {
     public static final TransportVersion ML_INFERENCE_MISTRAL_CHAT_COMPLETION_ADDED = def(9_090_0_00);
     public static final TransportVersion IDP_CUSTOM_SAML_ATTRIBUTES_ALLOW_LIST = def(9_091_0_00);
     public static final TransportVersion SEARCH_SOURCE_EXCLUDE_VECTORS_PARAM = def(9_092_0_00);
+    public static final TransportVersion NONE_CHUNKING_STRATEGY = def(9_093_0_00);
     /*
      * STOP! READ THIS FIRST! No, really,
      * ____ _____ ___ ____ _ ____ _____ _ ____ _____ _ _ ___ ____ _____ ___ ____ ____ _____ _

server/src/main/java/org/elasticsearch/inference/ChunkingStrategy.java

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
 
 public enum ChunkingStrategy {
     WORD("word"),
-    SENTENCE("sentence");
+    SENTENCE("sentence"),
+    NONE("none");
 
     private final String chunkingStrategy;
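
To illustrate how a configured string such as `"none"` could be resolved to an enum constant like the ones above, here is a small standalone sketch. It is an assumption for illustration only: `SketchStrategy` and its `parse` method are hypothetical and do not reproduce the lookup logic of the real `ChunkingStrategy` class, which this diff does not show.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-in for an enum keyed by its configuration string.
enum SketchStrategy {
    WORD("word"),
    SENTENCE("sentence"),
    NONE("none");

    private final String value;

    SketchStrategy(String value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }

    // Case-insensitive lookup by configured name (assumed behavior, for illustration).
    static SketchStrategy parse(String name) {
        String normalized = name.toLowerCase(Locale.ROOT);
        return Arrays.stream(values())
            .filter(strategy -> strategy.value.equals(normalized))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("Unknown chunking strategy [" + name + "]"));
    }
}
```

With this sketch, `SketchStrategy.parse("none")` resolves to `SketchStrategy.NONE`, and an unrecognized name raises an `IllegalArgumentException`.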

x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/AbstractTestInferenceService.java

Lines changed: 3 additions & 1 deletion
@@ -126,7 +126,9 @@ protected List<ChunkedInput> chunkInputs(ChunkInferenceInput input) {
         }
 
         List<ChunkedInput> chunkedInputs = new ArrayList<>();
-        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
+        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.NONE) {
+            return List.of(new ChunkedInput(inputText, 0, inputText.length()));
+        } else if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
             WordBoundaryChunker chunker = new WordBoundaryChunker();
             WordBoundaryChunkingSettings wordBoundaryChunkingSettings = (WordBoundaryChunkingSettings) chunkingSettings;
             List<WordBoundaryChunker.ChunkOffset> offsets = chunker.chunk(

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java

Lines changed: 4 additions & 0 deletions
@@ -26,6 +26,7 @@
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingByteResults;
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingFloatResults;
 import org.elasticsearch.xpack.inference.action.task.StreamingTaskManager;
+import org.elasticsearch.xpack.inference.chunking.NoneChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.SentenceBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.common.amazon.AwsSecretSettings;
@@ -552,6 +553,9 @@ private static void addInternalNamedWriteables(List<NamedWriteableRegistry.Entry
     }
 
     private static void addChunkingSettingsNamedWriteables(List<NamedWriteableRegistry.Entry> namedWriteables) {
+        namedWriteables.add(
+            new NamedWriteableRegistry.Entry(ChunkingSettings.class, NoneChunkingSettings.NAME, in -> NoneChunkingSettings.INSTANCE)
+        );
         namedWriteables.add(
             new NamedWriteableRegistry.Entry(ChunkingSettings.class, WordBoundaryChunkingSettings.NAME, WordBoundaryChunkingSettings::new)
         );

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkerBuilder.java

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ public static Chunker fromChunkingStrategy(ChunkingStrategy chunkingStrategy) {
         }
 
         return switch (chunkingStrategy) {
+            case NONE -> NoopChunker.INSTANCE;
             case WORD -> new WordBoundaryChunker();
             case SENTENCE -> new SentenceBoundaryChunker();
         };

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkingSettingsBuilder.java

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ public static ChunkingSettings fromMap(Map<String, Object> settings, boolean ret
             settings.get(ChunkingSettingsOptions.STRATEGY.toString()).toString()
         );
         return switch (chunkingStrategy) {
+            case NONE -> NoneChunkingSettings.INSTANCE;
             case WORD -> WordBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
             case SENTENCE -> SentenceBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
         };
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+package org.elasticsearch.xpack.inference.chunking;
+
+import org.elasticsearch.TransportVersion;
+import org.elasticsearch.TransportVersions;
+import org.elasticsearch.common.Strings;
+import org.elasticsearch.common.ValidationException;
+import org.elasticsearch.common.io.stream.StreamInput;
+import org.elasticsearch.common.io.stream.StreamOutput;
+import org.elasticsearch.inference.ChunkingSettings;
+import org.elasticsearch.inference.ChunkingStrategy;
+import org.elasticsearch.xcontent.XContentBuilder;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+public class NoneChunkingSettings implements ChunkingSettings {
+    public static final String NAME = "NoneChunkingSettings";
+    public static final NoneChunkingSettings INSTANCE = new NoneChunkingSettings();
+
+    private static final ChunkingStrategy STRATEGY = ChunkingStrategy.NONE;
+    private static final Set<String> VALID_KEYS = Set.of(ChunkingSettingsOptions.STRATEGY.toString());
+
+    private NoneChunkingSettings() {}
+
+    public NoneChunkingSettings(StreamInput in) throws IOException {}
+
+    @Override
+    public ChunkingStrategy getChunkingStrategy() {
+        return STRATEGY;
+    }
+
+    @Override
+    public String getWriteableName() {
+        return NAME;
+    }
+
+    @Override
+    public TransportVersion getMinimalSupportedVersion() {
+        return TransportVersions.NONE_CHUNKING_STRATEGY;
+    }
+
+    @Override
+    public void writeTo(StreamOutput out) throws IOException {}
+
+    @Override
+    public Map<String, Object> asMap() {
+        return Map.of(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY.toString().toLowerCase(Locale.ROOT));
+    }
+
+    public static NoneChunkingSettings fromMap(Map<String, Object> map) {
+        ValidationException validationException = new ValidationException();
+
+        var invalidSettings = map.keySet().stream().filter(key -> VALID_KEYS.contains(key) == false).toArray();
+        if (invalidSettings.length > 0) {
+            validationException.addValidationError(
+                Strings.format("None chunking settings can not have the following settings: %s", Arrays.toString(invalidSettings))
+            );
+        }
+
+        if (validationException.validationErrors().isEmpty() == false) {
+            throw validationException;
+        }
+
+        return new NoneChunkingSettings();
+    }
+
+    @Override
+    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
+        builder.startObject();
+        {
+            builder.field(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY);
+        }
+        builder.endObject();
+        return builder;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (this == o) return true;
+        if (o == null || getClass() != o.getClass()) return false;
+        return true;
+    }
+
+    @Override
+    public int hashCode() {
+        return Objects.hash(getClass());
+    }
+
+    @Override
+    public String toString() {
+        return Strings.toString(this);
+    }
+}
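
The `fromMap` validation above rejects any setting other than the strategy key. As a standalone illustration of that check, here is a minimal sketch using only the JDK. The class `NoneSettingsValidation`, its method names, and the literal key `"strategy"` (taken from the documentation example) are assumptions made for this sketch, and it throws a plain `IllegalArgumentException` rather than Elasticsearch's `ValidationException`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the key validation; not the Elasticsearch class.
final class NoneSettingsValidation {
    // Assumed key name, matching the "strategy" field used in the mapping example.
    private static final Set<String> VALID_KEYS = Set.of("strategy");

    private NoneSettingsValidation() {}

    // Returns the keys that are not allowed with the none strategy, sorted for stable output.
    static List<String> invalidKeys(Map<String, Object> settings) {
        return settings.keySet().stream().filter(key -> VALID_KEYS.contains(key) == false).sorted().toList();
    }

    // Throws if any setting other than the strategy key is present.
    static void validate(Map<String, Object> settings) {
        List<String> invalid = invalidKeys(settings);
        if (invalid.isEmpty() == false) {
            throw new IllegalArgumentException("None chunking settings can not have the following settings: " + invalid);
        }
    }
}
```

Under these assumptions, a map containing only `strategy` passes, while a map that also carries `max_chunk_size` is rejected, which matches the documentation's statement that `max_chunk_size` is only required for the `word` and `sentence` strategies.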
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+package org.elasticsearch.xpack.inference.chunking;
+
+import org.elasticsearch.common.Strings;
+import org.elasticsearch.inference.ChunkingSettings;
+import org.elasticsearch.xpack.inference.services.openai.embeddings.OpenAiEmbeddingsModel;
+
+import java.util.List;
+
+/**
+ * A {@link Chunker} implementation that returns the input unchanged (no chunking is performed).
+ *
+ * <p><b>WARNING:</b> If the input exceeds the maximum token limit, some services (such as {@link OpenAiEmbeddingsModel})
+ * may return an error.
+ * </p>
+ */
+public class NoopChunker implements Chunker {
+    static final NoopChunker INSTANCE = new NoopChunker();
+
+    private NoopChunker() {}
+
+    @Override
+    public List<ChunkOffset> chunk(String input, ChunkingSettings chunkingSettings) {
+        if (chunkingSettings instanceof NoneChunkingSettings) {
+            return List.of(new ChunkOffset(0, input.length()));
+        } else {
+            throw new IllegalArgumentException(
+                Strings.format("NoopChunker can't use ChunkingSettings with strategy [%s]", chunkingSettings.getChunkingStrategy())
+            );
+        }
+    }
+}
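
Taken together, `NoopChunker` and the `ChunkerBuilder` switch select a chunker per strategy, with `none` yielding a single offset that spans the whole input. The sketch below shows that dispatch with simplified stand-in types. `SketchChunkers`, its `Strategy` enum, `Offset` record, and the deliberately naive per-word splitter are all illustrative assumptions, not the Elasticsearch classes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only.
final class SketchChunkers {
    enum Strategy { NONE, WORD }

    // Half-open character range [start, end) into the input.
    record Offset(int start, int end) {}

    interface Chunker {
        List<Offset> chunk(String input);
    }

    // No-op chunker: the entire input is returned as one chunk.
    static final Chunker NOOP = input -> List.of(new Offset(0, input.length()));

    // Naive stand-in for a word chunker: one chunk per whitespace-delimited word.
    static final Chunker PER_WORD = input -> {
        List<Offset> offsets = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                i++; // skip separators
            }
            int start = i;
            while (i < input.length() && Character.isWhitespace(input.charAt(i)) == false) {
                i++; // consume one word
            }
            if (i > start) {
                offsets.add(new Offset(start, i));
            }
        }
        return offsets;
    };

    private SketchChunkers() {}

    // Exhaustive switch over the enum, so adding a constant forces a new case here.
    static Chunker forStrategy(Strategy strategy) {
        return switch (strategy) {
            case NONE -> NOOP;
            case WORD -> PER_WORD;
        };
    }
}
```

One consequence of using a switch expression without a `default` branch is that adding a new strategy constant without a matching case becomes a compile error, which may be why the commit had to touch both builder classes when `NONE` was introduced.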

x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/ChunkerBuilderTests.java

Lines changed: 8 additions & 1 deletion
@@ -27,6 +27,13 @@ public void testValidChunkingStrategy() {
     }
 
     private Map<ChunkingStrategy, Class<? extends Chunker>> chunkingStrategyToExpectedChunkerClassMap() {
-        return Map.of(ChunkingStrategy.WORD, WordBoundaryChunker.class, ChunkingStrategy.SENTENCE, SentenceBoundaryChunker.class);
+        return Map.of(
+            ChunkingStrategy.NONE,
+            NoopChunker.class,
+            ChunkingStrategy.WORD,
+            WordBoundaryChunker.class,
+            ChunkingStrategy.SENTENCE,
+            SentenceBoundaryChunker.class
+        );
     }
 }
