Commit 1bc324d

kosabogi, theletterf, and szabosteve authored
Adds chunking strategy examples to the Inference integrations page (#2946)
This PR extends the Inference integrations page with examples for all chunking strategies (`sentence`, `word`, `recursive`, and `none`). Related issue: elastic/docs-content-internal#273

Co-authored-by: Fabrizio Ferri-Benedetti <[email protected]>
Co-authored-by: István Zoltán Szabó <[email protected]>
1 parent 04ab734 commit 1bc324d

File tree

1 file changed: +110 additions, -15 deletions

explore-analyze/elastic-inference/inference-api.md

Lines changed: 110 additions & 15 deletions
````diff
@@ -107,37 +107,132 @@ By default, documents are split into sentences and grouped in sections up to 250
 
 Several strategies are available for chunking:
 
-`sentence`
-: The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed a word count of `max_chunk_size`, in which case it will be split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk which is either `0` or `1`.
+#### `sentence`
 
-`word`
-: The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.
+The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences, ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed the `max_chunk_size` word count, in which case it is split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk, which is either `0` or `1`.
 
-`recursive`{applies_to}`stack: ga 9.1`
-: The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `sentence` strategy.
 
-`none` {applies_to}`stack: ga 9.1`
-
-: The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
+```console
+PUT _inference/sparse_embedding/sentence_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "sentence",
+    "max_chunk_size": 100,
+    "sentence_overlap": 0
+  }
+}
+```
 
 The default chunking strategy is `sentence`.
 
-#### Example of configuring the chunking behavior
+#### `word`
+
+The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.
 
-The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model by default and configures the chunking behavior.
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `word` strategy, setting a maximum of 120 words per chunk and an overlap of 40 words between chunks.
 
 ```console
-PUT _inference/sparse_embedding/small_chunk_size
+PUT _inference/sparse_embedding/word_chunks
 {
   "service": "elasticsearch",
   "service_settings": {
+    "model_id": ".elser_model_2",
     "num_allocations": 1,
     "num_threads": 1
   },
   "chunking_settings": {
-    "strategy": "sentence",
-    "max_chunk_size": 100,
-    "sentence_overlap": 0
+    "strategy": "word",
+    "max_chunk_size": 120,
+    "overlap": 40
+  }
+}
+```
+
+#### `recursive`
+
+```{applies_to}
+stack: ga 9.1
+```
+
+The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+
+##### Markdown separator group
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk.
+
+```console
+PUT _inference/sparse_embedding/recursive_markdown_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "recursive",
+    "max_chunk_size": 200,
+    "separator_group": "markdown"
+  }
+}
+```
+
+##### Custom separator group
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy. It uses a custom list of separators to split plaintext into chunks of up to 180 words.
+
+```console
+PUT _inference/sparse_embedding/recursive_custom_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "recursive",
+    "max_chunk_size": 180,
+    "separators": [
+      "^(#{1,6})\\s",
+      "\\n\\n",
+      "\\n[-*]\\s",
+      "\\n\\d+\\.\\s",
+      "\\n"
+    ]
+  }
+}
+```
+
+#### `none`
+
+```{applies_to}
+stack: ga 9.1
+```
+
+The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and disables chunking by setting the strategy to `none`.
+
+```console
+PUT _inference/sparse_embedding/none_chunking
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "none"
   }
 }
 ```
````
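The `none` section added by this commit mentions pre-chunking the input by providing an array of strings. As a sketch of how that would be used (not part of this commit; the index name `my-index` and field name `content` are assumptions for illustration), a `semantic_text` field bound to the `none_chunking` endpoint accepts an array in which each element is stored as its own chunk:

```console
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "none_chunking"
      }
    }
  }
}

PUT my-index/_doc/1
{
  "content": [
    "This first string is treated as one chunk.",
    "This second string becomes a separate chunk, with no further splitting or overlap."
  ]
}
```

Because the endpoint's strategy is `none`, each array element is sent to the inference service as-is, so the caller controls chunk boundaries entirely.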

0 commit comments
