explore-analyze/elastic-inference/inference-api.md
By default, documents are split into sentences and grouped in sections up to 250 words.
Several strategies are available for chunking:

#### `sentence`

The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences, ensuring that sentence-level context is preserved, except when a sentence causes a chunk to exceed the `max_chunk_size` word count, in which case the sentence is split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk, and can be either `0` or `1`.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `sentence` strategy.

```console
PUT _inference/sparse_embedding/sentence_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 100,
    "sentence_overlap": 0
  }
}
```

The default chunking strategy is `sentence`.

#### `word`

The `word` strategy splits the input text on individual words, up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `word` strategy, setting a maximum of 120 words per chunk and an overlap of 40 words between chunks.

```console
PUT _inference/sparse_embedding/word_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "word",
    "max_chunk_size": 120,
    "overlap": 40
  }
}
```

#### `recursive`

{applies_to}`stack: ga 9.1`

The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy, using the `markdown` separator group and a maximum of 200 words per chunk.

```console
PUT _inference/sparse_embedding/recursive_markdown_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "recursive",
    "max_chunk_size": 200,
    "separator_group": "markdown"
  }
}
```

#### `none`

{applies_to}`stack: ga 9.1`

The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk that is sent directly to the inference service without further chunking.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and disables chunking by setting the strategy to `none`.
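The request itself is truncated from this diff; a minimal sketch, following the pattern of the earlier examples (the endpoint name `no_chunks` is illustrative, not from the source):

```console
PUT _inference/sparse_embedding/no_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "none"
  }
}
```

Because no other chunking options apply, `chunking_settings` here contains only the `strategy` key; documents indexed through this endpoint are embedded whole unless the input is pre-chunked as an array of strings.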