By default, documents are split into sentences and grouped in sections up to 250 words.

Several strategies are available for chunking:
#### `sentence`

The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences, ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed the `max_chunk_size` word count, in which case it is split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk, which can be either `0` or `1`.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `sentence` strategy.
```console
PUT _inference/sparse_embedding/sentence_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 100,
    "sentence_overlap": 0
  }
}
```
The default chunking strategy is `sentence`.
#### `word`
The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `word` strategy, setting a maximum of 120 words per chunk and an overlap of 40 words between chunks.
```console
PUT _inference/sparse_embedding/word_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "word",
    "max_chunk_size": 120,
    "overlap": 40
  }
}
```
#### `recursive`
```{applies_to}
stack: ga 9.1
```
The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
##### Markdown separator group
The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk.
```console
PUT _inference/sparse_embedding/recursive_markdown_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "recursive",
    "max_chunk_size": 200,
    "separator_group": "markdown"
  }
}
```
##### Custom separator group
The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy. It uses a custom list of separators to split plain text into chunks of up to 180 words.
```console
PUT _inference/sparse_embedding/recursive_custom_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "recursive",
    "max_chunk_size": 180,
    "separators": [
      "^(#{1,6})\\s",
      "\\n\\n",
      "\\n[-*]\\s",
      "\\n\\d+\\.\\s",
      "\\n"
    ]
  }
}
```
#### `none`
```{applies_to}
stack: ga 9.1
```
The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and disables chunking by setting the strategy to `none`.
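
The request below is a minimal sketch that follows the pattern of the earlier examples; the endpoint name `no_chunks` is illustrative.

```console
# The endpoint name "no_chunks" is illustrative
PUT _inference/sparse_embedding/no_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "none"
  }
}
```

With chunking disabled, a `semantic_text` field that uses this endpoint can receive pre-chunked input as an array of strings, where each array element is treated as one chunk. The index and field names below are illustrative.

```console
# Assumes "my_semantic_field" is mapped as semantic_text and uses the endpoint above
PUT my-index/_doc/1
{
  "my_semantic_field": [
    "This is the first pre-chunked passage.",
    "This is the second pre-chunked passage."
  ]
}
```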
* [Index privileges](elasticsearch://reference/elasticsearch/security-privileges.md#privileges-list-indices): `create_doc` and `create_index`
* Familiarity with [time series data stream concepts](time-series-data-stream-tsds.md) and [{{es}} index and search basics](/solutions/search/get-started.md)

solutions/observability/connect-to-own-local-llm.md

If your Elastic deployment is not on the same network, you must configure an Nginx reverse proxy.

You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
::::
::::{note}
For information about the performance of open-source models on {{obs-ai-assistant}} tasks, refer to the [LLM performance matrix](/solutions/observability/llm-performance-matrix.md).
::::
This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.

solutions/observability/llm-performance-matrix.md

This page summarizes internal test results comparing large language models (LLMs) across {{obs-ai-assistant}} use cases. To learn more about these use cases, refer to [AI Assistant](/solutions/observability/observability-ai-assistant.md).

::::{important}
Rating legend:

**Excellent:** Highly accurate and reliable for the use case.<br>
**Great:** Strong performance with minor limitations.<br>
**Good:** Possibly adequate for many use cases but with noticeable tradeoffs.<br>
**Poor:** Significant issues; not recommended for production for the use case.

Recommended models are those rated **Excellent** or **Great** for the particular use case.
::::
| Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
| Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |
::::{note}
`Llama-3.3-70B-Instruct` is supported with simulated function calling.
::::
## Evaluate your own model
You can run the {{obs-ai-assistant}} evaluation framework against any model, and use it to benchmark a custom or self-hosted model against the use cases in the matrix. Refer to the [evaluation framework README](https://github.com/elastic/kibana/blob/main/x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md) for setup and usage details.

For consistency, all ratings in this matrix were generated using `Gemini 2.5 Pro` as the judge model (specified via the `--evaluateWith` flag). Use the same judge when evaluating your own model to ensure comparable results.

solutions/observability/observability-ai-assistant.md

The AI Assistant connects to one of these supported LLM providers:
- The provider's API endpoint URL
- Your authentication key or secret
::::{admonition} Recommended models
While the {{obs-ai-assistant}} is compatible with many different models, refer to the [Large language model performance matrix](/solutions/observability/llm-performance-matrix.md) to select models that perform well with your desired use cases.
::::