
Commit 68693a6

Merge branch 'main' into szabosteve/mark-elser-eis-ga

2 parents 68d3b99 + 1bc324d

File tree

25 files changed: +1026 −436 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -2,6 +2,10 @@
 .artifacts
 .DS_store
 
+# Jetbrains files
+.idea
+*.iml
+
 # Add LLM/AI related files
 AGENTS.md
 .github/copilot-instructions.md
```

deploy-manage/toc.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -609,7 +609,8 @@ toc:
       - file: users-roles/cluster-or-deployment-auth/granting-privileges-for-data-streams-aliases.md
       - file: users-roles/cluster-or-deployment-auth/kibana-role-management.md
       - file: users-roles/cluster-or-deployment-auth/role-restriction.md
-      - file: users-roles/cluster-or-deployment-auth/elasticsearch-privileges.md
+      - title: "Elasticsearch privileges"
+        crosslink: elasticsearch://reference/elasticsearch/security-privileges.md
       - file: users-roles/cluster-or-deployment-auth/kibana-privileges.md
       - file: users-roles/cluster-or-deployment-auth/mapping-users-groups-to-roles.md
        children:
```

deploy-manage/users-roles/cluster-or-deployment-auth/elasticsearch-privileges.md

Lines changed: 0 additions & 416 deletions
This file was deleted.

docset.yml

Lines changed: 3 additions & 1 deletion
```diff
@@ -294,4 +294,6 @@ subs:
   ece-apis: https://www.elastic.co/docs/api/doc/cloud-enterprise/
   intake-apis: https://www.elastic.co/docs/api/doc/observability-serverless/
   models-app: "Trained Models"
-  kube-stack-version: 0.6.3
+  agent-builder: "Elastic Agent Builder"
+  kube-stack-version: 0.6.3
+
```

explore-analyze/elastic-inference/inference-api.md

Lines changed: 110 additions & 15 deletions
````diff
@@ -107,37 +107,132 @@ By default, documents are split into sentences and grouped in sections up to 250
 
 Several strategies are available for chunking:
 
-`sentence`
-: The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed a word count of `max_chunk_size`, in which case it will be split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk which is either `0` or `1`.
+#### `sentence`
 
-`word`
-: The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.
+The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed a word count of `max_chunk_size`, in which case it will be split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk which is either `0` or `1`.
 
-`recursive`{applies_to}`stack: ga 9.1`
-: The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `sentence` strategy.
 
-`none` {applies_to}`stack: ga 9.1`
-
-: The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
+```console
+PUT _inference/sparse_embedding/sentence_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "sentence",
+    "max_chunk_size": 100,
+    "sentence_overlap": 0
+  }
+}
+```
 
 The default chunking strategy is `sentence`.
 
-#### Example of configuring the chunking behavior
+#### `word`
+
+The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.
 
-The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model by default and configures the chunking behavior.
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `word` strategy, setting a maximum of 120 words per chunk and an overlap of 40 words between chunks.
 
 ```console
-PUT _inference/sparse_embedding/small_chunk_size
+PUT _inference/sparse_embedding/word_chunks
 {
   "service": "elasticsearch",
   "service_settings": {
+    "model_id": ".elser_model_2",
     "num_allocations": 1,
     "num_threads": 1
   },
   "chunking_settings": {
-    "strategy": "sentence",
-    "max_chunk_size": 100,
-    "sentence_overlap": 0
+    "strategy": "word",
+    "max_chunk_size": 120,
+    "overlap": 40
+  }
+}
+```
+
+#### `recursive`
+
+```{applies_to}
+stack: ga 9.1
+```
+
+The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+
+##### Markdown separator group
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk.
+
+```console
+PUT _inference/sparse_embedding/recursive_markdown_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "recursive",
+    "max_chunk_size": 200,
+    "separator_group": "markdown"
+  }
+}
+```
+
+##### Custom separator group
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy. It uses a custom list of separators to split plaintext into chunks of up to 180 words.
+
+
+```console
+PUT _inference/sparse_embedding/recursive_custom_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "recursive",
+    "max_chunk_size": 180,
+    "separators": [
+      "^(#{1,6})\\s",
+      "\\n\\n",
+      "\\n[-*]\\s",
+      "\\n\\d+\\.\\s",
+      "\\n"
+    ]
+  }
+}
+```
+
+#### `none`
+
+```{applies_to}
+stack: ga 9.1
+```
+
+The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and disables chunking by setting the strategy to `none`.
+
+```console
+PUT _inference/sparse_embedding/none_chunking
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "none"
   }
 }
 ```
````
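As context for the new `none` strategy: pre-chunked input is supplied as an array of strings on a `semantic_text` field. A minimal sketch, assuming a hypothetical index `my-index` whose `content` field is wired to the `none_chunking` endpoint created in the diff above:

```console
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "none_chunking"
      }
    }
  }
}

PUT my-index/_doc/1
{
  "content": [
    "Each array element is sent to the inference service as its own chunk.",
    "No further splitting or overlap is applied."
  ]
}
```

Each array element is embedded as-is, matching the pre-chunking behavior described in the linked `semantic_text` documentation.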

manage-data/data-store/data-streams/quickstart-tsds.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -18,8 +18,8 @@ A _time series_ is a sequence of data points collected at regular time intervals
 * Access to [{{dev-tools-app}} Console](/explore-analyze/query-filter/tools/console.md) in {{kib}}, or another way to make {{es}} API requests
 
 * Cluster and index permissions:
-  * [Cluster privilege](/deploy-manage/users-roles/cluster-or-deployment-auth/elasticsearch-privileges.md#privileges-list-cluster): `manage_index_templates`
-  * [Index privileges](/deploy-manage/users-roles/cluster-or-deployment-auth/elasticsearch-privileges.md#privileges-list-indices): `create_doc` and `create_index`
+  * [Cluster privilege](elasticsearch://reference/elasticsearch/security-privileges.md#privileges-list-cluster): `manage_index_templates`
+  * [Index privileges](elasticsearch://reference/elasticsearch/security-privileges.md#privileges-list-indices): `create_doc` and `create_index`
 
 * Familiarity with [time series data stream concepts](time-series-data-stream-tsds.md) and [{{es}} index and search basics](/solutions/search/get-started.md)
 
```
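For readers following the quickstart, a minimal sketch of a role granting exactly these privileges; the role name `tsds_quickstart` and index pattern `my-tsds*` are assumptions for illustration:

```console
PUT _security/role/tsds_quickstart
{
  "cluster": [ "manage_index_templates" ],
  "indices": [
    {
      "names": [ "my-tsds*" ],
      "privileges": [ "create_doc", "create_index" ]
    }
  ]
}
```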

release-notes/intro/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -12,7 +12,7 @@ navigation_title: Release notes
 
 Stay up to date with the latest changes, fixes, known issues, and deprecations in Elastic products.
 
-Release notes cover the all the latest Elastic product changes, including the following:
+Release notes cover all the latest Elastic product changes, including the following:
 * {{stack}} {{version.stack.base}} and later, including the most recent {{version.stack}} release
 * {{serverless-full}}, including updates to {{es}}, and {{observability}} and {{elastic-sec}} solutions
 
```

solutions/observability/connect-to-own-local-llm.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -19,6 +19,10 @@ If your Elastic deployment is not on the same network, you must configure an Nginx reverse proxy
 You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
 ::::
 
+::::{note}
+For information about the performance of open-source models on {{obs-ai-assistant}} tasks, refer to the [LLM performance matrix](/solutions/observability/llm-performance-matrix.md).
+::::
+
 This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.
 
 ### Already running LM Studio? [skip-if-already-running]
```
solutions/observability/llm-performance-matrix.md

Lines changed: 63 additions & 0 deletions
This file was added.

````diff
@@ -0,0 +1,63 @@
+---
+mapped_pages:
+  - https://www.elastic.co/guide/en/observability/current/observability-llm-performance-matrix.html
+applies_to:
+  stack: ga 9.2
+  serverless: ga
+products:
+  - id: observability
+---
+
+# Large language model performance matrix
+
+This page summarizes internal test results comparing large language models (LLMs) across {{obs-ai-assistant}} use cases. To learn more about these use cases, refer to [AI Assistant](/solutions/observability/observability-ai-assistant.md).
+
+::::{important}
+Rating legend:
+
+**Excellent:** Highly accurate and reliable for the use case.<br>
+**Great:** Strong performance with minor limitations.<br>
+**Good:** Possibly adequate for many use cases but with noticeable tradeoffs.<br>
+**Poor:** Significant issues; not recommended for production for the use case.
+
+Recommended models are those rated **Excellent** or **Great** for the particular use case.
+::::
+
+## Proprietary models [_proprietary_models]
+
+Models from third-party LLM providers.
+
+| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
+| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Great | Excellent |
+| Amazon Bedrock | **Claude Sonnet 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
+| OpenAI | **GPT-4.1** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
+| Google Gemini | **Gemini 2.0 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
+| Google Gemini | **Gemini 2.5 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
+| Google Gemini | **Gemini 2.5 Pro** | Excellent | Great | Excellent | Excellent | Excellent | Good | Good | Excellent |
+
+
+## Open-source models [_open_source_models]
+
+```{applies_to}
+stack: preview 9.2
+serverless: preview
+```
+
+Models you can [deploy and manage yourself](/solutions/observability/connect-to-own-local-llm.md).
+
+| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
+| Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |
+
+::::{note}
+`Llama-3.3-70B-Instruct` is supported with simulated function calling.
+::::
+
+## Evaluate your own model
+
+You can run the {{obs-ai-assistant}} evaluation framework against any model, and use it to benchmark a custom or self-hosted model against the use cases in the matrix. Refer to the [evaluation framework README](https://github.com/elastic/kibana/blob/main/x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md) for setup and usage details.
+
+For consistency, all ratings in this matrix were generated using `Gemini 2.5 Pro` as the judge model (specified via the `--evaluateWith` flag). Use the same judge when evaluating your own model to ensure comparable results.
````

solutions/observability/observability-ai-assistant.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -91,6 +91,11 @@ The AI Assistant connects to one of these supported LLM providers:
 - The provider's API endpoint URL
 - Your authentication key or secret
 
+::::{admonition} Recommended models
+While the {{obs-ai-assistant}} is compatible with many different models, refer to the [Large language model performance matrix](/solutions/observability/llm-performance-matrix.md) to select models that perform well with your desired use cases.
+
+::::
+
 ### Elastic Managed LLM [elastic-managed-llm-obs-ai-assistant]
 
 :::{include} ../_snippets/elastic-managed-llm.md
```