By default, documents are split into sentences and grouped in sections up to 250 words.

Several strategies are available for chunking:
#### `sentence`

The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences, ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed the `max_chunk_size` word count, in which case it is split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk, which can be either `0` or `1`.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `sentence` strategy.
```console
PUT _inference/sparse_embedding/sentence_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 100,
    "sentence_overlap": 0
  }
}
```
The default chunking strategy is `sentence`.
#### `word`
The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `word` strategy, setting a maximum of 120 words per chunk and an overlap of 40 words between chunks.
```console
PUT _inference/sparse_embedding/word_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "word",
    "max_chunk_size": 120,
    "overlap": 40
  }
}
```
#### `recursive`
```{applies_to}
stack: ga 9.1
```
The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
##### Markdown separator group
The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk.
```console
PUT _inference/sparse_embedding/recursive_markdown_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "recursive",
    "max_chunk_size": 200,
    "separator_group": "markdown"
  }
}
```
##### Custom separator group
The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy. It uses a custom list of separators to split plain text into chunks of up to 180 words.
```console
PUT _inference/sparse_embedding/recursive_custom_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "recursive",
    "max_chunk_size": 180,
    "separators": [
      "^(#{1,6})\\s",
      "\\n\\n",
      "\\n[-*]\\s",
      "\\n\\d+\\.\\s",
      "\\n"
    ]
  }
}
```
#### `none`
```{applies_to}
stack: ga 9.1
```
The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.

The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and disables chunking by setting the strategy to `none`.
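
The request below is a minimal sketch that follows the pattern of the earlier examples; the endpoint name `no_chunks` is illustrative.

```console
# The endpoint name "no_chunks" is illustrative
PUT _inference/sparse_embedding/no_chunks
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "none"
  }
}
```

With chunking disabled, a `semantic_text` field that uses this endpoint can receive pre-chunked input as an array of strings, where each array element is treated as one chunk. The index and field names below are illustrative.

```console
# Assumes "my_semantic_field" is mapped as semantic_text and uses the endpoint above
PUT my-index/_doc/1
{
  "my_semantic_field": [
    "This is the first pre-chunked passage.",
    "This is the second pre-chunked passage."
  ]
}
```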
* [Index privileges](elasticsearch://reference/elasticsearch/security-privileges.md#privileges-list-indices): `create_doc` and `create_index`
* Familiarity with [time series data stream concepts](time-series-data-stream-tsds.md) and [{{es}} index and search basics](/solutions/search/get-started.md)

solutions/observability/connect-to-own-local-llm.md

If your Elastic deployment is not on the same network, you must configure an Nginx reverse proxy.

You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
::::
::::{note}
For information about the performance of open-source models on {{obs-ai-assistant}} tasks, refer to the [LLM performance matrix](/solutions/observability/llm-performance-matrix.md).
::::
This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.

solutions/observability/llm-performance-matrix.md

This page summarizes internal test results comparing large language models (LLMs) across {{obs-ai-assistant}} use cases. To learn more about these use cases, refer to [AI Assistant](/solutions/observability/observability-ai-assistant.md).

::::{important}
Rating legend:

**Excellent:** Highly accurate and reliable for the use case.<br>
**Great:** Strong performance with minor limitations.<br>
**Good:** Possibly adequate for many use cases but with noticeable tradeoffs.<br>
**Poor:** Significant issues; not recommended for production for the use case.

Recommended models are those rated **Excellent** or **Great** for the particular use case.
::::
| Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
| Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |
::::{note}
`Llama-3.3-70B-Instruct` is supported with simulated function calling.
::::
## Evaluate your own model
You can run the {{obs-ai-assistant}} evaluation framework against any model, and use it to benchmark a custom or self-hosted model against the use cases in the matrix. Refer to the [evaluation framework README](https://github.com/elastic/kibana/blob/main/x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md) for setup and usage details.

For consistency, all ratings in this matrix were generated using `Gemini 2.5 Pro` as the judge model (specified via the `--evaluateWith` flag). Use the same judge when evaluating your own model to ensure comparable results.

solutions/observability/observability-ai-assistant.md

The AI Assistant connects to one of these supported LLM providers:
- The provider's API endpoint URL
- Your authentication key or secret
::::{admonition} Recommended models
While the {{obs-ai-assistant}} is compatible with many different models, refer to the [Large language model performance matrix](/solutions/observability/llm-performance-matrix.md) to select models that perform well with your desired use cases.
::::