Merged

Commits (33)
- `135f57c` Add initial docs for performance matrix (viduni94, Sep 4, 2025)
- `ce47b33` Add link to the evaluation framework (viduni94, Sep 4, 2025)
- `d462cde` Add new docs to toc (viduni94, Sep 4, 2025)
- `2f75666` Update formatting (viduni94, Sep 4, 2025)
- `fb00b1d` Add Mistral (viduni94, Sep 4, 2025)
- `81f1d2a` Update wording (viduni94, Sep 4, 2025)
- `8de6028` Update legend (viduni94, Sep 4, 2025)
- `3f45db8` Update legend format (viduni94, Sep 4, 2025)
- `fc8d256` Update columns (viduni94, Sep 4, 2025)
- `2020139` Update rating for proprietary models (viduni94, Sep 4, 2025)
- `ea837bb` Remove rating for local models (viduni94, Sep 4, 2025)
- `0f180bf` Update llama and mistral small scores (viduni94, Sep 5, 2025)
- `a4d1397` Update mistral small es|ql rating (viduni94, Sep 5, 2025)
- `c641bee` Update llama model name in note (viduni94, Sep 5, 2025)
- `b5bfd40` Fix typo (viduni94, Sep 5, 2025)
- `2ed08b5` Update llm-performance-matrix.md (viduni94, Sep 5, 2025)
- `b4f3816` Update connect-to-own-local-llm.md (viduni94, Sep 5, 2025)
- `0ebcaf2` Update llm-performance-matrix.md (viduni94, Sep 5, 2025)
- `95dad45` Add judge model (viduni94, Sep 10, 2025)
- `ebb7d47` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 10, 2025)
- `86da386` Update ratings (viduni94, Sep 12, 2025)
- `efba9b5` Update ratings to the new scale (viduni94, Sep 15, 2025)
- `9c991fc` Update date (viduni94, Sep 15, 2025)
- `cb7e64e` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 15, 2025)
- `2e737ae` Update solutions/observability/llm-performance-matrix.md (viduni94, Sep 16, 2025)
- `4c2579f` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (florent-leborgne, Sep 16, 2025)
- `9ecce3d` Address review comments (viduni94, Sep 16, 2025)
- `8e093b9` Remove date (viduni94, Sep 17, 2025)
- `39d5c1b` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (florent-leborgne, Sep 17, 2025)
- `4ee68f0` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 17, 2025)
- `24ab5c0` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 17, 2025)
- `50ef7b2` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 18, 2025)
- `4a47c0c` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 18, 2025)
4 changes: 4 additions & 0 deletions solutions/observability/connect-to-own-local-llm.md
@@ -19,6 +19,10 @@ If your Elastic deployment is not on the same network, you must configure an Ngi
You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
::::

::::{note}
For information about the performance of open-source models on {{obs-ai-assistant}} tasks, refer to the [LLM performance matrix](/solutions/observability/llm-performance-matrix.md).
::::

This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.
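Once LM Studio is serving a model, it exposes an OpenAI-compatible REST API. The sketch below shows one way to query it from Python; the base URL (LM Studio's default of `http://localhost:1234/v1`) and the model identifier are assumptions that may differ in your setup.

```python
# Minimal sketch: query a model served by LM Studio over its
# OpenAI-compatible API. Base URL and port are LM Studio defaults
# (assumption) and may differ in your deployment.
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # assumed LM Studio server address


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }


def ask(model: str, prompt: str) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Against a running server you could then call, for example, `ask("llama-3.3-70b-instruct", "...")` (the model identifier here is hypothetical) to confirm the endpoint works before wiring up the connector.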

### Already running LM Studio? [skip-if-already-running]
58 changes: 58 additions & 0 deletions solutions/observability/llm-performance-matrix.md
@@ -0,0 +1,58 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/observability/current/observability-llm-performance-matrix.html
applies_to:
stack: ga 9.2
serverless: ga
products:
- id: observability
---

# Large language model performance matrix

_Last updated: 4 September 2025_
**Contributor:** I would probably avoid putting dates in the docs.

**Contributor (author):** The scope and requirements of the issue mention:

> include “last updated” note.

@pmoust To clarify, should we include the last updated date of the performance matrix in the docs?

**Member:** @mdbirnstiehl we'd like to have some way to indicate that this is a "version" of our current supportability levels. If not a date, and given that we're not linking it to a stack release per se, what would you recommend?

If no strong opinions, I'd just keep the date.

This page summarizes internal test results comparing large language models (LLMs) across {{obs-ai-assistant}} use cases. To learn more about these use cases, refer to [AI Assistant](/solutions/observability/observability-ai-assistant.md).

::::{important}
Rating legend:

**Excellent:** Highly accurate and reliable for the use case.<br>
**Great:** Strong performance with minor limitations.<br>
**Good:** Adequate for many use cases, but with noticeable tradeoffs.<br>
**Poor:** Significant issues; not recommended for production for the use case.

Recommended models are those rated **Excellent** or **Great** for the particular use case.
::::
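As a concrete reading of the legend, the recommendation rule can be expressed in a few lines. The model names and single ratings below are illustrative simplifications of the per-use-case tables, not an API:

```python
# Sketch of the rating legend and the "recommended" rule:
# a model is recommended for a use case when rated Excellent or Great.
RATING_SCALE = ["Poor", "Good", "Great", "Excellent"]  # worst to best


def is_recommended(rating: str) -> bool:
    """Return True for ratings that meet the recommendation bar."""
    if rating not in RATING_SCALE:
        raise ValueError(f"unknown rating: {rating}")
    return rating in ("Excellent", "Great")


# Illustrative ratings for one hypothetical use case.
ratings = {"Claude Sonnet 4": "Excellent", "Example Model": "Good"}
recommended = [model for model, r in ratings.items() if is_recommended(r)]
```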

## Proprietary models [_proprietary_models]

Models from third-party LLM providers.

| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
| Amazon Bedrock | **Claude Sonnet 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
| OpenAI | **GPT-4.1** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
| Google Gemini | **Gemini 2.0 Flash** | Excellent | Good | Great | Excellent | Excellent | Great | Great | Excellent |
| Google Gemini | **Gemini 2.5 Flash** | Excellent | Great | Excellent | Excellent | Excellent | Great | Great | Excellent |
| Google Gemini | **Gemini 2.5 Pro** | Excellent | Excellent | Excellent | Excellent | Great | Great | Excellent | Excellent |


## Open-source models [_open_source_models]

Models you can [deploy and manage yourself](/solutions/observability/connect-to-own-local-llm.md).

| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Great | Great | Excellent |
| Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Good | Great | Excellent | Excellent | Poor | Great | Excellent |

::::{note}
`Llama-3.3-70B-Instruct` is currently supported with simulated function calling.
::::

## Evaluate your own model

You can run the {{obs-ai-assistant}} evaluation framework against any model to benchmark a custom or self-hosted model on the use cases in this matrix. Refer to the [evaluation framework README](https://github.com/elastic/kibana/blob/main/x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md) for setup and usage details.
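Conceptually, such a benchmark pairs a candidate model with a judge model and averages the judge's scores per use case. The sketch below is a minimal illustration of that loop under assumed names (`Scenario`, `evaluate`, the judge signature); it is not the framework's actual API:

```python
# Illustrative only: the shape of an LLM benchmark that scores a
# candidate model per use case using a judge. All names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    use_case: str   # e.g. "ES|QL generation"
    prompt: str
    reference: str  # expected answer the judge grades against


def evaluate(
    scenarios: List[Scenario],
    model: Callable[[str], str],
    judge: Callable[[str, str], float],
) -> Dict[str, float]:
    """Average the judge's score for each use case across its scenarios."""
    scores: Dict[str, List[float]] = {}
    for s in scenarios:
        answer = model(s.prompt)
        scores.setdefault(s.use_case, []).append(judge(answer, s.reference))
    return {uc: sum(vals) / len(vals) for uc, vals in scores.items()}
```

Averaged scores per use case could then be bucketed onto the Excellent/Great/Good/Poor scale used in the tables above.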
5 changes: 5 additions & 0 deletions solutions/observability/observability-ai-assistant.md
@@ -91,6 +91,11 @@ The AI Assistant connects to one of these supported LLM providers:
- The provider's API endpoint URL
- Your authentication key or secret

::::{admonition} Recommended models
While the AI Assistant is compatible with many different models, refer to the [Large language model performance matrix](/solutions/observability/llm-performance-matrix.md) to select models that perform well with your desired use cases.

::::

### Elastic Managed LLM [elastic-managed-llm-obs-ai-assistant]

:::{include} ../_snippets/elastic-managed-llm.md
1 change: 1 addition & 0 deletions solutions/toc.yml
@@ -466,6 +466,7 @@ toc:
- file: observability/observability-ai-assistant.md
children:
- file: observability/connect-to-own-local-llm.md
- file: observability/llm-performance-matrix.md
- file: observability/observability-serverless-feature-tiers.md
- file: security.md
children: