**Merged** · 33 commits
- `135f57c` Add initial docs for performance matrix (viduni94, Sep 4, 2025)
- `ce47b33` Add link to the evaluation framework (viduni94, Sep 4, 2025)
- `d462cde` Add new docs to toc (viduni94, Sep 4, 2025)
- `2f75666` Update formatting (viduni94, Sep 4, 2025)
- `fb00b1d` Add Mistral (viduni94, Sep 4, 2025)
- `81f1d2a` Update wording (viduni94, Sep 4, 2025)
- `8de6028` Update legend (viduni94, Sep 4, 2025)
- `3f45db8` Update legend format (viduni94, Sep 4, 2025)
- `fc8d256` Update columns (viduni94, Sep 4, 2025)
- `2020139` Update rating for proprietary models (viduni94, Sep 4, 2025)
- `ea837bb` Remove rating for local models (viduni94, Sep 4, 2025)
- `0f180bf` Update llama and mistral small scores (viduni94, Sep 5, 2025)
- `a4d1397` Update mistral small es|ql rating (viduni94, Sep 5, 2025)
- `c641bee` Update llama model name in note (viduni94, Sep 5, 2025)
- `b5bfd40` Fix typo (viduni94, Sep 5, 2025)
- `2ed08b5` Update llm-performance-matrix.md (viduni94, Sep 5, 2025)
- `b4f3816` Update connect-to-own-local-llm.md (viduni94, Sep 5, 2025)
- `0ebcaf2` Update llm-performance-matrix.md (viduni94, Sep 5, 2025)
- `95dad45` Add judge model (viduni94, Sep 10, 2025)
- `ebb7d47` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 10, 2025)
- `86da386` Update ratings (viduni94, Sep 12, 2025)
- `efba9b5` Update ratings to the new scale (viduni94, Sep 15, 2025)
- `9c991fc` Update date (viduni94, Sep 15, 2025)
- `cb7e64e` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 15, 2025)
- `2e737ae` Update solutions/observability/llm-performance-matrix.md (viduni94, Sep 16, 2025)
- `4c2579f` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (florent-leborgne, Sep 16, 2025)
- `9ecce3d` Address review comments (viduni94, Sep 16, 2025)
- `8e093b9` Remove date (viduni94, Sep 17, 2025)
- `39d5c1b` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (florent-leborgne, Sep 17, 2025)
- `4ee68f0` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 17, 2025)
- `24ab5c0` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 17, 2025)
- `50ef7b2` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 18, 2025)
- `4a47c0c` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 18, 2025)
**solutions/observability/connect-to-own-local-llm.md** (4 additions, 0 deletions)

@@ -19,6 +19,10 @@ If your Elastic deployment is not on the same network, you must configure an Nginx…
You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
::::

::::{note}
For information about the performance of open-source models on {{obs-ai-assistant}} tasks, refer to the [LLM performance matrix](/solutions/observability/llm-performance-matrix.md).
::::

This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.

### Already running LM Studio? [skip-if-already-running]
**solutions/observability/llm-performance-matrix.md** (63 additions, 0 deletions)
@@ -0,0 +1,63 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/observability/current/observability-llm-performance-matrix.html
applies_to:
stack: ga 9.2
serverless: ga
products:
- id: observability
---

# Large language model performance matrix

This page summarizes internal test results comparing large language models (LLMs) across {{obs-ai-assistant}} use cases. To learn more about these use cases, refer to [AI Assistant](/solutions/observability/observability-ai-assistant.md).

::::{important}
Rating legend:

**Excellent:** Highly accurate and reliable for the use case.<br>
**Great:** Strong performance with minor limitations.<br>
**Good:** Adequate for many use cases, but with noticeable tradeoffs.<br>
**Poor:** Significant issues; not recommended for production for the use case.

Recommended models are those rated **Excellent** or **Great** for the particular use case.
::::

## Proprietary models [_proprietary_models]

Models from third-party LLM providers.

| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Amazon Bedrock | **Claude 3.5 Sonnet** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
| Amazon Bedrock | **Claude 3.7 Sonnet** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Great | Excellent |
| Amazon Bedrock | **Claude Sonnet 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
| OpenAI | **GPT-4.1** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
| Google Gemini | **Gemini 2.0 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
| Google Gemini | **Gemini 2.5 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
| Google Gemini | **Gemini 2.5 Pro** | Excellent | Great | Excellent | Excellent | Excellent | Good | Good | Excellent |


## Open-source models [_open_source_models]

```{applies_to}
stack: preview 9.2
serverless: preview
```

Models you can [deploy and manage yourself](/solutions/observability/connect-to-own-local-llm.md).

| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
| Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |

::::{note}
`Llama-3.3-70B-Instruct` is supported with simulated function calling.
::::

## Evaluate your own model

You can run the {{obs-ai-assistant}} evaluation framework against any model to benchmark a custom or self-hosted model against the use cases in this matrix. Refer to the [evaluation framework README](https://github.com/elastic/kibana/blob/main/x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md) for setup and usage details.

For consistency, all ratings in this matrix were generated using `Gemini 2.5 Pro` as the judge model (specified via the `--evaluateWith` flag). Use the same judge when evaluating your own model to ensure comparable results.
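For reference, an invocation might take roughly the following shape. This is a sketch only: the script directory matches the README location, but the `--connectorId` flag, the placeholder IDs, and the exact entry point are assumptions, and only `--evaluateWith` is documented on this page. Check the README for the flags your Kibana version actually supports. The snippet just assembles and prints the command so you can inspect it before running it from the Kibana repo root.

```shell
# Sketch of an evaluation run, assembled from the Kibana repo root.
# Only --evaluateWith is confirmed by this page; the other flags and the
# placeholder connector IDs are assumptions to be replaced with real values.
EVAL_SCRIPT="x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation"
CMD="node ${EVAL_SCRIPT} --connectorId <model-under-test-connector-id> --evaluateWith <gemini-2.5-pro-connector-id>"

# Print the command for review before executing it.
echo "${CMD}"
```

Using the same `Gemini 2.5 Pro` judge as the published matrix keeps your scores on the same scale as the ratings above.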
**solutions/observability/observability-ai-assistant.md** (5 additions, 0 deletions)

@@ -91,6 +91,11 @@ The AI Assistant connects to one of these supported LLM providers:
- The provider's API endpoint URL
- Your authentication key or secret

::::{admonition} Recommended models
The {{obs-ai-assistant}} is compatible with many different models. To select a model that performs well for your use cases, refer to the [Large language model performance matrix](/solutions/observability/llm-performance-matrix.md).

::::

### Elastic Managed LLM [elastic-managed-llm-obs-ai-assistant]

:::{include} ../_snippets/elastic-managed-llm.md
**solutions/toc.yml** (1 addition, 0 deletions)

@@ -503,6 +503,7 @@ toc:
- file: observability/observability-ai-assistant.md
children:
- file: observability/connect-to-own-local-llm.md
- file: observability/llm-performance-matrix.md
- file: observability/observability-serverless-feature-tiers.md
- file: security.md
children: