Commit 04ab734

viduni94, mdbirnstiehl, and florent-leborgne authored
[Obs AI Assistant] Add LLM performance matrix docs (#2812)
Closes elastic/obs-ai-team#347 Closes elastic/kibana#233110 This PR adds the LLM performance matrix and a link to the evaluation framework readme in the Observability AI Assistant docs. The scores that were used to calculate the ratings are attached in the first issue [linked above](elastic/obs-ai-team#347 (comment)). --------- Co-authored-by: Mike Birnstiehl <[email protected]> Co-authored-by: florent-leborgne <[email protected]>
1 parent d4039c5 commit 04ab734

File tree

4 files changed (+73, −0 lines changed)


solutions/observability/connect-to-own-local-llm.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -19,6 +19,10 @@ If your Elastic deployment is not on the same network, you must configure an Ngi
 You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
 ::::
 
+::::{note}
+For information about the performance of open-source models on {{obs-ai-assistant}} tasks, refer to the [LLM performance matrix](/solutions/observability/llm-performance-matrix.md).
+::::
+
 This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.
 
 ### Already running LM Studio? [skip-if-already-running]
```
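The hunk context above mentions configuring an Nginx reverse proxy when the Elastic deployment is not on the same network as LM Studio. A minimal sketch, assuming LM Studio's default local server port (1234); the `server_name` and any TLS termination are hypothetical and not part of this commit:

```nginx
# Minimal reverse proxy in front of a local LM Studio server (sketch).
# server_name is a placeholder; production setups would also terminate TLS.
server {
    listen 80;
    server_name lmstudio.example.com;

    location / {
        # LM Studio's local server listens on port 1234 by default.
        proxy_pass http://127.0.0.1:1234;
        proxy_set_header Host $host;
    }
}
```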
solutions/observability/llm-performance-matrix.md

Lines changed: 63 additions & 0 deletions

````diff
@@ -0,0 +1,63 @@
+---
+mapped_pages:
+  - https://www.elastic.co/guide/en/observability/current/observability-llm-performance-matrix.html
+applies_to:
+  stack: ga 9.2
+  serverless: ga
+products:
+  - id: observability
+---
+
+# Large language model performance matrix
+
+This page summarizes internal test results comparing large language models (LLMs) across {{obs-ai-assistant}} use cases. To learn more about these use cases, refer to [AI Assistant](/solutions/observability/observability-ai-assistant.md).
+
+::::{important}
+Rating legend:
+
+**Excellent:** Highly accurate and reliable for the use case.<br>
+**Great:** Strong performance with minor limitations.<br>
+**Good:** Adequate for many use cases, but with noticeable tradeoffs.<br>
+**Poor:** Significant issues; not recommended for production use.
+
+Recommended models are those rated **Excellent** or **Great** for the particular use case.
+::::
+
+## Proprietary models [_proprietary_models]
+
+Models from third-party LLM providers.
+
+| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
+| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Great | Excellent |
+| Amazon Bedrock | **Claude Sonnet 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
+| OpenAI | **GPT-4.1** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
+| Google Gemini | **Gemini 2.0 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
+| Google Gemini | **Gemini 2.5 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
+| Google Gemini | **Gemini 2.5 Pro** | Excellent | Great | Excellent | Excellent | Excellent | Good | Good | Excellent |
+
+
+## Open-source models [_open_source_models]
+
+```{applies_to}
+stack: preview 9.2
+serverless: preview
+```
+
+Models you can [deploy and manage yourself](/solutions/observability/connect-to-own-local-llm.md).
+
+| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
+| Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |
+
+::::{note}
+`Llama-3.3-70B-Instruct` is supported with simulated function calling.
+::::
+
+## Evaluate your own model
+
+You can run the {{obs-ai-assistant}} evaluation framework against any model and use it to benchmark a custom or self-hosted model against the use cases in this matrix. Refer to the [evaluation framework README](https://github.com/elastic/kibana/blob/main/x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md) for setup and usage details.
+
+For consistency, all ratings in this matrix were generated using `Gemini 2.5 Pro` as the judge model (specified via the `--evaluateWith` flag). Use the same judge when evaluating your own model to ensure comparable results.
````
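The judge-model setup described above can be sketched as a dry-run shell snippet. Only the evaluation script's directory and the `--evaluateWith` flag come from this commit; the `index.js` entry point and the `gemini-2-5-pro` identifier are hypothetical placeholders, so consult the README for the real interface:

```shell
# Build (but do not run) a hypothetical evaluation command.
# Real flags and entry point may differ -- see the evaluation README.
EVAL_DIR="x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation"

# The matrix ratings were generated with Gemini 2.5 Pro as the judge model;
# the exact identifier accepted by the flag is an assumption here.
JUDGE="gemini-2-5-pro"

CMD="node $EVAL_DIR/index.js --evaluateWith $JUDGE"
echo "$CMD"
```

Echoing the command first is a convenient way to double-check the judge model before kicking off a long evaluation run.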

solutions/observability/observability-ai-assistant.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -91,6 +91,11 @@ The AI Assistant connects to one of these supported LLM providers:
 - The provider's API endpoint URL
 - Your authentication key or secret
 
+::::{admonition} Recommended models
+While the {{obs-ai-assistant}} is compatible with many different models, refer to the [Large language model performance matrix](/solutions/observability/llm-performance-matrix.md) to select models that perform well for your use cases.
+
+::::
+
 ### Elastic Managed LLM [elastic-managed-llm-obs-ai-assistant]
 
 :::{include} ../_snippets/elastic-managed-llm.md
```

solutions/toc.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -503,6 +503,7 @@ toc:
   - file: observability/observability-ai-assistant.md
     children:
       - file: observability/connect-to-own-local-llm.md
+      - file: observability/llm-performance-matrix.md
   - file: observability/observability-serverless-feature-tiers.md
   - file: security.md
     children:
```
