Merged (33 commits)
- `135f57c` Add initial docs for performance matrix (viduni94, Sep 4, 2025)
- `ce47b33` Add link to the evaluation framework (viduni94, Sep 4, 2025)
- `d462cde` Add new docs to toc (viduni94, Sep 4, 2025)
- `2f75666` Update formatting (viduni94, Sep 4, 2025)
- `fb00b1d` Add Mistral (viduni94, Sep 4, 2025)
- `81f1d2a` Update wording (viduni94, Sep 4, 2025)
- `8de6028` Update legend (viduni94, Sep 4, 2025)
- `3f45db8` Update legend format (viduni94, Sep 4, 2025)
- `fc8d256` Update columns (viduni94, Sep 4, 2025)
- `2020139` Update rating for proprietary models (viduni94, Sep 4, 2025)
- `ea837bb` Remove rating for local models (viduni94, Sep 4, 2025)
- `0f180bf` Update llama and mistral small scores (viduni94, Sep 5, 2025)
- `a4d1397` Update mistral small es|ql rating (viduni94, Sep 5, 2025)
- `c641bee` Update llama model name in note (viduni94, Sep 5, 2025)
- `b5bfd40` Fix typo (viduni94, Sep 5, 2025)
- `2ed08b5` Update llm-performance-matrix.md (viduni94, Sep 5, 2025)
- `b4f3816` Update connect-to-own-local-llm.md (viduni94, Sep 5, 2025)
- `0ebcaf2` Update llm-performance-matrix.md (viduni94, Sep 5, 2025)
- `95dad45` Add judge model (viduni94, Sep 10, 2025)
- `ebb7d47` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 10, 2025)
- `86da386` Update ratings (viduni94, Sep 12, 2025)
- `efba9b5` Update ratings to the new scale (viduni94, Sep 15, 2025)
- `9c991fc` Update date (viduni94, Sep 15, 2025)
- `cb7e64e` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 15, 2025)
- `2e737ae` Update solutions/observability/llm-performance-matrix.md (viduni94, Sep 16, 2025)
- `4c2579f` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (florent-leborgne, Sep 16, 2025)
- `9ecce3d` Address review comments (viduni94, Sep 16, 2025)
- `8e093b9` Remove date (viduni94, Sep 17, 2025)
- `39d5c1b` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (florent-leborgne, Sep 17, 2025)
- `4ee68f0` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 17, 2025)
- `24ab5c0` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 17, 2025)
- `50ef7b2` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 18, 2025)
- `4a47c0c` Merge branch 'main' into obs-ai-assistant-llm-performance-matrix (viduni94, Sep 18, 2025)
4 changes: 4 additions & 0 deletions solutions/observability/connect-to-own-local-llm.md
@@ -19,6 +19,10 @@ If your Elastic deployment is not on the same network, you must configure an Ngi
You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
::::

::::{note}
For information about the performance of open-source models on {{obs-ai-assistant}} tasks, refer to the [LLM performance matrix](/solutions/observability/llm-performance-matrix.md).
::::

This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.

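Before wiring up the connector, it can help to sanity-check the endpoint. The sketch below assumes LM Studio's local server is enabled on its default port (`1234`) and exposes its usual OpenAI-compatible API; the host and model name are placeholders for your setup:

```shell
# Assumption: LM Studio's server is running on its default port (1234).
# List the models the server currently exposes.
curl http://localhost:1234/v1/models

# Minimal OpenAI-style chat completion; the model name is a placeholder
# for whichever model you loaded in LM Studio.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Reply with OK"}]}'
```

If your Elastic deployment reaches LM Studio through a reverse proxy, run the same checks against the proxy URL instead of `localhost`.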
### Already running LM Studio? [skip-if-already-running]
64 changes: 64 additions & 0 deletions solutions/observability/llm-performance-matrix.md
@@ -0,0 +1,64 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/observability/current/observability-llm-performance-matrix.html
applies_to:
stack: ga 9.2
serverless: ga
products:
- id: observability
---

# Large language model performance matrix

_Last updated: 15 September 2025_
**florent-leborgne (Contributor):**

Suggested change: remove the line `_Last updated: 15 September 2025_`.

Keeping manually maintained dates isn't something we do or advise doing in the docs, because docs are considered to match the latest release, not specific dates.

**viduni94 (Author):**

Thanks @florent-leborgne. Could you check @pmoust's comment here: #2812 (comment)

Is there a way to link it to a stack release if we are removing the date?

**florent-leborgne (Contributor):**

The whole page will be marked as 9.2 thanks to the frontmatter.

[image]

**viduni94 (Author):**

Okay, thanks @florent-leborgne.

@pmoust, are we okay with removing the date and only having the stack version we tested in?

**florent-leborgne (Contributor):**

It'll show like this when 9.2 is officially released:

[image]

**viduni94 (Author), Sep 16, 2025:**

@florent-leborgne, the requirement we have is to communicate to customers the date on which we evaluated the models. This is important for Serverless too. Is it okay to keep the date for this case?

We plan on updating these ratings whenever we come across one of the scenarios in this comment.

cc: @pmoust

**viduni94 (Author), Sep 16, 2025:**

If it helps, I can update it to say "the evaluations were done on 15 September 2025". What do you think?

**florent-leborgne (Contributor), Sep 17, 2025:**

Yeah, more precise wording already sounds a bit better (even if manually maintained dates are still bad practice in technical docs 😄). Something like this, maybe?

> Last LLM performance evaluation: 15 September 2025

  • An FYI: we may automatically surface "last updated" information on each docs page at some point, but we're very cautious with this wording (for example, fixing a typo or adding a new row doesn't necessarily mean that the entire page was checked, validated, and updated on that date).
  • Right now this is all new content, but if the results of these tests can vary per Stack version (or Serverless) in the future, we'll have to think about how to present the information (one tab per version, maybe, or something similar), and about where to locate that date so it's attached to the right content/version on the page.
  • Including this information at the start of the page implies that the entire matrix (all models) is checked and updated, not just a newly tested model. If you're instead planning cases where only a small part of what shows in this doc will be tested, we may want to surface the date more granularly.
  • If scenarios requiring updates to this page become less frequent at some point, consider what to do with this date to avoid a "blog" effect, meaning a page written sometime in the past that becomes a liability because it shows old dates.

Happy to discuss this further if you'd like to anticipate future updates, lmk :)

**pmoust (Member):**

@florent-leborgne @viduni94, to unblock the discussion here, I am okay to back away from having a "Last updated" date. Let's remove the date and continue the discussion outside of this GitHub issue. We shouldn't block merging on that.

**viduni94 (Author):**

Thanks @pmoust and @florent-leborgne. I'll remove the date for now.

This page summarizes internal test results comparing large language models (LLMs) across {{obs-ai-assistant}} use cases. To learn more about these use cases, refer to [AI Assistant](/solutions/observability/observability-ai-assistant.md).

::::{important}
Rating legend:

**Excellent:** Highly accurate and reliable for the use case.<br>
**Great:** Strong performance with minor limitations.<br>
**Good:** Possibly adequate for many use cases but with noticeable tradeoffs.<br>
**Poor:** Significant issues; not recommended for production for the use case.

Recommended models are those rated **Excellent** or **Great** for the particular use case.
::::
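The legend's "recommended" rule is mechanical enough to sketch in code. The helper below is illustrative only (it is not part of the docs or the evaluation framework), with sample ratings copied from the {{esql}} generation column of the tables that follow:

```python
# Sketch of the legend's rule: a model is recommended for a use case
# when its rating for that use case is "Excellent" or "Great".
RECOMMENDED_RATINGS = {"Excellent", "Great"}

# Sample data copied from the ES|QL generation column of the matrix.
esql_ratings = {
    "Claude Sonnet 3.5": "Great",
    "Claude Sonnet 4": "Excellent",
    "GPT-4.1": "Great",
    "Gemini 2.5 Pro": "Good",
    "Mistral-Small-3.2-24B-Instruct-2506": "Poor",
}

def recommended(ratings: dict[str, str]) -> list[str]:
    """Return the models rated Excellent or Great for the use case."""
    return sorted(m for m, r in ratings.items() if r in RECOMMENDED_RATINGS)

print(recommended(esql_ratings))
# → ['Claude Sonnet 3.5', 'Claude Sonnet 4', 'GPT-4.1']
```

The same filter applies per column: a model can be recommended for one use case (for example, knowledge retrieval) while falling below the bar for another (for example, {{esql}} generation).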

## Proprietary models [_proprietary_models]

Models from third-party LLM providers.

| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Great | Excellent |
| Amazon Bedrock | **Claude Sonnet 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
| OpenAI | **GPT-4.1** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
| Google Gemini | **Gemini 2.0 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
| Google Gemini | **Gemini 2.5 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
| Google Gemini | **Gemini 2.5 Pro** | Excellent | Great | Excellent | Excellent | Excellent | Good | Good | Excellent |


## Open-source models [_open_source_models]

::::{warning}
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
::::

Models you can [deploy and manage yourself](/solutions/observability/connect-to-own-local-llm.md).

| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
| Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |

::::{note}
`Llama-3.3-70B-Instruct` is currently supported with simulated function calling.
::::

## Evaluate your own model

You can run the {{obs-ai-assistant}} evaluation framework against any model, and use it to benchmark a custom or self-hosted model against the use cases in the matrix. Refer to the [evaluation framework README](https://github.com/elastic/kibana/blob/main/x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md) for setup and usage details.

For consistency, all ratings in this matrix were generated using `Gemini 2.5 Pro` as the judge model (specified via the `--evaluateWith` flag). Use the same judge when evaluating your own model to ensure comparable results.
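For orientation, an invocation could look like the sketch below. Treat it as a hypothetical shape only: the script lives under the README path linked above, but apart from `--evaluateWith` (named on this page), the flag and connector names are assumptions; the README is the source of truth.

```shell
# Hypothetical sketch; verify the actual entry point and flags in the
# evaluation framework README. Run from the root of a Kibana checkout.
# --connectorId (assumed flag name) points at the model under test;
# --evaluateWith selects the judge model, as used for this matrix.
node x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/index.js \
  --connectorId my-candidate-model \
  --evaluateWith gemini-2-5-pro
```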
5 changes: 5 additions & 0 deletions solutions/observability/observability-ai-assistant.md
@@ -91,6 +91,11 @@ The AI Assistant connects to one of these supported LLM providers:
- The provider's API endpoint URL
- Your authentication key or secret

::::{admonition} Recommended models
While the AI Assistant is compatible with many different models, refer to the [Large language model performance matrix](/solutions/observability/llm-performance-matrix.md) to select models that perform well for your use cases.

::::

### Elastic Managed LLM [elastic-managed-llm-obs-ai-assistant]

:::{include} ../_snippets/elastic-managed-llm.md
1 change: 1 addition & 0 deletions solutions/toc.yml
@@ -465,6 +465,7 @@ toc:
- file: observability/observability-ai-assistant.md
children:
- file: observability/connect-to-own-local-llm.md
- file: observability/llm-performance-matrix.md
- file: observability/observability-serverless-feature-tiers.md
- file: security.md
children: