Commit 7386257

[Obs AI Assistant] Add Qwen and Claude Sonnet 4.5 ratings to the LLM performance matrix (#3056)
Closes elastic/docs-content-internal#349
Closes elastic/kibana#238610

This PR updates the Observability AI Assistant with the following changes:

- Adds `Qwen 2.5 72B` to self-managed LLMs
- Adds `Claude Sonnet 4.5` to proprietary models
- Updates the ES|QL rating for some models (with the introduction of [unambiguous prompts for ES|QL scenarios](elastic/kibana#230774), the scores and ratings changed slightly)
1 parent fdea98f commit 7386257

1 file changed, +7 -5 lines changed

solutions/observability/llm-performance-matrix.md

Lines changed: 7 additions & 5 deletions
@@ -29,13 +29,14 @@ Models from third-party LLM providers.

 | Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
-| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Great | Excellent |
+| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Good | Excellent |
+| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
 | Amazon Bedrock | **Claude Sonnet 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
+| Amazon Bedrock | **Claude Sonnet 4.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Good | Excellent |
 | OpenAI | **GPT-4.1** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
 | Google Gemini | **Gemini 2.0 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
-| Google Gemini | **Gemini 2.5 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
-| Google Gemini | **Gemini 2.5 Pro** | Excellent | Great | Excellent | Excellent | Excellent | Good | Good | Excellent |
+| Google Gemini | **Gemini 2.5 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Great | Good | Excellent |
+| Google Gemini | **Gemini 2.5 Pro** | Excellent | Great | Excellent | Excellent | Excellent | Great | Good | Excellent |


 ## Open-source models [_open_source_models]
@@ -51,9 +52,10 @@ Models you can [deploy and manage yourself](/solutions/observability/connect-to-
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
 | Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |
+| Alibaba Cloud | **Qwen2.5-72b-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |

 ::::{note}
-`Llama-3.3-70B-Instruct` is supported with simulated function calling.
+`Llama-3.3-70B-Instruct` and `Qwen2.5-72b-Instruct` were tested with simulated function calling.
 ::::

 ## Evaluate your own model
