[Obs AI Assistant] Add Qwen and Claude Sonnet 4.5 ratings to the LLM performance matrix (#3056)

viduni94 · web-flow · commit 7386257892be · 2025-10-23T12:34:17.000-04:00
Closes elastic/docs-content-internal#349
Closes elastic/kibana#238610

This PR updates the Observability AI Assistant with the following changes:

- Adds `Qwen 2.5 72B` to self-managed LLMs
- Adds `Claude Sonnet 4.5` to proprietary models
- Updates the ES|QL rating for some models (with the introduction of [unambiguous prompts for ES|QL scenarios](elastic/kibana#230774), the scores and ratings changed slightly)

diff --git a/solutions/observability/llm-performance-matrix.md b/solutions/observability/llm-performance-matrix.md
@@ -29,13 +29,14 @@ Models from third-party LLM providers.
 
 | Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
-| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Great | Excellent |
+| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Good | Excellent |
+| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
 | Amazon Bedrock | **Claude Sonnet 4**   | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
+| Amazon Bedrock | **Claude Sonnet 4.5**   | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Good | Excellent |
 | OpenAI    | **GPT-4.1**           | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
 | Google Gemini    | **Gemini 2.0 Flash**    | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
-| Google Gemini    | **Gemini 2.5 Flash**    | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
-| Google Gemini    | **Gemini 2.5 Pro**    | Excellent | Great | Excellent | Excellent | Excellent | Good | Good | Excellent |
+| Google Gemini    | **Gemini 2.5 Flash**    | Excellent | Good | Excellent | Excellent | Excellent | Great | Good | Excellent |
+| Google Gemini    | **Gemini 2.5 Pro**    | Excellent | Great | Excellent | Excellent | Excellent | Great | Good | Excellent |
 
 
 ## Open-source models [_open_source_models]
@@ -51,9 +52,10 @@ Models you can [deploy and manage yourself](/solutions/observability/connect-to-
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
 | Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |
+| Alibaba Cloud | **Qwen2.5-72b-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
 
 ::::{note}
-`Llama-3.3-70B-Instruct` is supported with simulated function calling.
+`Llama-3.3-70B-Instruct` and `Qwen2.5-72b-Instruct` were tested with simulated function calling.
 ::::
 
 ## Evaluate your own model