diff --git a/solutions/observability/connect-to-own-local-llm.md b/solutions/observability/connect-to-own-local-llm.md
index 1df78b4340..9c8a5feb82 100644
--- a/solutions/observability/connect-to-own-local-llm.md
+++ b/solutions/observability/connect-to-own-local-llm.md
@@ -19,6 +19,10 @@ If your Elastic deployment is not on the same network, you must configure an Ngi
 You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
 ::::
+::::{note}
+For information about the performance of open-source models on {{obs-ai-assistant}} tasks, refer to the [LLM performance matrix](/solutions/observability/llm-performance-matrix.md).
+::::
+
 This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.
 
 ### Already running LM Studio? [skip-if-already-running]
 
diff --git a/solutions/observability/llm-performance-matrix.md b/solutions/observability/llm-performance-matrix.md
new file mode 100644
index 0000000000..f9416fa376
--- /dev/null
+++ b/solutions/observability/llm-performance-matrix.md
@@ -0,0 +1,73 @@
+---
+mapped_pages:
+  - https://www.elastic.co/guide/en/observability/current/observability-llm-performance-matrix.html
+applies_to:
+  stack: ga 9.2
+  serverless: ga
+products:
+  - id: observability
+---
+
+# Large language model performance matrix
+
+This page summarizes internal test results comparing large language models (LLMs) across {{obs-ai-assistant}} use cases. To learn more about these use cases, refer to [AI Assistant](/solutions/observability/observability-ai-assistant.md).
+
+::::{important}
+Rating legend:
+
+**Excellent:** Highly accurate and reliable for the use case.<br>
+**Great:** Strong performance with minor limitations.<br>
+**Good:** Possibly adequate for many use cases but with noticeable tradeoffs.<br>
+**Poor:** Significant issues; not recommended for production for the use case.
+
+Recommended models are those rated **Excellent** or **Great** for the particular use case.
+::::
+
+## Proprietary models [_proprietary_models]
+
+Models from third-party LLM providers.
+
+| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Amazon Bedrock | **Claude Sonnet 3.5** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
+| Amazon Bedrock | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Great | Excellent |
+| Amazon Bedrock | **Claude Sonnet 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent |
+| OpenAI | **GPT-4.1** | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Good | Excellent |
+| Google Gemini | **Gemini 2.0 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
+| Google Gemini | **Gemini 2.5 Flash** | Excellent | Good | Excellent | Excellent | Excellent | Good | Good | Excellent |
+| Google Gemini | **Gemini 2.5 Pro** | Excellent | Great | Excellent | Excellent | Excellent | Good | Good | Excellent |
+
+
+## Open-source models [_open_source_models]
+
+```{applies_to}
+stack: preview 9.2
+serverless: preview
+```
+
+Models you can [deploy and manage yourself](/solutions/observability/connect-to-own-local-llm.md).
+
+| Provider | Model | **Alert questions** | **APM questions** | **Contextual insights** | **Documentation retrieval** | **Elasticsearch operations** | **{{esql}} generation** | **Execute connector** | **Knowledge retrieval** |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Meta | **Llama-3.3-70B-Instruct** | Excellent | Good | Great | Excellent | Excellent | Good | Good | Excellent |
+| Mistral | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Poor | Great | Great | Excellent | Poor | Good | Excellent |
+
+::::{note}
+`Llama-3.3-70B-Instruct` is supported with simulated function calling.
+::::
+
+## Evaluate your own model
+
+You can run the {{obs-ai-assistant}} evaluation framework against any model and use it to benchmark a custom or self-hosted model against the use cases in this matrix. Refer to the [evaluation framework README](https://github.com/elastic/kibana/blob/main/x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md) for setup and usage details.
+
+For consistency, all ratings in this matrix were generated using `Gemini 2.5 Pro` as the judge model (specified via the `--evaluateWith` flag). Use the same judge when evaluating your own model to ensure comparable results.
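+
+For example, an evaluation run might look like the following sketch. The script entry point and the `--connectorId` flag are shown for illustration only; `--evaluateWith` is the only flag documented here, so refer to the README for the exact invocation.
+
+```bash
+# Illustrative invocation from the Kibana repository root.
+# Flag names other than --evaluateWith are assumptions; see the README.
+node x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/index.js \
+  --connectorId <connector-for-the-model-under-test> \
+  --evaluateWith <connector-for-the-gemini-2.5-pro-judge>
+```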
diff --git a/solutions/observability/observability-ai-assistant.md b/solutions/observability/observability-ai-assistant.md
index 0b1f92385f..3f2864bfe1 100644
--- a/solutions/observability/observability-ai-assistant.md
+++ b/solutions/observability/observability-ai-assistant.md
@@ -91,6 +91,11 @@ The AI Assistant connects to one of these supported LLM providers:
 - The provider's API endpoint URL
 - Your authentication key or secret
 
+::::{admonition} Recommended models
+While the {{obs-ai-assistant}} is compatible with many different models, refer to the [Large language model performance matrix](/solutions/observability/llm-performance-matrix.md) to select models that perform well for your use cases.
+
+::::
+
 ### Elastic Managed LLM [elastic-managed-llm-obs-ai-assistant]
 
 :::{include} ../_snippets/elastic-managed-llm.md
diff --git a/solutions/toc.yml b/solutions/toc.yml
index 311bdd4f0f..c0464bb763 100644
--- a/solutions/toc.yml
+++ b/solutions/toc.yml
@@ -503,6 +503,7 @@ toc:
         - file: observability/observability-ai-assistant.md
           children:
             - file: observability/connect-to-own-local-llm.md
+            - file: observability/llm-performance-matrix.md
         - file: observability/observability-serverless-feature-tiers.md
   - file: security.md
     children: