
Conversation

@benironside
Contributor

@benironside benironside commented Jan 4, 2026

This PR fixes #4307 by updating the LLM performance matrix for Elastic Security to reflect the latest testing. Thanks @dhru42 for your work generating the new data!

For models with one or more "Not recommended" values, I changed the "Average score" value to "N/A", because the "Not recommended" results were skewing the data and, in my opinion, making the average scores misleading. For future versions, it would be ideal to have numeric values for all cells rather than "Not recommended". We might also consider testing performance for Automatic Import.
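To illustrate the skew described above: if "Not recommended" results were coerced to a placeholder number (such as 0), the mean would drop sharply and misrepresent the model's performance on the tasks it did pass. A minimal sketch of the "report N/A instead" approach, using invented scores that are not taken from the actual performance matrix:

```python
# Hypothetical per-task scores for one model; values are made up for
# illustration and do not come from the real performance matrix.
scores = [95, 88, 92, "Not recommended"]

# Keep only the numeric results.
numeric = [s for s in scores if isinstance(s, (int, float))]

if len(numeric) < len(scores):
    # At least one task failed testing ("Not recommended"), so an
    # average over the remaining tasks would be misleading.
    average = "N/A"
else:
    average = sum(numeric) / len(numeric)

print(average)  # N/A
```

By contrast, treating the failed task as 0 here would yield a mean of 68.75 for a model that scored around 90 on every task it could actually complete, which is the distortion the "N/A" convention avoids.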

Generative AI disclosure

1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
   - [x] Yes
   - [ ] No
2. If you answered "Yes" to the previous question, please specify the tool(s) and model(s) used (e.g., Google Gemini,
   - Gemini 3 web interface for reformatting Google Sheets into markdown

@github-actions
Contributor

github-actions bot commented Jan 4, 2026

Vale Linting Results

Summary: 2 suggestions found

💡 Suggestions (2)
| File | Line | Rule | Message |
| --- | --- | --- | --- |
| solutions/security/ai/large-language-model-performance-matrix.md | 31 | Elastic.Acronyms | 'GPT' has no definition. |
| solutions/security/ai/large-language-model-performance-matrix.md | 50 | Elastic.Acronyms | 'GPT' has no definition. |


Contributor

@nastasha-solomon nastasha-solomon left a comment


Left two minor comments. Great job on continuing to improve this page. It has a ton of super useful info for our customers!

Higher scores indicate better performance. A score of 100 on a task means the model met or exceeded all task-specific benchmarks.

Models with a score of "Not recommended" failed testing. This could be due to various issues, including context window constraints.
Contributor


It could be helpful to include a brief explanation of how to interpret the average score. Maybe something general like: "Models that score above [this threshold] might provide better performance for AI-powered features. We don't recommend using models that score below [this threshold], as they won't perform as well."

Contributor Author


I'll ask the product team if we can provide some more guidance on this. Thank you for the idea!


@benironside benironside self-assigned this Jan 6, 2026


Development

Successfully merging this pull request may close these issues.

[Internal]: Update LLM Performance Matrix with latest models

3 participants