Update LLM performance matrix for security #4500
Conversation
Vale Linting Results summary: 2 suggestions found
nastasha-solomon left a comment:
Left two minor comments. Great job on continuing to improve this page. It has a ton of super useful info for our customers!
> Higher scores indicate better performance. A score of 100 on a task means the model met or exceeded all task-specific benchmarks.
>
> Models with a score of "Not recommended" failed testing. This could be due to various issues, including context window constraints.
It could be helpful to include a brief explanation of how to interpret the average score. Maybe something general like "models that score above [this threshold] might provide better performance for AI-powered features. We don't recommend using models that score below [this threshold], as they won't perform as well."
I'll ask the product team if we can provide some more guidance on this. Thank you for the idea!
solutions/security/ai/large-language-model-performance-matrix.md (outdated; resolved)
This PR fixes #4307 by updating the LLM performance matrix for Elastic Security to reflect the latest testing. Thanks @dhru42 for your work generating the new data!
For models with one or more values of "Not recommended", I changed the "Average score" value to "N/A", because the "Not recommended" values were skewing the data and, IMO, making the average scores not very meaningful. For future versions, I think it would be ideal to have numeric values for all cells rather than "Not recommended". We might also consider testing performance for Automatic Import.
Generative AI disclosure