[Obs AI Assistant] Add LLM performance matrix docs #2812
A couple of suggestions, let me know if you have any questions or comments.
# Large language model performance matrix
_Last updated: 4 September 2025_
I would probably avoid putting dates in the docs.
The scope and requirements of the issue mention: include a “last updated” note.
@pmoust To clarify, should we include the last updated date of the performance matrix in the docs?
@mdbirnstiehl we'd like to have some way to indicate that this is a "version" of our current supportability levels. If not a date, and given that we're not linking it to a stack release per se, what would you recommend?
If no strong opinions, I'd just keep the date.
Co-authored-by: Mike Birnstiehl <[email protected]>
LGTM. Also @mdbirnstiehl this is another page we should consider consolidating since there's also a version for security
LGTM
Thanks @benironside
LGTM with a few nits
_Last updated: 15 September 2025_
Keeping manually maintained dates isn't something we do or advise in the docs, because the docs are considered to match the latest release, not a specific date.
Thanks @florent-leborgne
Could you check @pmoust's comment here - #2812 (comment)
Is there a way to link it to a stack release if we are removing the date?
Okay. Thanks @florent-leborgne
@pmoust are we okay with removing the date and only having the stack version we tested in?
The requirement we have is to communicate to customers the date on which we evaluated the models. This is important for Serverless too.
Is it okay to keep the date for this case?
We plan on updating these ratings whenever we come across a scenario described in this comment.
cc: @pmoust
If it helps, I can update it to say "the evaluations were done on 15 September 2025".
What do you think?
Yeah, more precise wording already sounds a bit better (even if manually maintained dates are still a bad practice in technical docs 😄). Something like this maybe?
Last LLM performance evaluation: 15 September 2025
- An FYI: We may automatically surface "last updated" information on each docs page at some point, but we're very cautious with this wording (for example, fixing a typo or adding a new row doesn't necessarily mean that the entirety of the page was checked, validated, and updated at that date).
- Right now this is all new content, but if the results of these tests can vary per Stack version (or serverless) in the future, we'll have to think about how to present the information (one tab per version maybe or something similar), and about where to locate that date, to make sure it's attached to the right content/version on the page.
- Including such information at the start of the page implies that the entire matrix (all models) is checked and updated, not just a newly tested model. If you're instead planning cases where only a small part of what shows in this doc will be tested, we may want to surface this date more granularly.
- If for some reason scenarios that require updating this page become less frequent at some point, consider what to do with this date to avoid a "blog" effect, meaning a page written sometime in the past that becomes a liability because it shows old dates.
Happy to discuss this further if you'd like to anticipate further updates, lmk :)
@florent-leborgne @viduni94 to unblock the discussion here, I am ok to back away from having a "Last updated" date.
Let's remove the date, and continue the discussion outside of this GitHub issue.
We shouldn't block merging on that.
Thanks @pmoust and @florent-leborgne
I'll remove the date for now.
LGTM!
Closes https://github.com/elastic/obs-ai-assistant-team/issues/347
Closes elastic/kibana#233110
This PR adds the LLM performance matrix and a link to the evaluation framework readme in the Observability AI Assistant docs.
The scores that were used to calculate the ratings are attached in the first issue linked above.