Contributor

@viduni94 viduni94 commented Sep 4, 2025

Closes https://github.com/elastic/obs-ai-assistant-team/issues/347
Closes elastic/kibana#233110

This PR adds the LLM performance matrix and a link to the evaluation framework readme in the Observability AI Assistant docs.

The scores that were used to calculate the ratings are attached in the first issue linked above.

@viduni94 viduni94 requested a review from a team as a code owner September 4, 2025 15:32

github-actions bot commented Sep 4, 2025

@viduni94 viduni94 added the documentation and v9.2.0 labels Sep 4, 2025
Contributor

@mdbirnstiehl mdbirnstiehl left a comment

A couple of suggestions, let me know if you have any questions or comments.


# Large language model performance matrix

_Last updated: 4 September 2025_
Contributor

I would probably avoid putting dates in the docs.

Contributor Author

The scope and requirements of the issue mention:

> include “last updated” note.

@pmoust To clarify, should we include the last updated date of the performance matrix in the docs?

Member

@mdbirnstiehl we'd like to have some way to indicate that that's a "version" of our current supportability levels. If not a date, and given that we're not linking it to a stack release per se, what would you recommend @mdbirnstiehl ?

If no strong opinions, I'd just keep the date.

Co-authored-by: Mike Birnstiehl <[email protected]>
viduni94 and others added 2 commits September 5, 2025 16:36
Contributor

@benironside benironside left a comment

LGTM. Also @mdbirnstiehl this is another page we should consider consolidating since there's also a version for security


@arturoliduena arturoliduena left a comment

LGTM

Contributor Author

> LGTM. Also @mdbirnstiehl this is another page we should consider consolidating since there's also a version for security

Thanks @benironside
Would you be able to add a review to the PR if everything looks okay as Mike is on PTO?
Let me know if anything needs to change, happy to update.

Contributor

@florent-leborgne florent-leborgne left a comment

LGTM with a few nits

Comment on lines 12 to 13

_Last updated: 15 September 2025_
Contributor

Suggested change
_Last updated: 15 September 2025_

Keeping manually maintained dates isn't something we do or advise doing in the docs, because the docs are considered to match the latest release, not a specific date.

Contributor Author

Thanks @florent-leborgne
Could you check @pmoust's comment here - #2812 (comment)

Is there a way to link it to a stack release if we are removing the date?

Contributor

The whole page will be marked as 9.2 thanks to the frontmatter
image

Contributor Author

Okay. Thanks @florent-leborgne

@pmoust are we okay with removing the date and only having the stack version we tested in?

Contributor

It'll show like this when 9.2 is officially released
image

Contributor Author

@viduni94 viduni94 Sep 16, 2025

@florent-leborgne

The requirement we have is to communicate to customers the date on which we evaluated the models. This is important for Serverless too.
Is it okay to keep the date in this case?

We plan to update these ratings whenever we come across one of the scenarios in this comment.

cc: @pmoust

Contributor Author

@viduni94 viduni94 Sep 16, 2025

If it helps, I can update it to say "the evaluations were done on 15 September 2025".
What do you think?

Contributor

@florent-leborgne florent-leborgne Sep 17, 2025

Yeah, more precise wording already sounds a bit better (even if manually maintained dates are still a bad practice in technical docs 😄). Something like this maybe?

Last LLM performance evaluation: 15 September 2025

  • An FYI: We may automatically surface "last updated" information on each docs page at some point, but we're very cautious with this wording (for example, fixing a typo or adding a new row doesn't necessarily mean that the entire page was checked, validated, and updated on that date).
  • Right now this is all new content, but if the results of these tests can vary per Stack version (or Serverless) in the future, we'll have to think about how to present the information (one tab per version maybe, or something similar), and about where to put that date, to make sure it's attached to the right content/version on the page.
  • Including this information at the start of the page implies that the entire matrix (all models) was checked and updated, not just that one new model was tested. If you're instead planning for cases where only a small part of what this doc shows will be retested, we may want to surface this date more granularly.
  • If for some reason the scenarios that require updating this page become less frequent at some point, consider what to do with this date to avoid a "blog" effect, meaning a page written sometime in the past that becomes a liability because it shows old dates.

Happy to discuss this further if you'd like to anticipate further updates, lmk :)

Member

@florent-leborgne @viduni94 to unblock the discussion here, I am ok to back away from having a "Last updated" date.
Let's remove the date, and continue the discussion outside of this github issue.
We shouldn't block merging on that.

Contributor Author

Thanks @pmoust and @florent-leborgne
I'll remove the date for now.

Contributor

@florent-leborgne florent-leborgne left a comment

LGTM!

@viduni94 viduni94 changed the title from "[Do not merge] [Obs AI Assistant] Add LLM performance matrix docs" to "[Obs AI Assistant] Add LLM performance matrix docs" Sep 17, 2025
@viduni94 viduni94 merged commit 04ab734 into elastic:main Sep 18, 2025
7 checks passed

Development

Successfully merging this pull request may close these issues.

[Obs AI Assistant] Document Evaluation Framework in Product doc

8 participants