Conversation

@alaudazzi (Contributor) commented Mar 14, 2025

This PR adds a new landing page on LLM Observability.

Doc preview.

Closes https://github.com/elastic/observability-docs/issues/4837

alaudazzi added the documentation label Mar 14, 2025
alaudazzi self-assigned this Mar 14, 2025
alaudazzi marked this pull request as ready for review Mar 14, 2025 17:59
@alaudazzi (Contributor Author):

@daniela-elastic I ported the two tables from the https://github.com/elastic/genai-instrumentation/edit/main/docs/inventory.md page; let me know if this looks as you expected.

@hegerchr (Contributor) left a comment:

@alaudazzi thank you for the draft. It's nice to read and has a good flow 👍

Elastic’s end-to-end LLM observability is delivered through the following methods:

- Metrics and logs ingestion for LLM APIs (via [Elastic integrations](https://www.elastic.co/guide/en/integrations/current/introduction.html))
- APM tracing for OpenAI models (via [instrumentation](https://github.com/elastic/opentelemetry))
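
As a rough illustration of the second method (application-side tracing), here is a minimal Python sketch that wraps an OpenAI call in an OpenTelemetry span and records per-request token usage. The attribute names follow the OpenTelemetry GenAI semantic conventions; in practice the Elastic instrumentation captures this automatically, so treat this as a hand-rolled approximation rather than the actual instrumentation.

```python
# Hand-rolled approximation of the "APM tracing" path: wrap an OpenAI call
# in an OpenTelemetry span and record per-request token usage. Exporter and
# SDK setup are omitted, so this runs as a no-op without a configured pipeline.
from openai import OpenAI
from opentelemetry import trace

tracer = trace.get_tracer("llm-demo")
client = OpenAI()  # reads OPENAI_API_KEY from the environment

with tracer.start_as_current_span("chat gpt-4o") as span:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    # Per-request token counts: the "application-side" view discussed below.
    span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", response.usage.completion_tokens)
```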
Contributor:

Do we want to be more specific here, in terms of metrics and logs also being collectable by the agent? IIUC, we get metrics and logs from LLM APIs about what is happening on the LLM vendor side, whereas APM tracing, metrics, and logs are about what is happening in the application making use of LLMs.

Reply:

This is a very good question, and I'm still in two minds. For example, if we show tokens for both tracing and the LLM API-based integration, can we still say that tokens used is a vendor-side metric (for integrations) and also an application-side metric? Similarly with duration: we show the duration (latency) for both integrations and instrumentation. Maybe we can distinguish between the two in terms of how granular (zoomed in) the information is. For example, in the integration you can see the sum total of all tokens used (per model) regardless of which application used them; in fact, you don't even need to instrument applications to get these metrics. On the other hand, the tokens you get from instrumentation are for a specific request, giving you more zoomed-in data. The two answer different questions: 1) what is my total number of tokens per model for this API key, and how does it change over time, and 2) how many tokens did I use for this request in this application?
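
To make the two questions concrete, here is a hypothetical sketch using the Elasticsearch Python client. The index patterns, field names, and the trace ID below are illustrative placeholders, not the actual schema of the integrations or the APM data streams.

```python
# Hypothetical contrast between the two questions. Index patterns, field
# names, and the trace ID are illustrative assumptions only.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# (1) Vendor-side / integration view: total tokens per model across all
# applications; no application instrumentation required.
totals = es.search(
    index="metrics-llm-*",
    size=0,
    aggs={
        "per_model": {
            "terms": {"field": "gen_ai.request.model"},
            "aggs": {"tokens": {"sum": {"field": "gen_ai.usage.total_tokens"}}},
        }
    },
)

# (2) Application-side / instrumentation view: token usage attached to one
# specific traced request.
one_request = es.search(
    index="traces-apm-*",
    query={"term": {"trace.id": "0af7651916cd43dd8448eb211c80319c"}},  # placeholder ID
)
```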

@alaudazzi (Contributor Author):

@daniela-elastic @hegerchr
Is this something to keep in mind for further iterations of this page?

Contributor:

I would say so. @daniela-elastic's comment triggered two questions in my head. I'm not familiar with the details, and I'm currently wondering:

  • whether we're capturing the tokens in tracing,
  • whether the metrics are distinguishable by name,
  • whether we have any demo running where I could have a look at the data, and
  • whether this should be part of the OTel demo.
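
On the "distinguishable by name" point: the OpenTelemetry GenAI semantic conventions define a client token-usage histogram, so instrumentation-sourced series should be separable from integration metrics by metric name and attributes. A minimal sketch of emitting such a metric by hand follows; the metric and attribute names are an assumption about what the instrumentation emits.

```python
# Sketch of a GenAI-convention token-usage metric. Meter-provider setup is
# omitted, so this runs as a no-op without a configured metrics pipeline.
from opentelemetry import metrics

meter = metrics.get_meter("llm-demo")
token_usage = meter.create_histogram(
    "gen_ai.client.token.usage",          # assumed semconv metric name
    unit="{token}",
    description="Input and output tokens used per request",
)
# One recording per token type, distinguished by the gen_ai.token.type attribute.
token_usage.record(42, attributes={"gen_ai.system": "openai", "gen_ai.token.type": "input"})
token_usage.record(7, attributes={"gen_ai.system": "openai", "gen_ai.token.type": "output"})
```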

@daniela-elastic left a comment:

Hi Arianna, this is a solid first draft. We can iterate on it as we clarify our support story, as well as add overall steps for how to provide end-to-end LLM observability (a combination of integrations and instrumentation) or pick just one depending on the desired use-case coverage. Just as an FYI: we are 95% likely to be able to support instrumentation for models on Google Vertex AI (we get this for free from upstream), and we already have instrumentation support for models hosted on Amazon Bedrock. I've raised a PR to update the support matrix in the inventory page in the genAI instrumentation repo, which will hopefully go through today.

@akhileshpok left a comment:

The content is well structured and concise! Thanks.

@alaudazzi (Contributor Author):

@daniela-elastic
I integrated your latest feedback from our conversation this morning. Thanks for your thorough review!

@alaudazzi (Contributor Author):

@hegerchr @daniela-elastic
I updated the link to the Quick start. Would you please check and make sure it's the correct one?

@alaudazzi (Contributor Author) commented Mar 31, 2025

@daniela-elastic
As per our Slack conversation, these doc updates are still required before merging:

  • remove [OpenTelemetry][int-wip-otel] from the Integrations table
  • remove links from above the Integrations table and add them to the table
  • remove the Notes column from both tables
  • remove status from the Instrumentations table
  • remove Langchain from the Instrumentations table
  • rename Google Cloud AI Platform => Google Vertex AI
  • put links in the Instrumented Dependency table
  • check with Christophe for the correct links to the Elastic Distributions of OpenTelemetry

@alaudazzi (Contributor Author):

@hegerchr @xrmx I addressed your comments. Would you mind having another look and checking whether it's OK with you?

alaudazzi merged commit 5065197 into main Apr 3, 2025
5 checks passed
alaudazzi deleted the llm-observability branch Apr 3, 2025 09:54
Labels

documentation (Improvements or additions to documentation)

Development

Successfully merging this pull request may close these issues:

[LLM Observability] Add new content on integrations and instrumentation