[LLM Observability] Add new landing page #782

---
navigation_title: "LLM Observability"
---

# LLM Observability

While LLMs hold incredible transformative potential, they also bring complex challenges in reliability, performance, and cost management. Traditional monitoring tools fall short here; an evolved set of observability capabilities is needed to ensure these models operate efficiently and effectively.

To keep your LLM-powered applications reliable, efficient, cost-effective, and easy to troubleshoot, Elastic provides a powerful LLM observability framework that includes key metrics, logs, and traces, along with pre-configured, out-of-the-box dashboards that deliver deep insights into model prompts and responses, performance, usage, and costs.

Elastic’s end-to-end LLM observability is delivered through the following methods:

- Metrics and logs ingestion for LLM APIs (via [Elastic integrations](https://www.elastic.co/guide/en/integrations/current/introduction.html))
- APM tracing for OpenAI models (via [instrumentation](https://github.com/elastic/opentelemetry))

## Metrics and logs ingestion for LLM APIs (via Elastic integrations)

Elastic’s LLM integrations now support the most widely adopted models, including OpenAI, Azure OpenAI, and a diverse range of models hosted on Amazon Bedrock and Google Vertex AI:

- [Amazon Bedrock](https://www.elastic.co/guide/en/integrations/current/aws_bedrock.html)
- [Azure OpenAI](https://www.elastic.co/guide/en/integrations/current/azure_openai.html)
- [GCP Vertex AI](https://www.elastic.co/guide/en/integrations/current/gcp_vertexai.html)
- [OpenAI](https://www.elastic.co/guide/en/integrations/current/openai.html)

Depending on the LLM provider you choose, the following table shows which source you can use and which type of data (logs or metrics) you can collect.

| **LLM Provider** | **Source** | **Metrics** | **Logs** | **Notes** |
|------------------|------------|-------------|----------|-----------|
| [AWS Bedrock][int-bedrock] | [AWS CloudWatch Logs][impl-bedrock] | ✅ | ✅ | GA |
| [Azure OpenAI][int-azure] | [Azure Monitor and Event Hubs][impl-azure] | ✅ | ✅ | GA |
| [GCP Vertex AI][int-vertexai] | [GCP Cloud Monitoring][impl-vertexai] | ✅ | 🚧 | GA. Meaningful request/response information cannot be collected from logs due to dynamic generation; GCP is aware of this issue, but there is no ETA yet. |
| [OpenAI][int-openai] | [OpenAI Usage API][openai-usage] | ✅ | 🚧 | GA. Prompt/response logs cannot be collected until OpenAI provides support. |
| [OpenTelemetry][int-wip-otel] | OTLP | 🚧 | 🚧 | Would support Elastic extensions of OTel's GenAI semantic conventions. |

## APM tracing for OpenAI models (via instrumentation)

Elastic offers specialized OpenTelemetry Protocol (OTLP) tracing for applications leveraging OpenAI models hosted on OpenAI, Azure, and Amazon Bedrock, providing a detailed view of request flows. This tracing capability captures critical insights, including the specific models used, request duration, errors encountered, token consumption per request, and the interaction between prompts and responses. Ideal for troubleshooting, APM tracing allows you to pinpoint exactly where an issue is happening in your OpenAI-powered application.

You can instrument your application with one of the following OpenTelemetry SDKs (a minimal Python sketch follows this list):

- [Python](https://github.com/elastic/elastic-otel-python)
- [Node.js](https://github.com/elastic/elastic-otel-node)
- [Java](https://github.com/elastic/elastic-otel-java)
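
For example, with the Python SDK, tracing an OpenAI client can be wired up explicitly as shown below. This is a minimal sketch, not a definitive setup: it assumes the upstream `opentelemetry-instrumentation-openai-v2` package is available (the import path and class name are assumptions), and the service name, model, and prompt are placeholders.

```python
# Minimal sketch: explicit OpenTelemetry setup plus OpenAI auto-instrumentation.
# Assumed packages: openai, opentelemetry-sdk, opentelemetry-exporter-otlp-proto-http,
# opentelemetry-instrumentation-openai-v2 (names are assumptions, verify before use).
from openai import OpenAI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Export spans over OTLP; OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS
# point the exporter at your Elastic deployment and carry the authorization header.
provider = TracerProvider(resource=Resource.create({"service.name": "llm-demo-app"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

# Instrument the OpenAI client: each completion call becomes a span carrying the
# model used, latency, errors, and token counts (and, if enabled, prompt/response content).
OpenAIInstrumentor().instrument()

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize today's alerts in one sentence."}],
)
print(response.choices[0].message.content)
```

With the EDOT SDKs, the same traces can typically be produced with zero code changes by running the application under the distribution's auto-instrumentation entry point instead of wiring the provider manually.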

EDOT includes many types of instrumentation. The following table shows the status of instrumentation relevant to GenAI on a per-language basis:

| **SDK** | **Language** | **Instrumented Dependency** | **Traces** | **Metrics** | **Logs** | **Status** | **Notes** |
|---------|--------------|-----------------------------|------------|-------------|----------|------------|-----------|
| OpenAI | Python | [openai][edot-openai-py] | ✅ | ✅ | ✅ | ✅ | Tested on OpenAI, Azure, and Ollama |
| OpenAI | JS/Node | [openai][edot-openai-js] | ✅ | ✅ | ✅ | ✅ | Tested on OpenAI, Azure, and Ollama |
| OpenAI | Java | [com.openai:openai-java][edot-openai-java] | ✅ | ✅ | ✅ | ✅ | Tested on OpenAI, Azure, and Ollama |
| Langchain | JS/Node | [@langchain/core][wip-edot-langchain-js] | ✅ | 🚧 | 🚧 | 🔒 | Tested on OpenAI; not yet finished |
| (AWS) Boto | Python | [botocore][otel-bedrock-py] | ✅ | ✅ | ✅ | ✅ | Bedrock (not SageMaker) `InvokeModel*` and `Converse*` APIs; Owner: Riccardo |
| Cohere | Python | [cohere][wip-otel-cohere-py] | 🚧 | 🚧 | 🚧 | 🚧 | Owner: Leighton from Microsoft |
| Google Cloud AI Platform | Python | [google-cloud-aiplatform][otel-vertexai-py] | ✅ | 🚧 | 🚧 | 🚧 | Vertex (not Gemini); clashes with OpenLLMetry package |

## Getting started

Check these instructions on how to set up and collect OpenTelemetry data for your LLM applications [create a link to https://github.com/elastic/opentelemetry/pull/100/files#diff-965570d21670c0ee4bba4b303960e5fe83b285f66b001ff8f31f0413f65a9d47 once the content is finalized and merged].
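
Until that guide is linked, the sketch below shows the environment-based configuration that OpenTelemetry SDKs (including the example above) read at startup. The endpoint URL, service name, and `ApiKey` authorization scheme are illustrative assumptions; substitute the values from your own Elastic deployment.

```python
# Hedged sketch: standard OpenTelemetry environment variables, set programmatically
# here only for illustration (they are usually set in the shell or deployment config).
import os

os.environ.setdefault("OTEL_SERVICE_NAME", "llm-demo-app")  # placeholder service name
os.environ.setdefault(
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "https://my-deployment.apm.example.com:443",  # placeholder Elastic OTLP endpoint
)
os.environ.setdefault(
    "OTEL_EXPORTER_OTLP_HEADERS",
    "Authorization=ApiKey <your-api-key>",  # placeholder credentials
)

# With these set, the tracer provider from the earlier sketch (or an EDOT
# zero-code agent) exports traces to the configured endpoint over OTLP.
```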

## Use cases

### Understand LLM performance and reliability

For an SRE team optimizing a customer support system powered by Azure OpenAI, Elastic’s [Azure OpenAI integration](https://www.elastic.co/guide/en/integrations/current/azure_openai.html) provides critical insights. They can quickly identify which model variants experience higher latency or error rates, enabling smarter decisions on model deployment or even switching providers based on real-time performance metrics.

:::{image} ../../../images/llm-performance-reliability.png
:alt: LLM performance and reliability
:screenshot:
:::

### Troubleshoot OpenAI-powered applications

Consider an enterprise utilizing an OpenAI model for real-time user interactions. Encountering unexplained delays, an SRE can use OpenAI tracing to dissect the transaction pathway, identify whether one specific API call or model invocation is the bottleneck, and monitor a request to see the exact prompt and response exchanged between the user and the LLM.

:::{image} ../../../images/llm-openai-applications.png
:alt: Troubleshoot OpenAI-powered applications
:screenshot:
:::

### Address cost and usage concerns

For cost-sensitive deployments, being acutely aware of which LLM configurations are more cost-effective is crucial. Elastic’s dashboards, pre-configured to display model usage patterns, help mitigate unnecessary spending effectively. You can use out-of-the-box dashboards for Azure OpenAI, OpenAI, Amazon Bedrock, and Google Vertex AI models.

:::{image} ../../../images/llm-costs-usage-concerns.png
:alt: LLM cost and usage concerns
:screenshot:
:::

### Understand compliance with guardrails in Amazon Bedrock

With the Elastic Amazon Bedrock integration for Guardrails, SREs can swiftly address security concerns, such as verifying whether certain user interactions trigger policy violations. Elastic's observability logs clarify whether guardrails rightly blocked potentially harmful responses, bolstering compliance assurance.

:::{image} ../../../images/llm-amazon-bedrock-guardrails.png
:alt: Elastic Amazon Bedrock integration for Guardrails
:screenshot:
:::