diff --git a/config/_default/menus/main.en.yaml b/config/_default/menus/main.en.yaml
index 25ae18f61dd17..f65c441cc39bf 100644
--- a/config/_default/menus/main.en.yaml
+++ b/config/_default/menus/main.en.yaml
@@ -4775,6 +4775,11 @@ menu:
parent: llm_obs_monitoring
identifier: llm_obs_monitoring_metrics
weight: 305
+ - name: Dashboards
+ url: llm_observability/monitoring/dashboards
+ parent: llm_obs_monitoring
+ identifier: llm_obs_monitoring_dashboards
+ weight: 306
- name: Evaluations
url: llm_observability/evaluations/
parent: llm_obs
diff --git a/content/en/llm_observability/monitoring/dashboards.md b/content/en/llm_observability/monitoring/dashboards.md
new file mode 100644
index 0000000000000..3bca35cb2340d
--- /dev/null
+++ b/content/en/llm_observability/monitoring/dashboards.md
@@ -0,0 +1,419 @@
+---
+title: Out-of-the-Box Dashboards
+description: Learn about the out-of-the-box dashboards available for LLM Observability.
+further_reading:
+ - link: "/llm_observability/monitoring/"
+ tag: "Documentation"
+ text: "Learn about LLM Observability monitoring"
+ - link: "/llm_observability/evaluations/"
+ tag: "Documentation"
+ text: "Learn about LLM Observability evaluations"
+ - link: "/dashboards/"
+ tag: "Documentation"
+ text: "Learn about Dashboards"
+---
+
+## Overview
+
+Datadog provides four out-of-the-box dashboards for LLM Observability that automatically populate when you instrument your LLM applications. These dashboards help you monitor operational health, analyze chain execution, evaluate quality and safety, and get a comprehensive overview of your LLM applications.
+
+The four out-of-the-box dashboards are:
+
+- [LLM Observability Operational Insights](#llm-observability-operational-insights): High-level operational metrics and performance tracking
+- [LLM Observability LLM Chain Insights](#llm-observability-llm-chain-insights): Detailed span-level analysis of chains and workflows
+- [LLM Observability Evaluation Insights](#llm-observability-evaluation-insights): Quality, safety, and custom evaluation monitoring
+- [LLM Observability Overview](#llm-observability-overview): Comprehensive single-pane view of costs, performance, and safety
+
+If you see empty dashboards, start by instrumenting your LLM application with the Datadog SDK or API.
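+
+For example, a minimal setup with the `ddtrace` Python SDK might look like the following sketch; the `ml_app` name and API key are placeholders.
+
+```python
+# Minimal sketch: enabling LLM Observability with the ddtrace Python SDK.
+# The ml_app name and API key below are placeholders.
+from ddtrace.llmobs import LLMObs
+
+LLMObs.enable(
+    ml_app="my-llm-app",          # appears as the ml_app tag on every span
+    api_key="<DATADOG_API_KEY>",
+    site="datadoghq.com",
+    agentless_enabled=True,       # send data directly, without a local Agent
+)
+```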
+
+## LLM Observability Operational Insights
+
+The [LLM Observability Operational Insights dashboard][1] is the main integration dashboard that tracks high-level operational metrics and performance for your LLM applications.
+
+### Key metrics
+
+This dashboard tracks:
+
+- Model usage, cost, and latency
+- LLM application performance
+- Token consumption (prompt and completion)
+- Trace success and error rates
+- Time to first token
+
+### Sections
+
+**Overview**
+
+The Overview section provides high-level metrics:
+
+- Active ML applications with LLM calls
+- Trace success and error rate
+- Total traces and spans
+- Token generation rate
+- LLM calls
+- Monitor creation link for [LLM Observability monitors][2]
+
+**LLM Requests**
+
+Detailed LLM request metrics including:
+
+- Response times (p50, p95)
+- Model usage breakdown
+- Token usage by prompt and completion
+- Cache hit percentage
+
+**Traces**
+
+End-to-end trace metrics:
+
+- Trace execution time
+- Error rates by ML application
+- Duration percentiles (p50, p75, p95)
+
+### Related dashboards
+
+The Operational Insights dashboard links to:
+
+- [LLM Observability LLM Chain Insights](#llm-observability-llm-chain-insights): For insights into LLM chains and spans
+- [LLM Observability Evaluation Insights](#llm-observability-evaluation-insights): For quality and safety evaluations
+
+## LLM Observability LLM Chain Insights
+
+The [LLM Observability LLM Chain Insights dashboard][3] provides deep visibility into your LLM chains and a detailed breakdown of all spans in your LLM applications.
+
+### Purpose
+
+Get comprehensive insights into:
+
+- Tool executions (external API or software calls)
+- Tasks (internal operations)
+- Retrievals (vector search and RAG)
+- Embeddings (embedding model calls)
+
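+For context, these span kinds correspond to the decorators exposed by the `ddtrace` Python SDK. The sketch below is illustrative only; the function names and bodies are placeholders.
+
+```python
+# Sketch: how the span kinds analyzed on this dashboard can be produced
+# with the ddtrace Python SDK decorators. Function bodies are placeholders.
+from ddtrace.llmobs.decorators import tool, task, retrieval, embedding
+
+@tool
+def get_weather(city: str) -> str:
+    # External API or software call -> reported as a tool span
+    return "sunny"
+
+@task
+def preprocess(text: str) -> str:
+    # Internal operation without external requests -> reported as a task span
+    return text.strip().lower()
+
+@retrieval
+def search_knowledge_base(query: str) -> list:
+    # Vector search over a knowledge base -> reported as a retrieval span
+    return []
+
+@embedding
+def embed(text: str) -> list:
+    # Standalone embedding model call -> reported as an embedding span
+    return []
+```
+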
+### Sections
+
+**Overview**
+
+High-level span metrics:
+
+- Total traces and spans
+- Span kind usage breakdown
+- Error rate by span kind
+- Timeline of total spans
+
+**Tools**
+
+A tool execution represents an external API or software call that helps your LLM agent.
+
+Metrics tracked:
+
+- Tool error rate
+- Average tools per trace
+- Tool usage by name
+- Usage count, duration, and errors (table view)
+- Top tools by success and error
+- Tools by p95 duration
+- Link to view tool spans
+
+**Tasks**
+
+A task represents a single non-LLM operation without external requests (for example, data preprocessing).
+
+Metrics tracked:
+
+- Task error rate
+- Average tasks per trace
+- Task usage over time
+- Usage count, duration, and errors (table view)
+- Top tasks by success and error
+- Tasks by p95 duration
+
+**Embeddings**
+
+An embedding represents a standalone call to an embedding model or function.
+
+Metrics tracked:
+
+- Embedding error rate
+- Average usage per chain
+- Usage breakdown and timeline
+- Usage count, duration, and errors (table view)
+- Usage by success and error
+- Usage by p95 duration
+
+**Retrievals**
+
+A retrieval represents a vector search returning ranked documents from a knowledge base.
+
+Metrics tracked:
+
+- Retrieval error rate
+- Average retrievals per chain
+- Retrievals over time
+- Usage count, duration, and errors (table view)
+- Retrievals by success and error
+- Retrievals by p95 duration
+- Top documents retrieved
+- Average document retrieval score
+
+### Filtering
+
+Use the `ml_app` template variable at the top of the dashboard to filter by specific LLM applications.
+
+### Related dashboards
+
+- [LLM Observability Operational Insights](#llm-observability-operational-insights): High-level operational insights
+- [LLM Observability Evaluation Insights](#llm-observability-evaluation-insights): Quality and custom evaluations
+
+The Chain Insights dashboard helps you understand the internal composition and performance of your LLM chains at a granular span level.
+
+## LLM Observability Evaluation Insights
+
+The [LLM Observability Evaluation Insights dashboard][4] provides comprehensive oversight of your LLM applications' quality, safety, and privacy evaluations.
+
+### Purpose
+
+Monitor evaluations to ensure your LLM applications deliver:
+
+- High-quality responses
+- Safe interactions
+- Protection from malicious use
+- Compliance with privacy standards
+
+### Sections
+
+**Overview**
+
+High-level security and quality metrics:
+
+- Total unanswered traces
+- Total negative interactions
+- Total malicious interactions
+- Total security checks triggered
+- Link to view traces
+
+**User Analytics**
+
+To populate this section, add `session_id` to your application spans. See the SDK documentation for Python, Node.js, Java, or the API.
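+
+As a minimal sketch with the `ddtrace` Python SDK, the session identifier can be passed to a span decorator; the session ID value and function below are placeholders.
+
+```python
+# Sketch: attaching a session_id so the User Analytics widgets populate.
+# Reuse the same ID for every span that belongs to the same user session.
+from ddtrace.llmobs.decorators import workflow
+
+@workflow(session_id="session-1234")          # placeholder session ID
+def handle_chat_turn(user_message: str) -> str:
+    # Call your LLM and return the response here.
+    return "..."
+```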
+
+Metrics tracked:
+
+- User messages per session
+- Average session length
+- Error rate per session
+- Average error rate per session
+
+Sentiment analysis:
+
+- Input and output sentiment trends (positive and negative)
+- Sentiment trigger counts
+
+**Quality Evaluations**
+
+The Quality Evaluations section includes:
+
+*Topic Relevancy*
+
+Evaluates whether prompt-response pairs stay on the intended topic. For example, a question about pizza recipes sent to an e-commerce chatbot is flagged as off-topic.
+
+Configure topics on the [Cluster Map][5] to improve accuracy. View topic relevancy traces from the dashboard.
+
+*Failure to Answer*
+
+Identifies when the LLM doesn't provide satisfactory answers. View failure to answer traces from the dashboard.
+
+*Language Mismatch*
+
+Detects when the LLM responds in a different language than the user's question. View language mismatch traces from the dashboard.
+
+*Toxicity*
+
+Evaluates input prompts and outputs for toxic language.
+
+Metrics:
+
+- Input toxicity detections
+- Output toxicity detections
+- Link to view toxicity traces
+
+*Hallucination*
+
+Identifies claims that contradict the provided input context. View hallucination traces from the dashboard.
+
+**Security & Safety Evaluations**
+
+*Prompt Injection*
+
+Identifies malicious attempts to manipulate LLM responses. View prompt injection traces from the dashboard.
+
+*Sensitive Data Scanner*
+
+Powered by [Sensitive Data Scanner][6], this feature scans, identifies, and redacts sensitive information in prompt-response pairs.
+
+Widgets:
+
+- Detection timeline by security type
+- Top detections (top list)
+
+*Malicious Users*
+
+Table tracking users by:
+
+- Prompt injection attempts
+- Off-topic conversations
+- Toxic conversations
+
+**Custom Evaluations**
+
+Submit your own [custom evaluations][7] based on:
+
+- Business success metrics
+- Direct user feedback
+
+Types supported:
+
+- Score evaluations (numeric)
+- Categorical evaluations (labeled)
+
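+As a rough sketch with the `ddtrace` Python SDK, a score evaluation can be attached to the currently active span; the label and value below are purely illustrative.
+
+```python
+# Sketch: submitting a custom evaluation for the current active span.
+# metric_type can be "score" (numeric) or "categorical" (labeled).
+from ddtrace.llmobs import LLMObs
+
+span_context = LLMObs.export_span(span=None)  # None exports the active span
+LLMObs.submit_evaluation(
+    span_context=span_context,
+    label="user_feedback",    # placeholder evaluation label
+    metric_type="score",
+    value=0.9,
+)
+```
+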
+Clone this dashboard to customize widgets and filters for your specific use case.
+
+## LLM Observability Overview
+
+The [LLM Observability Overview dashboard][8] is a comprehensive dashboard providing a single-pane view of costs, performance, and safety for your LLM applications.
+
+### Purpose
+
+This dashboard provides unified monitoring for:
+
+- Cost estimation and token usage
+- Performance and latency metrics
+- Quality and safety evaluations
+- Tool and task execution
+
+### Sections
+
+**Top Overview**
+
+High-level metrics:
+
+- **OpenAI Estimated Cost**: Cost breakdown by model (GPT-3.5, GPT-4, and more)
+- **Total Number of Traces**: Overall trace count
+- **Trace Success Rate**: Percentage of successful traces (color-coded: >95% green, 50-95% yellow, <50% red)
+- **Token Generation Rate**: Tokens generated per second
+- **Which tools are being used?**: Top tool usage by call count
+- **Quality Check Violations**: Count of quality and safety issues detected
+
+**LLM Analytics**
+
+Breakdown of LLM calls by model and provider:
+
+Widgets:
+
+- **Model Usage (sunburst chart)**: Usage distribution by `model_name` and `model_provider`
+- **Tokens per LLM Call (table)**: Average prompt and completion tokens per model with trend visualization
+- **Error Rate (timeseries)**: Error rate by model and provider with thresholds:
+ - <5% = OK
+ - 5-20% = Warning
+ - >20% = Error
+- **LLM Latency (table)**: p50, p75, p95 latency by model (color-coded: >3s red, 2-3s yellow, <2s green)
+- Link to view error traces
+
+**Traces**
+
+End-to-end execution flow metrics.
+
+A trace represents the entire execution from request receipt to response.
+
+Widgets:
+
+- **Trace latency (timeseries)**: p50, p75, p95 latency over time
+- **Trace Error Rate (timeseries)**: Error percentage by `ml_app` with markers:
+ - 0-5% = OK
+ - 5-20% = Warning
+ - >20% = Error
+- **Trace Execution Time Breakdown by Span Kind (sunburst)**: Average duration by span type (LLM, tool, task, and more)
+- Link to view related spans
+
+**Tools**
+
+External API and software interface calls.
+
+A tool represents an external API call (for example, calculator, weather API).
+
+Widgets:
+
+- **Usage over time (table)**: Tool call count, latency (trend), and errors by tool name
+- **Usage Breakdown by # of Executions (sunburst)**: Top tools by execution count
+- **Tool Error Rate**: Percentage of tool spans with errors (>5% flagged red)
+
+**Tasks**
+
+Internal non-LLM operations.
+
+A task represents an internal operation without external requests (for example, data preprocessing).
+
+Widgets:
+
+- **Usage over Time (table)**: Task call count, latency (trend), and errors by task name
+- **Usage Breakdown by # of Executions (sunburst)**: Top tasks by execution count (by `ml_app` and task name)
+- **Task Error Rate**: Percentage of task spans with errors (>5% flagged red)
+
+**Evaluation and Quality Checks**
+
+Evaluation insights:
+
+Top-level metrics:
+
+- **Negative Sentiment**: Count of negative interactions
+- **Positive Sentiment**: Count of positive sessions
+- **Toxic Messages**: Toxicity detections
+- **Prompt Injection Attempts**: Security threat count
+- **Rate of "real" user engagement**: Percentage of on-topic, non-malicious interactions:
+ - >70% = Green
+ - 50-70% = Yellow
+ - <50% = Red
+- **Bad Faith or Off-Topic Interactions**: Combined count
+
+User metrics:
+
+- **Total Interaction Count**: Total messages
+- **Unique Users**: Cardinality of `user_id`
+- **User Messages per Session**: Average messages per session
+
+Outlier Users:
+
+Tables identifying problematic users by:
+
+- Prompt injection attempts
+- Off-topic prompts
+- Toxic prompts
+- Negative sentiment
+
+Each table links to view user traces.
+
+### Filtering
+
+Use the `ml_app` template variable to filter the entire dashboard by LLM application.
+
+### Related dashboards
+
+- [LLM Observability Operational Insights](#llm-observability-operational-insights): Detailed operational metrics
+- [LLM Observability LLM Chain Insights](#llm-observability-llm-chain-insights): Span-level chain analysis
+- [LLM Observability Evaluation Insights](#llm-observability-evaluation-insights): Comprehensive evaluation dashboard
+
+## Further Reading
+
+{{< partial name="whats-next/whats-next.html" >}}
+
+[1]: https://app.datadoghq.com/dash/integration/31342/llm-observability-operational-insights-overview
+[2]: https://app.datadoghq.com/monitors/create/llm-observability
+[3]: https://app.datadoghq.com/dash/integration/31340/llm-observability-llm-chain-insights
+[4]: https://app.datadoghq.com/dash/integration/31341/llm-observability-evaluation-insights
+[5]: /llm_observability/monitoring/cluster_map/
+[6]: /llm_observability/evaluations/#sensitive-data-scanner-integration
+[7]: /llm_observability/evaluations/submit_evaluations
+[8]: https://app.datadoghq.com/dash/integration/31275/llm-observability-overview
\ No newline at end of file