Commit 343f47f

kevinzenghu, claude, and estherk15 authored

Create docs for Data Observability monitor type (#34541)
* First pass
* Include a list of monitors and also aastra query syntax
* Update references to monitors to metrics
* Remove examples to AASTRA and entity IDs. Also use better example names
* Include info on model training period
* Clean up links from metric types
* Remove mention to monitor limit, and links to DWHs
* Place links inline
* Update references to warehouse to database
* Add column.name template variable and reorder variable hierarchy

  Adds missing {{column.name}} template variable for column-level monitors, verified against group-by keys in the web-ui source code. Reorders variables to match the natural database > schema > table > column hierarchy.
* Fix Custom SQL tab list formatting

  Inline the SQL example to prevent the code block from breaking the numbered list continuation in the Custom SQL tab.
* Add table freshness limitations for views and warehouses without system metadata
* Condense example monitors section and reorder to lead with anomaly detection
* Reframe entity selection around Edit/Source UI tabs with practical query examples
* Improve Custom SQL example query
* Add annotate bounds section for anomaly detection feedback
* Add 5,000 entity limit per monitor
* Add entity selection, alert aggregation, and minimize entity terminology
* Add screenshots
* Change location of DO monitor creation flow screenshot
* Rearrange monitor configuration section
* Restructure doc to match UI flow, add WHERE clause and monitor schedule sections
* Convert example notifications and example monitors to tabs
* Clean up Custom SQL GROUP BY and billing language
* Re-order config
* Tidy up images
* Adjust some image sizes and add link
* Removing redirects since this is a brand new page
* Change link to help page
* Remove redundant link to monitor creation
* Fix typo
* In monitor creation flow, link directly to the New Monitor page
* Removing redundant metric types
* Update link format
* Move images to the monitor types folder
* Apply suggestion from @estherk15
* Remove monitor configuration section for now
* Add Data Observability Monitors page link and clean up link ordering
* Include a small blurb on monitor visualization

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Esther Kim <esther.kim@datadoghq.com>
1 parent b727b60 commit 343f47f

File tree

7 files changed: +277 −0 lines changed

content/en/monitors/types/_index.md

Lines changed: 1 addition & 0 deletions

@@ -32,6 +32,7 @@ further_reading:
  {{< nextlink href="/monitors/types/ci" >}}<strong>CI</strong>: Monitor CI pipelines and tests data gathered by Datadog.{{< /nextlink >}}
  {{< nextlink href="/monitors/types/cloud_cost" >}}<strong>Cloud Cost</strong>: Monitor cost changes associated with cloud platforms.{{< /nextlink >}}
  {{< nextlink href="/monitors/types/composite" >}}<strong>Composite</strong>: Alert on an expression combining multiple monitors.{{< /nextlink >}}
+ {{< nextlink href="/monitors/types/data_observability" >}}<strong>Data Observability</strong>: Monitor freshness, row count, column-level metrics, and custom SQL queries across your data warehouses.{{< /nextlink >}}
  {{< nextlink href="/monitors/types/database_monitoring" >}}<strong>Database Monitoring</strong>: Monitor query execution and explain plan data gathered by Datadog.{{< /nextlink >}}
  {{< nextlink href="/monitors/types/error_tracking" >}}<strong>Error Tracking</strong>: Monitor issues in your applications gathered by Datadog.{{< /nextlink >}}
  {{< nextlink href="/monitors/types/event" >}}<strong>Event</strong>: Monitor events gathered by Datadog.{{< /nextlink >}}
Lines changed: 276 additions & 0 deletions

@@ -0,0 +1,276 @@
---
title: Data Observability Monitor
description: "Monitor freshness, row count, column-level metrics, and custom SQL queries across your data warehouses."
further_reading:
- link: "/data_observability/"
  tag: "Documentation"
  text: "Data Observability Overview"
- link: "/data_observability/quality_monitoring/"
  tag: "Documentation"
  text: "Quality Monitoring"
- link: "/monitors/notify/"
  tag: "Documentation"
  text: "Configure your monitor notifications"
- link: "/monitors/downtimes/"
  tag: "Documentation"
  text: "Schedule a downtime to mute a monitor"
- link: "/monitors/status/"
  tag: "Documentation"
  text: "Consult your monitor status"
---

## Overview

[Data Observability][1] monitors use anomaly detection that learns from seasonality, trends, and user feedback. This helps you catch delayed data, incomplete loads, and unexpected value changes before they affect downstream dashboards, AI applications, or business decisions. Combined with end-to-end data and code lineage, these monitors help teams detect issues early, assess downstream impact, and route issues to the right owner.

Data Observability monitors support the following metric types:

**Table-level metric types:**

| Metric type | Description |
|---|---|
| Freshness | Tracks the time elapsed since a table was last updated. |
| Row Count | Tracks the number of rows in a table or view. |
| Custom SQL | Tracks a custom metric value returned by a SQL query. |

**Column-level metric types:**

| Metric type | Description |
|---|---|
| Freshness | Tracks the most recent date seen in a datetime column. |
| Uniqueness | Tracks the percentage of unique values. |
| Nullness | Tracks the percentage of null values. |
| Cardinality | Tracks the number of distinct values. |
| Percent Zero | Tracks the percentage of values equal to zero. |
| Percent Negative | Tracks the percentage of negative values. |
| Min / Max / Mean / Sum / Standard Deviation | Tracks statistical measures across column values. |

Datadog collects metrics such as row count and freshness from warehouse system metadata (for example, `INFORMATION_SCHEMA`) when available. This avoids running a query against your warehouse and reduces compute costs. Not all warehouses expose system metadata. For metrics that cannot be collected from system metadata, the monitor runs a query directly against your warehouse to compute the value.

Data Observability monitors require [Quality Monitoring][2] to be set up with at least one supported data warehouse (for example, [Snowflake][3], [Databricks][4], or [BigQuery][5]).

## Monitor creation

To create a Data Observability monitor in Datadog, navigate to [**Data Observability** > **Monitors** > **New Monitor**][6] or [**Monitors** > **New Monitor** > **Data Observability**][6]. To view all existing Data Observability monitors, see the [Data Observability Monitors page][7].

## Choose data to monitor

First, select whether to monitor at the **Table** or **Column** level:

{{< img src="monitors/monitor_types/data_observability/entity_type_selection_and_aastra.png" alt="Input field for selecting entity type and inputting a query" style="width:60%;" >}}

Then, use the **Edit** tab to search for tables, views, or columns by typing `key:value` filters into the search field. The following attributes are available:

| Filter | Example | Description |
|---|---|---|
| Name | `name:USERS*` | Match by name. Supports `*` wildcards. |
| Schema | `schema:PROD` | Match by schema. |
| Database | `database:ANALYTICS_DB` | Match by database. |
| Account | `account:my_account` | Match by account. |

Combine filters with `AND` or `OR`, use parentheses to group conditions, and prefix a filter with `-` to exclude it.

**Examples:**

| Goal | Query |
|---|---|
| All tables in the PROD schema, excluding temp tables | `schema:PROD AND -name:TEMP*` |
| All timestamp columns | `name:*_AT OR name:*_TIMESTAMP` |
| Tables in either PROD or STAGING for a specific database | `database:ANALYTICS_DB AND (schema:PROD OR schema:STAGING)` |

A single monitor can track up to 5,000 tables, views, or columns. This limit cannot be increased. If your query matches more entities, split the selection across multiple monitors.

Switch to the **Source** tab to see the backing query generated from your selections. The query follows this format:

{{< code-block lang="text" >}}
search for [ENTITY_TYPE] where `[FILTER_CONDITIONS]`
{{< /code-block >}}

## Select your metric type

Choose a metric type based on the data quality signal you want to track. Each monitor tracks one metric type.

{{< tabs >}}
{{% tab "Freshness" %}}

The **Freshness** metric type detects when data has not been updated within an expected time window. Use it to catch stale data before it affects downstream reports or models.

- **Table freshness** tracks the time elapsed since the table was last updated. Table freshness is not available for views, or for data warehouses that do not provide updated timestamps for tables in system metadata. In those cases, use column-level freshness instead.
- **Column freshness** tracks the most recent date seen in a datetime column.
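For intuition, column-level freshness can be thought of as the lag between now and the most recent timestamp in the column. A minimal sketch in Python (the function and sample data are hypothetical; Datadog computes this against your warehouse, not in Python):

```python
from datetime import datetime, timezone

def column_freshness_seconds(timestamps):
    """Toy column freshness: seconds between now and the newest timestamp."""
    latest = max(timestamps)
    return (datetime.now(timezone.utc) - latest).total_seconds()

events = [
    datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc),
]
lag = column_freshness_seconds(events)  # seconds elapsed since 2024-01-01 12:30 UTC
```

A freshness alert then amounts to checking whether this lag exceeds an expected window.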

{{% /tab %}}
{{% tab "Row Count" %}}

The **Row Count** metric type tracks row count changes in your tables. Use it to detect unexpected drops or spikes in data that could indicate pipeline failures or upstream issues.

{{% /tab %}}
{{% tab "Column Metric" %}}

**Column** metric types track column-level metrics to detect data drift or quality degradation. Select from the following:

| Metric | Description |
|---|---|
| **Uniqueness** | The percentage of values in a column that are unique. |
| **Nullness** | The percentage of values in a column that are null. |
| **Cardinality** | The number of distinct values in a column. |
| **Percent Zero** | The percentage of values in a column that are equal to zero. |
| **Percent Negative** | The percentage of values in a column that are negative. |
| **Min** | The minimum of all values in a column. |
| **Max** | The maximum of all values in a column. |
| **Mean** | The average of all values in a column. |
| **Standard Deviation** | The measure of variation within values in a column. |
| **Sum** | The sum of all values in a column. |

<div class="alert alert-info">Some column metrics are only available for specific column types. Numeric metrics (Percent Zero, Percent Negative, Min, Max, Mean, Standard Deviation, Sum) require numeric columns.</div>
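For intuition, most of these column metrics reduce to simple aggregations. A toy sketch (the helper is illustrative, and treating uniqueness as distinct non-null values over total values is an assumption, not Datadog's exact definition):

```python
def column_metrics(values):
    """Toy versions of a few column-level metrics over a list of values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "nullness_pct": 100.0 * (total - len(non_null)) / total,
        # Assumption: uniqueness = distinct non-null values / total values.
        "uniqueness_pct": 100.0 * len(distinct) / total,
        "cardinality": len(distinct),
        "percent_zero": 100.0 * sum(1 for v in non_null if v == 0) / total,
        "mean": sum(non_null) / len(non_null),
    }

metrics = column_metrics([0, 1, 1, None])
print(metrics["nullness_pct"], metrics["cardinality"])  # 25.0 2
```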

{{% /tab %}}
{{% tab "Custom SQL" %}}

The **Custom SQL** metric type tracks a custom metric value returned by a SQL query that you define. Use it when built-in metric types do not cover your use case, such as monitoring business-specific data quality rules.

1. Select a **model type** that describes the value returned by your query:
   - **Default**: The query returns a scalar value. Use this in most cases.
   - **Freshness**: The query returns the difference (in seconds) between the current time and the last time an event occurred.
   - **Percentage**: The query returns a percentage value between 0 and 100.
2. Write a SQL query that returns a single value aliased as `dd_value`, for example: `SELECT COUNT(*) AS dd_value FROM ANALYTICS_DB.PROD.ORDERS WHERE STATUS = 'FAILED'`
3. Click **Validate** to verify your query syntax.

If your SQL query includes a `GROUP BY` clause, list the grouped columns as a comma-separated list in the **Group by** field (for example, `column_a, column_b`). Each group is evaluated independently.
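To try the `dd_value` contract locally, here is a self-contained sketch using SQLite (the table and data are made up; in practice Datadog runs your query against the warehouse):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("FAILED", "US"), ("OK", "US"), ("FAILED", "EU")],
)

# Scalar form: the query returns a single value aliased as dd_value.
(dd_value,) = conn.execute(
    "SELECT COUNT(*) AS dd_value FROM orders WHERE status = 'FAILED'"
).fetchone()
print(dd_value)  # 2

# Grouped form: with GROUP BY, each group is evaluated independently.
groups = conn.execute(
    "SELECT region, COUNT(*) AS dd_value FROM orders "
    "WHERE status = 'FAILED' GROUP BY region"
).fetchall()
print(dict(groups))
```

Each `(region, dd_value)` pair in the grouped form would be tracked as its own series.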

**Note**: Each Custom SQL monitor counts as an individual monitored table for billing purposes.

{{< img src="monitors/monitor_types/data_observability/custom_sql_example.png" alt="Input field for custom SQL monitor creation." style="width:60%;" >}}

{{% /tab %}}
{{< /tabs >}}

## Configure monitor

### Detection method

Select a detection method:

- **Anomaly**: Alert when the metric deviates from an expected pattern. Threshold values are not required. The anomaly model requires **3 to 7 days** to train (including a weekend), depending on how frequently the underlying data updates. During the training period, the monitor does not trigger alerts and is visualized in blue. After training completes, the monitor is shown in green when in a normal state and red when in an outlier state.
- **Threshold**: Alert when the metric crosses a fixed value. Set the comparison operator (`above`, `above or equal to`, `below`, `below or equal to`, `equal to`, or `not equal to`) and define a **Critical** threshold (required) and, optionally, a **Warning** threshold. For more details, see [Configure Monitors][8].
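The threshold method amounts to a simple two-level comparison. A sketch of that evaluation (the function and status names are illustrative, not the product's internals):

```python
import operator

# Map the UI's comparison operators to Python comparisons.
OPS = {
    "above": operator.gt,
    "above or equal to": operator.ge,
    "below": operator.lt,
    "below or equal to": operator.le,
    "equal to": operator.eq,
    "not equal to": operator.ne,
}

def evaluate(value, op, critical, warning=None):
    """Return ALERT, WARN, or OK for a single metric evaluation."""
    compare = OPS[op]
    if compare(value, critical):
        return "ALERT"
    if warning is not None and compare(value, warning):
        return "WARN"
    return "OK"

# Stale-table example: critical above 6 hours, warning above 4 hours.
status = evaluate(5.0, "above", critical=6, warning=4)
print(status)  # WARN
```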

### WHERE clause

Add a **WHERE** clause to filter the data evaluated by the monitor. This is useful for monitoring specific segments of data or only recent records. For example:

- `created_at >= DATEADD(day, -7, CURRENT_TIMESTAMP())` to monitor only rows from the past week.
- `region = 'US'` to monitor only data for a specific region.

### Group by

You can add a **Group by** clause to split a single monitor into multiple groups, each evaluated independently. For example, grouping a row count monitor by a `REGION` column produces a separate alert for each geography.

{{< img src="monitors/monitor_types/data_observability/group_by_column_selection.png" alt="Input field for selecting GROUP BY dimensions." style="width:80%;" >}}

The default limit is 100 groups per monitor. To increase this limit, [contact Support][9].

### Monitor schedule

Set how often the monitor evaluates your data:

- **Hourly**: The monitor runs every hour.
- **Daily**: The monitor runs once per day.

### Set alert conditions

Choose an aggregation type:

- **Simple Alert**: Send a single notification when any monitored table or column meets the condition.
- **Multi Alert**: Send a notification for each group meeting the condition. Customize which dimensions to group by (for example, `table`, `schema`, `database`) to control alert granularity. For example, grouping by `schema` sends only one alert per schema, bundling all affected tables together to reduce noise.
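The Multi Alert bundling described above is essentially a group-by over the failing entities. An illustrative sketch (the entity records are hypothetical):

```python
from collections import defaultdict

# Hypothetical entities that breached the monitor condition.
failing = [
    {"database": "ANALYTICS_DB", "schema": "PROD", "table": "ORDERS"},
    {"database": "ANALYTICS_DB", "schema": "PROD", "table": "EVENTS"},
    {"database": "ANALYTICS_DB", "schema": "STAGING", "table": "USERS"},
]

# Grouping by schema yields one alert per schema instead of one per table.
alerts = defaultdict(list)
for entity in failing:
    alerts[entity["schema"]].append(entity["table"])

print(dict(alerts))  # {'PROD': ['ORDERS', 'EVENTS'], 'STAGING': ['USERS']}
```

Grouping by `table` instead would produce three separate alerts, one per failing table.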
182+
183+
### Example notification
184+
185+
{{< tabs >}}
186+
{{% tab "Threshold" %}}
187+
188+
{{< code-block lang="text" >}}
189+
{{#is_alert}}
190+
Data quality issue detected on {{database.name}}.{{schema.name}}.{{table.name}}:
191+
current value {{value}} has breached the threshold of {{threshold}}.
192+
{{/is_alert}}
193+
194+
{{#is_recovery}}
195+
Data quality issue on {{database.name}}.{{schema.name}}.{{table.name}} has recovered.
196+
Current value {{value}} is within the threshold of {{threshold}}.
197+
{{/is_recovery}}
198+
{{< /code-block >}}
199+
200+
{{% /tab %}}
201+
{{% tab "Anomaly" %}}
202+
203+
{{< code-block lang="text" >}}
204+
{{#is_alert}}
205+
Anomaly detected on {{database.name}}.{{schema.name}}.{{table.name}}:
206+
observed value {{observed}} is outside the expected range of {{lower_bound}} to {{upper_bound}}
207+
(predicted: {{predicted}}).
208+
{{/is_alert}}
209+
210+
{{#is_recovery}}
211+
{{database.name}}.{{schema.name}}.{{table.name}} has recovered.
212+
Observed value {{observed}} is within the expected range.
213+
{{/is_recovery}}
214+
{{< /code-block >}}
215+
216+
{{% /tab %}}
217+
{{< /tabs >}}
218+
219+
## Example monitors
220+
221+
{{< tabs >}}
222+
{{% tab "Row count drop" %}}
223+
224+
Detect a significant decrease in row count that could indicate a pipeline failure or missing data.
225+
226+
1. Select **Table** > **Row Count** and choose the target table (for example, `ANALYTICS_DB.PROD.EVENTS`).
227+
1. Select **Anomaly** as the detection method. The monitor triggers when the row count deviates from its historical baseline.
228+
229+
{{% /tab %}}
230+
{{% tab "Stale table" %}}
231+
232+
Alert when a critical table has not been updated within the expected time window.
233+
234+
1. Select **Table** > **Freshness** and choose the target table (for example, `ANALYTICS_DB.PROD.ORDERS`).
235+
1. Select **Threshold** as the detection method.
236+
1. Set the **Alert threshold** to **6 hours** and optionally a **Warning threshold** at **4 hours**.
237+
238+
{{% /tab %}}
239+
{{% tab "Null percentage spike" %}}
240+
241+
Detect when a column's null percentage exceeds normal levels, which may indicate data ingestion issues.
242+
243+
1. Select **Column** > **Nullness** and choose the target column (for example, `ANALYTICS_DB.PROD.USERS.EMAIL`).
244+
1. Select **Anomaly** as the detection method.
245+
246+
{{% /tab %}}
247+
{{< /tabs >}}
248+
249+
## Annotate bounds
250+
251+
For monitors using the **Anomaly** detection method, you can annotate bound ranges to provide feedback and improve the model over time. Unlike infrastructure metrics, data quality metrics are often business-specific, so use annotations to teach the model what behavior is normal for your data.
252+
253+
{{< img src="/monitors/monitor_types/data_observability/annotate_bounds.png" alt="Hover menu for annotating a monitor bound." style="width:90%;" >}}
254+
255+
On a monitor's status page, click **Annotate Bounds**, select a time range on the chart, and choose one of the following annotations:
256+
257+
| Annotation | Description |
258+
|---|---|
259+
| **Expected** | Expand bounds to include the marked behavior permanently. |
260+
| **Reset for now** | Mark behavior as OK, but alert if it happens again. |
261+
| **Missed alert** | Contract bounds to alert on this behavior. |
262+
| **Ignore** | Exclude annotated data when modeling bounds. |
263+
264+
## Further Reading
265+
266+
{{< partial name="whats-next/whats-next.html" >}}
267+
268+
[1]: /data_observability/
269+
[2]: /data_observability/quality_monitoring/
270+
[3]: /data_observability/quality_monitoring/data_warehouses/snowflake/
271+
[4]: /data_observability/quality_monitoring/data_warehouses/databricks/
272+
[5]: /data_observability/quality_monitoring/data_warehouses/bigquery/
273+
[6]: https://app.datadoghq.com/monitors/create/data-quality
274+
[7]: https://app.datadoghq.com/data-obs/monitors
275+
[8]: /monitors/configuration/?tab=thresholdalert#thresholds
276+
[9]: /help/
(Image files added: 162 KB, 96.9 KB, 24.9 KB, 358 KB, 97.7 KB)
