---
title: Anomaly Tests Troubleshooting
sidebarTitle: Anomaly Tests Troubleshooting
---

1. Understand the data collection for your anomaly test

First, check if your test uses a timestamp column:

```yaml
# In your YAML configuration
tests:
  - elementary.volume_anomalies:
      timestamp_column: created_at  # If this exists, you have a timestamp-based test
```
- Metrics are calculated by grouping data into time buckets (default: 'day')
- Detection period (default: 2 days) determines how many buckets are being tested
- Training period data (default: 14 days) comes from historical buckets, allowing immediate anomaly detection with sufficient history
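To make the windowing concrete, here is a minimal Python sketch of how daily buckets split into a training window and a detection window under the defaults. The function name and arguments are illustrative, not Elementary's API; the actual bucketing happens in SQL.

```python
from datetime import date, timedelta

def split_buckets(run_date, training_period=14, detection_period=2):
    """Partition daily time buckets into training and detection windows.
    Illustrative sketch only -- Elementary computes buckets in SQL."""
    # The most recent buckets are the ones being tested for anomalies.
    detection = [run_date - timedelta(days=i) for i in range(detection_period)]
    # The buckets before them supply the training history.
    training = [run_date - timedelta(days=i)
                for i in range(detection_period, detection_period + training_period)]
    return training, detection

training, detection = split_buckets(date(2024, 1, 31))
print(len(training), len(detection))  # 14 buckets of history, 2 buckets under test
```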

Verify data collection:

```sql
-- Check if metrics are being collected in time buckets
SELECT
    metric_timestamp,
    metric_value,
    COUNT(*) as metrics_per_bucket
FROM your_schema.data_monitoring_metrics
WHERE table_name = 'your_table'
GROUP BY metric_timestamp, metric_value
ORDER BY metric_timestamp DESC;

```

- Each row should represent one time bucket (e.g., daily metrics)
- Gaps in `metric_timestamp` might indicate data collection issues
- Training uses historical buckets for anomaly detection
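A quick way to spot such gaps, sketched in Python over a list of bucket dates pulled from the query above (a hypothetical helper, not part of Elementary):

```python
from datetime import date, timedelta

def find_bucket_gaps(bucket_dates):
    """Return the missing daily buckets between the earliest and latest
    observed bucket -- gaps like these reduce the usable training points."""
    present = set(bucket_dates)
    start, end = min(present), max(present)
    day, gaps = start, []
    while day <= end:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps

buckets = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
print(find_bucket_gaps(buckets))  # [datetime.date(2024, 1, 3)]
```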

**Common collection issues:**

- Missing or null values in timestamp column
- Timestamp column not in expected format
- No data in specified training period
- Training period data builds up over multiple test runs, using the test run time as its timestamp column. This takes time to collect enough points; for a 14-day training period, the test would need 14 runs on different days to have a full training set.
- Metrics are calculated for the entire table in each test run
- Detection period (default: 2 days) determines how many buckets are being tested

Check metric collection across test runs:

```sql
-- Check metrics from different test runs
SELECT
    updated_at,
    metric_value
FROM your_schema.data_monitoring_metrics
WHERE table_name = 'your_table'
ORDER BY updated_at DESC;

```

- You should see one metric per test run and per dimension
- Training requires multiple test runs over time
- Each new test run creates the training point for a time bucket; a second test run within the same bucket overwrites the first one
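The overwrite behavior can be pictured as a map keyed by bucket date (a hypothetical in-memory stand-in for `data_monitoring_metrics`, not Elementary's actual storage):

```python
from datetime import date

# Hypothetical stand-in: one training point per time bucket,
# keyed by the bucket's date.
metrics = {}

def record_test_run(run_date, metric_value):
    """A second run within the same bucket overwrites the first."""
    metrics[run_date] = metric_value

record_test_run(date(2024, 1, 1), 100.0)  # first run of the day
record_test_run(date(2024, 1, 1), 105.0)  # same bucket -> overwrites
record_test_run(date(2024, 1, 2), 98.0)   # new bucket -> new training point
print(len(metrics), metrics[date(2024, 1, 1)])  # 2 105.0
```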

**Common collection issues:**

- Test hasn't run enough times
- Previous test runs failed
- Metrics not being saved between runs

2. Verify anomaly calculations

Anomaly detection is influenced by:

- Detection period (default: 2 days) - the time window being tested
- Sensitivity (default: 3.0) - how many standard deviations from normal before a value is flagged
- Training data from previous periods/runs
- `metrics_anomaly_score`, which calculates the anomaly score based on the data in `data_monitoring_metrics`
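As a rough sketch of the z-score logic (Elementary computes this in SQL; the function below is illustrative, with the default sensitivity of 3.0):

```python
import statistics

def anomaly_score(value, training, sensitivity=3.0):
    """Flag `value` as anomalous if it lies more than `sensitivity`
    standard deviations from the training mean. Illustrative sketch."""
    avg = statistics.mean(training)
    stddev = statistics.stdev(training)
    z = (value - avg) / stddev if stddev else 0.0
    return z, abs(z) > sensitivity

training = [100, 102, 98, 101, 99, 100, 103]
z, is_anomaly = anomaly_score(150, training)
print(is_anomaly)  # True: 150 is far outside the training range
```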

Check calculations in `metrics_anomaly_score`:

```sql
-- Check how anomalies are being calculated
SELECT
    metric_name,
    metric_value,
    training_avg,
    training_stddev,
    zscore,
    severity
FROM your_schema.metrics_anomaly_score
WHERE table_name = 'your_table'
ORDER BY detected_at DESC;
```

3. "Not enough data to calculate anomaly" error

This occurs when there are fewer than 7 training data points. To resolve:

For timestamp-based tests:

- Check if your timestamp column has enough historical data
- Verify time buckets are being created correctly in `data_monitoring_metrics`
- Look for gaps in your data that might affect bucket creation

For non-timestamp tests:

- Run your tests multiple times to build up training data.
- Check `data_monitoring_metrics` to verify the data collection. The test needs data for at least 7 time buckets (e.g., 7 days) to calculate an anomaly score.
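The check above amounts to counting distinct buckets in the collected metrics. A small sketch, with rows standing in for `(bucket_date, metric_value)` pairs queried from `data_monitoring_metrics` (the helper is hypothetical):

```python
from datetime import date

def distinct_training_buckets(rows):
    """Count distinct time buckets among collected metric rows,
    given as (bucket_date, metric_value) pairs."""
    return len({bucket for bucket, _ in rows})

rows = [(date(2024, 1, d), 100.0 + d) for d in range(1, 6)]  # 5 daily buckets
rows += [(date(2024, 1, 5), 99.0)]                           # same bucket, re-run
print(distinct_training_buckets(rows), distinct_training_buckets(rows) >= 7)
# 5 False -> "Not enough data to calculate anomaly"
```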

4. Missing data in data_monitoring_metrics

If your test isn't appearing in data_monitoring_metrics:

Verify test configuration:

```yaml
tests:
  - elementary.volume_anomalies:
      timestamp_column: created_at  # Check if specified correctly
```

Common causes:

- Incorrect timestamp column name
- Timestamp column contains null values or is not of type timestamp or date
- For non-timestamp tests: the test hasn't run successfully
- Incorrect test syntax

5. Training period changed, but results are the same

If you change the training period after executing Elementary tests, you will need to run a full refresh of the collected metrics so that subsequent test runs collect data for the new `training_period` timeframe. The steps are:

1. Change the var `training_period` in your `dbt_project.yml`.
2. Full refresh the model `data_monitoring_metrics` by running `dbt run --select data_monitoring_metrics --full-refresh`.
3. Run the elementary tests again.

If you want the Elementary UI to show data for a longer period of time, use the `--days-back` option of the CLI: `edr report --days-back 45`