Skip to content

feat: Implement alerting for Raw SQL-based dashboard tiles#2073

Open
pulpdrew wants to merge 1 commit intomainfrom
drew/raw-sql-alerting
Open

feat: Implement alerting for Raw SQL-based dashboard tiles#2073
pulpdrew wants to merge 1 commit intomainfrom
drew/raw-sql-alerting

Conversation

@pulpdrew
Copy link
Copy Markdown
Contributor

@pulpdrew pulpdrew commented Apr 8, 2026

Summary

This PR implements alerting on Raw SQL-based line/bar charts.

  • This is only available for line/bar charts because the interval and date range parameters are both required
  • The threshold is compared to the last numeric column in each result

Screenshots or video

Screen.Recording.2026-04-09.at.2.57.26.PM.mov

Logs from Check-Alerts evaluating a raw-sql alert

Screenshot 2026-04-09 at 3 01 14 PM

How to test locally or on Vercel

This must be tested locally, as alerts are not enabled in the preview environment.

Query for the "anomaly detection" example
WITH buckets AS (
  SELECT
    $__timeInterval(Timestamp) AS ts,
    count() AS bucket_count
  FROM $__sourceTable
  WHERE TimestampTime >= fromUnixTimestamp64Milli({startDateMilliseconds:Int64}) - toIntervalSecond($__interval_s * 30) -- Fetch 30 intervals back
    AND TimestampTime < fromUnixTimestamp64Milli({endDateMilliseconds:Int64})
    AND SeverityText = 'error'
  GROUP BY ts
  ORDER BY ts
  WITH FILL STEP toIntervalSecond($__interval_s)
),

anomaly_detection as (
  SELECT
    ts,
    bucket_count,
    avg(bucket_count) OVER (ORDER BY ts ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING) as previous_30_avg, -- avg of previous 30 intervals
    stddevPop(bucket_count) OVER (ORDER BY ts ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING) as previous_30_stddev, -- standard deviation of previous 30 intervals
    greatest(bucket_count - (previous_30_avg + 2 * previous_30_stddev), 0) as excess_over_2std -- compare bucket to avg + 2 stddev. clamp at 0.
  FROM buckets
)

SELECT ts, excess_over_2std 
FROM anomaly_detection
WHERE ts >= fromUnixTimestamp64Milli({startDateMilliseconds:Int64}) AND ts < fromUnixTimestamp64Milli({endDateMilliseconds:Int64})

References

  • Linear Issue: HDX-1605
  • Related PRs:

@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 8, 2026

🦋 Changeset detected

Latest commit: 64be45a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@hyperdx/common-utils Patch
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment Apr 9, 2026 7:16pm

Request Review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

E2E Test Results

All tests passed • 134 passed • 3 skipped • 1068s

Status Count
✅ Passed 134
❌ Failed 0
⚠️ Flaky 1
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (9):
    • packages/api/src/routers/external-api/__tests__/alerts.test.ts
    • packages/api/src/routers/external-api/__tests__/dashboards.test.ts
    • packages/api/src/routers/external-api/v2/alerts.ts
    • packages/api/src/routers/external-api/v2/dashboards.ts
    • packages/api/src/tasks/checkAlerts/__tests__/checkAlerts.test.ts
    • packages/api/src/tasks/checkAlerts/index.ts
    • packages/api/src/tasks/checkAlerts/providers/__tests__/default.test.ts
    • packages/api/src/tasks/checkAlerts/providers/default.ts
    • packages/api/src/tasks/checkAlerts/providers/index.ts

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Files changed: 27
  • Lines changed: 573 (+ 961 in test files, excluded from tier calculation)
  • Branch: drew/raw-sql-alerting
  • Author: pulpdrew

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

PR Review

Overall the implementation is solid — proper multi-tenancy scoping, good test coverage, and a clean extension of the existing alert pipeline.

  • ⚠️ isSqlTemplateValidForAlert uses naive substring matching (sql.includes('startDateMilliseconds')) → A comment like -- startDateMilliseconds passes validation while the query has no actual time filter, causing full-table-scan alerts. Consider checking for the actual param syntax {startDateMilliseconds:Int64} instead, or at minimum validate after stripping SQL comments.

  • ⚠️ Silent breaking change in dashboards.ts PUT handler: The filter now also deletes alerts on builder tiles when their display type changes to a non-alertable type (e.g., Line→Table). Previously only raw-SQL tiles triggered alert deletion. Existing users could lose builder-tile alerts unexpectedly on dashboard save. Needs a migration strategy or explicit documentation.

  • ⚠️ Inaccurate test comment in the GROUP BY test (checkAlerts.test.ts): comment says "web=2 (exceeds threshold 1), worker=1 (meets threshold 1)" but the actual threshold is 2, not 1. If AlertThresholdType.ABOVE means strictly > (not >=), then web count=2 with threshold=2 would not fire — the test expectation ALERT may be wrong, not just the comment.

  • 🔒 readonly: '2' applied only to raw SQL queries — good call, correctly prevents DDL/DML from alert-triggered SQL while allowing query settings.

  • ✅ Connection and source lookups correctly scope to team (multi-tenancy is intact).

  • tsconfig.json missing newline at EOF is trivially fixable pre-merge.

@pulpdrew pulpdrew force-pushed the drew/raw-sql-alerting branch from 453a16a to 64be45a Compare April 9, 2026 19:13
@pulpdrew pulpdrew marked this pull request as ready for review April 9, 2026 19:34
@pulpdrew pulpdrew requested a review from dhable April 9, 2026 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant