Correlating Anomalies via Temporal Overlap Similarity #1641

kaituo merged 2 commits into opensearch-project:main
Conversation
Codecov Report ❌

Additional details and impacted files:

@@ Coverage Diff @@
## main #1641 +/- ##
============================================
- Coverage 81.36% 81.29% -0.08%
- Complexity 6151 6237 +86
============================================
Files 542 544 +2
Lines 24986 25323 +337
Branches 2543 2621 +78
============================================
+ Hits 20331 20587 +256
- Misses 3383 3424 +41
- Partials 1272 1312 +40
Should we disallow grouping anomalies from the same detector if they overlap or are adjacent?

It seems like the current implementation uses a brute-force pairwise comparison in nested loops, comparing every anomaly with every other anomaly, so the time complexity is O(n^2). Could we sort anomalies by start time and only compare active overlapping intervals, making it more efficient for large datasets?
We don't allow grouping anomalies from the same entity (e.g., the same model id). But grouping anomalies from the same detector is allowed since we may have high-cardinality detectors. Also, for single-stream detectors, if anomalies are adjacent, it may make sense to combine them.

Changed to use active overlapping intervals. See diff: https://github.com/opensearch-project/anomaly-detection/compare/fca7de0c05300a4322cff6d97c8644fe85df5d0b..82a03a2bf142310a23a8b9a1f662e739e3c35cd5
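The active-interval idea suggested above can be sketched as follows. This is an illustrative sweep-line version, not the PR's actual code: sort anomalies by start time and compare each one only against intervals still "active" at its (dilated) start, instead of all O(n^2) pairs. The `Interval` record and `delta` parameter are stand-ins for the real classes.

```java
import java.util.*;

public class SweepLinePairs {
    record Interval(String id, long start, long end) {}

    // Returns id pairs whose delta-dilated intervals overlap.
    static List<String[]> overlappingPairs(List<Interval> input, long delta) {
        List<Interval> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingLong(Interval::start));
        List<String[]> pairs = new ArrayList<>();
        // Active set: intervals whose dilated end has not yet passed the sweep point.
        Deque<Interval> active = new ArrayDeque<>();
        for (Interval cur : sorted) {
            // Drop intervals that can no longer overlap cur, even after dilation.
            active.removeIf(a -> a.end() + delta < cur.start() - delta);
            for (Interval a : active) {
                pairs.add(new String[] { a.id(), cur.id() });
            }
            active.add(cur);
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Interval> xs = List.of(
            new Interval("a", 0, 10),
            new Interval("b", 5, 15),    // overlaps a
            new Interval("c", 100, 110)  // far away, pairs with nothing
        );
        System.out.println(overlappingPairs(xs, 2).size()); // prints 1
    }
}
```

With anomalies typically clustered in time, the active set stays small, so the cost approaches O(n log n) for the sort plus output size.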
I mean more like excluding groups that only have one entity in them. Results with only one entity are not presentable to customers on the dashboard. Should we stop generating those?

Added a parameter to exclude groups that only have one entity: https://github.com/opensearch-project/anomaly-detection/compare/82a03a2bf142310a23a8b9a1f662e739e3c35cd5..31ed352793d65e8bbcbe219da48bbe76ddce76f2
    LinkedHashMap<String, Anomaly> deduped = new LinkedHashMap<>();
    for (Anomaly anomaly : anomalies) {
        Objects.requireNonNull(anomaly, "anomaly");
        String id = Objects.requireNonNull(anomaly.getId(), "anomaly.id");
Will anomaly id always be unique? Should we dedupe the input list by detector id, entity, start, and end before calling AnomalyCorrelation?
Anomaly id is the model id, which is unique. I will rename id to model id to be explicit. Yes, I can dedupe by start and end; model id uniquely determines detector id and entity.
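The dedupe discussed here could look like the following sketch: keep the first anomaly per (modelId, start, end) key in insertion order, relying on model id determining detector id and entity. The `Anomaly` record and method names are illustrative stand-ins, not the PR's actual classes.

```java
import java.util.*;

public class DedupeSketch {
    record Anomaly(String modelId, long start, long end) {}

    static List<Anomaly> dedupe(List<Anomaly> anomalies) {
        // LinkedHashMap preserves stable insertion order, matching the PR.
        LinkedHashMap<List<Object>, Anomaly> seen = new LinkedHashMap<>();
        for (Anomaly a : anomalies) {
            Objects.requireNonNull(a, "anomaly");
            // Model id uniquely determines detector id and entity, so
            // (modelId, start, end) identifies a duplicate record.
            seen.putIfAbsent(List.of(a.modelId(), a.start(), a.end()), a);
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        List<Anomaly> in = List.of(
            new Anomaly("m1", 0, 10),
            new Anomaly("m1", 0, 10), // duplicate, dropped
            new Anomaly("m2", 5, 15)
        );
        System.out.println(dedupe(in).size()); // prints 2
    }
}
```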
After running the algorithm on different data, I noticed the clustering results tend to be fragmented: multiple clusters, each with a small number (usually 3-5) of entities. The time gaps between clusters are usually 30-60 minutes. Is there any improvement we can make to bridge across short quiet periods?

Added more dilation at the start of an anomaly so that if two anomalies don't overlap (even after the base dilation), they still have a chance to be correlated: b3eb632
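The bridging idea from this exchange can be illustrated with a minimal sketch (parameter names like `bridgeMs` are assumptions, not the PR's): widening each interval's start by an extra bridge gap lets two anomalies separated by a short quiet period still count as overlapping.

```java
public class BridgeDilation {
    // True iff [s1,e1] and [s2,e2] overlap once each start is pulled back by bridgeMs.
    static boolean overlapsWithBridge(long s1, long e1, long s2, long e2, long bridgeMs) {
        return Math.max(s1 - bridgeMs, s2 - bridgeMs) < Math.min(e1, e2);
    }

    public static void main(String[] args) {
        long min = 60_000L; // one minute in milliseconds
        // Two anomalies with a 40-minute quiet gap between them:
        // no raw overlap, but a 45-minute bridge lets them correlate.
        System.out.println(overlapsWithBridge(0, 10 * min, 50 * min, 60 * min, 0));        // prints false
        System.out.println(overlapsWithBridge(0, 10 * min, 50 * min, 60 * min, 45 * min)); // prints true
    }
}
```

Only the starts are widened here, so the bridge closes a gap that follows an anomaly without inflating how long any single anomaly appears to last.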
OpenSearch anomalies such as service degradation, job delays, and incident bursts are represented as time intervals, not isolated points. If two detectors fire on the same incident, their anomaly intervals will substantially overlap in time (perhaps with a little timestamp jitter due to differing detection intervals, detector start times, and causal relationships). Our similarity therefore measures:
* how much the time windows overlap (after a small tolerance δ to account for jitter),
* optionally, whether the durations are consistent.
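The two overlap terms above can be sketched as follows. This is a minimal illustration of dilated IoU (Jaccard over time) and the overlap coefficient, not the PR's exact code; method names are assumptions.

```java
public class TemporalSimilarity {
    // Intersection-over-union of [s1,e1] and [s2,e2], each dilated by +/- delta.
    static double iou(long s1, long e1, long s2, long e2, long delta) {
        long a1 = s1 - delta, b1 = e1 + delta, a2 = s2 - delta, b2 = e2 + delta;
        long inter = Math.max(0, Math.min(b1, b2) - Math.max(a1, a2));
        long union = (b1 - a1) + (b2 - a2) - inter;
        return union == 0 ? 0.0 : (double) inter / union;
    }

    // Overlap coefficient: intersection / shorter dilated length.
    // Close to 1 when one interval is (nearly) contained in the other.
    static double overlapCoeff(long s1, long e1, long s2, long e2, long delta) {
        long a1 = s1 - delta, b1 = e1 + delta, a2 = s2 - delta, b2 = e2 + delta;
        long inter = Math.max(0, Math.min(b1, b2) - Math.max(a1, a2));
        long minLen = Math.min(b1 - a1, b2 - a2);
        return minLen == 0 ? 0.0 : (double) inter / minLen;
    }

    public static void main(String[] args) {
        // [0,10] vs [2,8] with delta=0: intersection 6, union 10.
        System.out.println(iou(0, 10, 2, 8, 0));          // prints 0.6
        System.out.println(overlapCoeff(0, 10, 2, 8, 0)); // prints 1.0 (containment)
    }
}
```

IoU punishes a short anomaly contained in a long one, which is why a containment-aware overlap coefficient is worth computing alongside it.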
This PR implements a threshold graph plus connected-components clustering based on this similarity.
Major algorithm:
- De-dupe input anomalies by id (stable insertion order).
- For every pair (i,j):
- Dilate both time intervals by ±delta to tolerate bucket alignment drift.
- Require dilated overlap >= minOverlap (cheap early filter).
- Compute temporal overlap:
- IoU (Jaccard over time) on dilated intervals
- Overlap coefficient (overlap / min(lenA,lenB)) for containment cases
- Detect strong containment (ovl >= tauContain and duration ratio <= rhoMax).
- Pick temporal term by mode:
- IOU: use IoU
- OVL: use overlap coefficient
- HYBRID: if strong containment, blend ((1-lam)*IoU + lam*OVL); else use IoU
- Compute duration penalty exp(-|durA-durB|/kappa).
- If strong containment, relax the penalty via pow(basePen, containmentRelax)
(or disable penalty entirely when containmentRelax == 0).
- Similarity = temporalTerm * penalty; add an undirected edge if similarity >= alpha.
- Run DFS connected-components on the threshold graph to form clusters.
- Output deterministically: sort members in each cluster by anomaly id.
- Attach an event window per cluster as [min(start), max(end)] across its members.
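The final two steps above (threshold graph, DFS connected components, deterministic member order) can be sketched like this. It is an illustrative standalone version, not the PR's code; the similarity matrix is assumed to be precomputed by the pairwise steps.

```java
import java.util.*;

public class ThresholdGraphClusters {
    // ids[i] is the anomaly (model) id for row/column i of sim.
    static List<List<String>> cluster(List<String> ids, double[][] sim, double alpha) {
        int n = ids.size();
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        // Add an undirected edge whenever similarity clears the threshold alpha.
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (sim[i][j] >= alpha) { adj.get(i).add(j); adj.get(j).add(i); }
        boolean[] seen = new boolean[n];
        List<List<String>> clusters = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (seen[i]) continue;
            // Iterative DFS over one connected component.
            List<String> members = new ArrayList<>();
            Deque<Integer> stack = new ArrayDeque<>(List.of(i));
            seen[i] = true;
            while (!stack.isEmpty()) {
                int u = stack.pop();
                members.add(ids.get(u));
                for (int v : adj.get(u)) if (!seen[v]) { seen[v] = true; stack.push(v); }
            }
            Collections.sort(members); // deterministic order within each cluster
            clusters.add(members);
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] sim = {
            { 1.0, 0.9, 0.0 },
            { 0.9, 1.0, 0.0 },
            { 0.0, 0.0, 1.0 }
        };
        // m1 and m2 connect (0.9 >= 0.5); m3 forms a singleton cluster.
        System.out.println(cluster(List.of("m1", "m2", "m3"), sim, 0.5));
        // prints [[m1, m2], [m3]]
    }
}
```

A cluster's event window would then be [min(start), max(end)] over its members, and singleton clusters can be dropped when the exclude-single-entity parameter is set.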
Testing done:
1. UT
2. Tests on real world data
Signed-off-by: Kaituo Li <kaituo@amazon.com>
Signed-off-by: kaituo <kaituo@amazon.com>
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
- Commits are signed per the DCO using --signoff. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.