feat(explore): Add anomaly detection for timeseries charts#40954
feat(explore): Add anomaly detection for timeseries charts#40954DeepFriedYeti wants to merge 3 commits into
Conversation
This feature commit implements the open [SIP-213] apache#40520 Implements anomaly detection with three different methods along with changes to the UI to show an appropraite section for the same. Works very closely as forecasting method and is a visual indicator like that itself Signed-off-by: Ayush Raj <ayush.raj@mercedes-benz.com>
Signed-off-by: Ayush Raj <ayush.raj@mercedes-benz.com>
Code Review Agent Run #fb2267Actionable Suggestions - 0Additional Suggestions - 3
Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| interface _PostProcessingAnomalyDetection { | ||
| operation: 'anomaly_detection'; | ||
| options: { | ||
| method: string; |
There was a problem hiding this comment.
Suggestion: The anomaly detection post-processing contract is too permissive because method is typed as any string, but the backend only accepts zscore, mad, or prophet. This allows invalid values to pass TypeScript checks and fail at runtime with InvalidPostProcessingError. Narrow this field to a string-literal union so invalid methods are caught at compile time. [api mismatch]
Severity Level: Major ⚠️
- ❌ Timeseries anomaly detection returns error for unsupported methods.
- ⚠️ ECharts Timeseries chart data requests fail unpredictably.Steps of Reproduction ✅
1. In the ECharts timeseries form data type at
`superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/types.ts:54-73`, the field
`anomalyDetectionMethod` is declared as `string`, so any string value (e.g. `'invalid'`)
is accepted by TypeScript.
2. The timeseries query builder `buildQuery` at
`superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/buildQuery.ts:44-117`
constructs a `post_processing` array including `anomalyDetectionOperator(formData,
baseQueryObject)` (lines 102-114), passing through this unchecked `anomalyDetectionMethod`
value.
3. The anomaly detection post-processing factory at
`superset-frontend/packages/superset-ui-chart-controls/src/operators/anomalyDetectionOperator.ts:25-55`
reads `formData.anomalyDetectionMethod` into `const method: string =
formData.anomalyDetectionMethod || 'zscore';` (line 30), builds `options: { method, index:
xAxisLabel }`, and returns `{ operation: 'anomaly_detection', options } as
PostProcessingAnomalyDetection` (lines 39-55); the `PostProcessingAnomalyDetection`
contract in
`superset-frontend/packages/superset-ui-core/src/query/types/PostProcessing.ts:138-149`
types `options.method` as a plain `string` (line 141), so TypeScript does not restrict
values to the supported methods.
4. On the backend, the pandas post-processing function `anomaly_detection` in
`superset/utils/pandas_postprocessing/anomaly.py:194-271` validates its `method` parameter
via `_validate_anomaly_inputs` (lines 153-170), which explicitly rejects any method not in
`("zscore", "mad", "prophet")` and raises `InvalidPostProcessingError` (lines 162-169);
this behavior is enforced by `test_anomaly_detection_invalid_method` in
`tests/unit_tests/pandas_postprocessing/test_anomaly.py:118-125`, so a frontend
`PostProcessingRule` with `options.method: 'invalid'`—allowed by the current `string`
typing—will cause a runtime `InvalidPostProcessingError` instead of being caught at
compile time.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset-frontend/packages/superset-ui-core/src/query/types/PostProcessing.ts
**Line:** 141:141
**Comment:**
*Api Mismatch: The anomaly detection post-processing contract is too permissive because `method` is typed as any string, but the backend only accepts `zscore`, `mad`, or `prophet`. This allows invalid values to pass TypeScript checks and fail at runtime with `InvalidPostProcessingError`. Narrow this field to a string-literal union so invalid methods are caught at compile time.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| yearly_seasonality?: boolean | number; | ||
| weekly_seasonality?: boolean | number; | ||
| daily_seasonality?: boolean | number; |
There was a problem hiding this comment.
Suggestion: The seasonality option types do not include null, but the UI/operator flow uses null as a valid value to represent default Prophet behavior. This mismatch breaks the frontend type contract and forces unsafe casts even though null is a legitimate payload value for the backend. Include null in these unions to match actual runtime behavior. [api mismatch]
Severity Level: Major ⚠️
- ⚠️ Anomaly detection Prophet seasonality defaults misrepresented in typings.
- ⚠️ Timeseries plugin developers rely on unsafe type casts.Steps of Reproduction ✅
1. The anomaly detection control defaults in
`superset-frontend/packages/superset-ui-chart-controls/src/sections/anomalyDetection.tsx:46-55`
define `ANOMALY_DEFAULT_DATA` with `anomalyDetectionSeasonalityYearly`,
`anomalyDetectionSeasonalityWeekly`, and `anomalyDetectionSeasonalityDaily` all
initialized to `null` (lines 52-54), and the corresponding `SelectControl` definitions
(lines 141-201) include `[null, t('default')]` as a valid choice, so user interaction
produces `formData` where these seasonality fields are `null`.
2. The timeseries form data type `EchartsTimeseriesFormData` in
`superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/types.ts:54-73` models
these properties as `anomalyDetectionSeasonalityYearly: null | boolean | number`,
`anomalyDetectionSeasonalityWeekly: null | boolean | number`, and
`anomalyDetectionSeasonalityDaily: null | boolean | number` (lines 71-73), confirming that
`null` is part of the intended runtime shape for Prophet-based anomaly detection.
3. The anomaly detection operator at
`superset-frontend/packages/superset-ui-chart-controls/src/operators/anomalyDetectionOperator.ts:25-55`
builds a post-processing options object and, in the Prophet branch (lines 40-45), assigns
`options.yearly_seasonality = formData.anomalyDetectionSeasonalityYearly`,
`options.weekly_seasonality = formData.anomalyDetectionSeasonalityWeekly`, and
`options.daily_seasonality = formData.anomalyDetectionSeasonalityDaily`, meaning the
`options` payload can legitimately contain `null` values for these keys before being
returned as `PostProcessingAnomalyDetection` (lines 52-55).
4. However, the `PostProcessingAnomalyDetection` interface in
`superset-frontend/packages/superset-ui-core/src/query/types/PostProcessing.ts:138-149`
declares `yearly_seasonality?: boolean | number;`, `weekly_seasonality?: boolean |
number;`, and `daily_seasonality?: boolean | number;` (lines 146-148), excluding `null`,
while the backend implementation `anomaly_detection` in
`superset/utils/pandas_postprocessing/anomaly.py:194-271` accepts `Optional[Union[bool,
int]]` for these parameters (lines 201-203) and converts `None` to `"auto"` via
`_parse_seasonality` (lines 140-150); this mismatch forces the frontend to use unsafe type
assertions (`as PostProcessingAnomalyDetection`) even though `null` is a legitimate
payload value for the backend, and misleads other TypeScript consumers into thinking
`null` is not allowed.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset-frontend/packages/superset-ui-core/src/query/types/PostProcessing.ts
**Line:** 146:148
**Comment:**
*Api Mismatch: The seasonality option types do not include `null`, but the UI/operator flow uses `null` as a valid value to represent default Prophet behavior. This mismatch breaks the frontend type contract and forces unsafe casts even though `null` is a legitimate payload value for the backend. Include `null` in these unions to match actual runtime behavior.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| if ( | ||
| method === 'prophet' && | ||
| !formData.granularity_sqla && | ||
| !formData.x_axis | ||
| ) { | ||
| return undefined; | ||
| } |
There was a problem hiding this comment.
Suggestion: The Prophet guard only checks whether an x-axis field exists, not whether it is actually temporal. If a non-datetime x_axis value slips through (for example from saved form state or API payload), this still emits a Prophet anomaly post-processing step and backend Prophet processing will fail when it uses datetime accessors on non-datetime data. Add a real temporal-type check for the selected x-axis before allowing method === 'prophet'. [logic error]
Severity Level: Major ⚠️
- ❌ Prophet anomaly detection fails for charts using non-temporal x_axis.
- ❌ Affected timeseries charts return errors instead of rendering.
- ⚠️ Users can misconfigure x_axis via unrestricted dndXAxisControl.Steps of Reproduction ✅
1. Configure an ECharts Timeseries Area chart whose control panel is defined in
`superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/Area/controlPanel.tsx:24-30`,
which includes `sections.echartsTimeSeriesQueryWithXAxisSort` (for `x_axis` and
`time_grain_sqla`) and `sections.anomalyDetectionControls` (for anomaly detection form
fields from `src/sections/anomalyDetection.tsx:18-23,37-52`).
2. In the chart's Query section, choose a non-temporal column (for example a categorical
dimension like `customer_segment`) as `x_axis` using the `x_axis` control wired to
`dndXAxisControl`
(`packages/superset-ui-chart-controls/src/shared-controls/dndControls.tsx:293-296`), which
does not enforce `isTemporal` and allows arbitrary group-by columns.
3. In the Anomaly Detection section (`src/sections/anomalyDetection.tsx:18-23,37-52`),
enable anomaly detection (`anomalyDetectionEnabled = true`) and select `prophet` as
`anomalyDetectionMethod`, then run the chart so that the form data (including `x_axis:
'customer_segment'`, `anomalyDetectionEnabled: true`, `anomalyDetectionMethod: 'prophet'`)
is passed into the post-processing operator `anomalyDetectionOperator`
(`src/operators/anomalyDetectionOperator.ts:25-40`).
4. Observe that in `anomalyDetectionOperator` the Prophet guard only checks for the
*presence* of `granularity_sqla` or `x_axis`, and with `method === 'prophet'` and a
non-empty `x_axis` it still returns a `PostProcessingAnomalyDetection` object with
`method: 'prophet'` and `index` set to the non-temporal x-axis label (`getXAxisLabel` from
`packages/superset-ui-core/src/query/getXAxis.ts:45-48`). This post-processing spec
reaches the backend `anomaly_detection` implementation
(`superset/superset/utils/pandas_postprocessing/anomaly.py:194-224`), which documents that
it operates on a timeseries dataframe and uses the `index` column as x-axis data; when
Prophet-based anomaly detection runs against a non-datetime index column, downstream
Prophet code will attempt datetime-based operations on non-temporal data and fail, causing
the chart query to error instead of returning results.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset-frontend/packages/superset-ui-chart-controls/src/operators/anomalyDetectionOperator.ts
**Line:** 32:38
**Comment:**
*Logic Error: The Prophet guard only checks whether an x-axis field exists, not whether it is actually temporal. If a non-datetime `x_axis` value slips through (for example from saved form state or API payload), this still emits a Prophet anomaly post-processing step and backend Prophet processing will fail when it uses datetime accessors on non-datetime data. Add a real temporal-type check for the selected x-axis before allowing `method === 'prophet'`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| color: isAnomaly | ||
| ? (theme?.colorError ?? | ||
| colorScale(seriesKey || forecastSeries.name, sliceId)) | ||
| : timeShiftColor | ||
| ? colorScale(colorScaleKey, sliceId) | ||
| : colorScale(seriesKey || forecastSeries.name, sliceId), |
There was a problem hiding this comment.
Suggestion: The new anomaly color assignment does not actually guarantee red anomaly points when colorByPrimaryAxis is enabled, because this chart path bypasses series-level itemStyle and applies per-point colors instead. This breaks the intended behavior ("anomalies are red") in a valid chart configuration. Ensure anomaly series bypass applyColorByPrimaryAxis coloring (or force red in that branch) so anomaly markers stay red consistently. [logic error]
Severity Level: Major ⚠️
- ⚠️ Timeseries anomaly points not reliably highlighted in red.
- ⚠️ Color-by-X-axis charts misrepresent anomaly visual semantics.
- ⚠️ Users may overlook anomalies due to inconsistent coloring.Steps of Reproduction ✅
1. Configure a Time-series ECharts chart (e.g., Regular Line) that uses the Timeseries
plugin so that `transformProps()` at
`superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/transformProps.ts:220-307`
is used to build the series list and call `transformSeries()` for each metric column.
2. In the chart control panel, enable anomaly detection so the backend produces
`metric__anomaly` columns (verified by tests in
`superset/tests/unit_tests/pandas_postprocessing/test_anomaly.py`) and enable the "Color
by X-axis / primary axis" option so `colorByPrimaryAxis` is `true` in the
`EchartsTimeseriesFormData` passed to `transformProps()` and forwarded into
`transformSeries()` (`transformProps.ts:256-293`).
3. At runtime, for the anomaly series, `transformSeries()` in
`superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/transformers.ts:270-275`
sets `forecastSeries.type === ForecastSeriesEnum.Anomaly`, making `isAnomaly` true and
computing a red series-level `itemStyle.color` using `theme?.colorError` in the snippet at
lines 324-333; however, because `colorByPrimaryAxis` is true and `Array.isArray(data)` is
true, the returned series has its `data` replaced by the result of
`applyColorByPrimaryAxis()` (`transformers.ts:379-387`) and omits the `itemStyle` entirely
via `...(colorByPrimaryAxis ? {} : { itemStyle })` at line 397.
4. `applyColorByPrimaryAxis()` (`transformers.ts:172-195`) colors each point by its
primary-axis value using `colorScale(colorKey, sliceId)` and ignores anomaly-specific
styling, so in the rendered chart anomaly points are colored by X-axis category rather
than consistently red, diverging from the intended "red anomaly markers" behavior even
though the anomaly path is exercised through normal chart configuration.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/transformers.ts
**Line:** 325:330
**Comment:**
*Logic Error: The new anomaly color assignment does not actually guarantee red anomaly points when `colorByPrimaryAxis` is enabled, because this chart path bypasses series-level `itemStyle` and applies per-point colors instead. This breaks the intended behavior ("anomalies are red") in a valid chart configuration. Ensure anomaly series bypass `applyColorByPrimaryAxis` coloring (or force red in that branch) so anomaly markers stay red consistently.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| if (name.endsWith(ForecastSeriesEnum.Anomaly)) { | ||
| const stripped = name.slice(0, -ForecastSeriesEnum.Anomaly.length); | ||
| const forecastMatch = forecastSuffixRegex.exec(stripped); | ||
| return { | ||
| name: forecastMatch ? forecastMatch[1] : stripped, | ||
| type: ForecastSeriesEnum.Anomaly, | ||
| }; |
There was a problem hiding this comment.
Suggestion: The anomaly parsing now collapses nested forecast anomaly series (for example metric__yhat__anomaly) to the same base key as regular anomaly series (metric__anomaly). When both exist (such as forecast enabled with zero future periods), tooltip aggregation merges them into one entry and overwrites values/colors, producing incorrect tooltip output. Preserve distinct keys for nested anomaly series (e.g., keep stripped as the name) and only normalize for display later. [logic error]
Severity Level: Major ⚠️
- ⚠️ Timeseries charts can drop one of two anomaly series.
- ⚠️ Tooltips cannot distinguish original vs forecast anomalies.
- ⚠️ Users may misinterpret which values were anomalous.Steps of Reproduction ✅
1. Enable Prophet forecast and anomaly detection together for a timeseries metric in a
chart that uses the ECharts Timeseries plugin; configure forecast so that the original
metric has no NaNs in the returned dataframe (e.g., zero or no future periods). Backend
tests in
`superset/tests/unit_tests/pandas_postprocessing/test_anomaly.py:test_anomaly_detection_with_forecast`
confirm this produces both `metric__anomaly` and `metric__yhat__anomaly` columns.
2. On the frontend, `transformProps()` in
`superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/transformProps.ts:73-75`
calls `rebaseForecastDatum(data, verboseMap)`
(`superset-frontend/plugins/plugin-chart-echarts/src/utils/forecast.ts:138-170`). Inside
`rebaseForecastDatum`, each column key is passed through `extractForecastSeriesContext()`
(`forecast.ts:32-53`); for both `metric__anomaly` and `metric__yhat__anomaly`, the code at
lines 38-44 detects the `__anomaly` suffix, strips it, optionally strips any `__yhat*`
suffix via `forecastSuffixRegex`, and returns `ForecastSeriesContext { name: 'metric',
type: ForecastSeriesEnum.Anomaly }` for both columns.
3. Still in `rebaseForecastDatum`, when `verboseMap` has an entry for the base metric key
`'metric'`, both anomaly columns compute the same `verboseKey`
`${verboseMap['metric']}${ForecastSeriesEnum.Anomaly}` (`forecast.ts:147-151`). The second
anomaly column processed (typically `metric__yhat__anomaly`, as in the backend test column
order) overwrites the first in `newRow`, so only one anomaly column remains in the rebased
data and the other anomaly series' values are silently lost before `extractSeries()`
builds the chart series
(`superset-frontend/plugins/plugin-chart-echarts/src/utils/series.ts:70-103`).
4. When hovering the chart, the tooltip formatter in
`superset-frontend/plugins/plugin-chart-echarts/src/Timeseries/transformProps.ts:1015-1079`
calls `extractForecastValuesFromTooltipParams()` (`forecast.ts:68-97`), which again uses
`extractForecastSeriesContext(seriesId)` and aggregates by `context.name`. Because both
base and forecast anomaly series are normalized to the same `name` `'metric'`, any
remaining cases where both anomaly series survive (e.g., if `verboseMap` lacked the base
key or in other consumers such as `MixedTimeseries/transformProps.ts:790-797`) will be
merged into a single `ForecastValue` entry, causing one series' anomaly value/color to
overwrite the other and preventing tooltips from distinguishing base vs forecast
anomalies.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset-frontend/plugins/plugin-chart-echarts/src/utils/forecast.ts
**Line:** 38:44
**Comment:**
*Logic Error: The anomaly parsing now collapses nested forecast anomaly series (for example `metric__yhat__anomaly`) to the same base key as regular anomaly series (`metric__anomaly`). When both exist (such as forecast enabled with zero future periods), tooltip aggregation merges them into one entry and overwrites values/colors, producing incorrect tooltip output. Preserve distinct keys for nested anomaly series (e.g., keep `stripped` as the name) and only normalize for display later.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
SUMMARY
Adds built-in anomaly detection as a post-processing operation for all ECharts Timeseries chart types (Line, Bar, Area, Scatter, Smooth Line, Step Line). Related: #40520
SIP: #40520
This feature highlights statistical anomalies directly on timeseries charts as red scatter points, enabling instant visual identification of outliers without exporting data.
Architecture -> follows the same pattern as the existing Prophet forecast feature:
anomaly_detectionentry inpost_processingarray{column}__anomalycolumns__anomalysuffix and renders red scatter pointsDetection methods:
Forecast integration: When both Prophet forecast and anomaly detection are enabled, anomaly detection automatically runs on the forecast
__yhatcolumns (which have complete data), skipping original columns that contain NaN for future periods.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Backend:
docker exec superset-superset-1 python -m pytest tests/unit_tests/pandas_postprocessing/test_anomaly.py -vRuns 13 tests covering: zscore, mad, prophet, invalid inputs, forecast column handling, null column skipping.
Frontend:
Runs 7 operator tests + 17 forecast utility tests (including anomaly indicator).
Manual verification:
ADDITIONAL INFORMATION
NOTE
The program was tested solely for our own use cases, which might differ from yours.
Signed-off-by: Ayush Raj ayush.raj@mercedes-benz.com on behalf of Mercedes-Benz Research and Development India
https://github.com/mercedes-benz/mercedes-benz-foss/blob/master/PROVIDER_INFORMATION.md