
Comments

Experiment/mateusz anomaly detection #17

Open

matis2303 wants to merge 4 commits into dev from experiment/mateusz_anomaly_detection

Conversation

@matis2303
Member

No description provided.

Removed z-score calculations and related outlier flags.
Implement is_outlier function to detect anomalies using IQR method.
@@ -0,0 +1,40 @@
def is_outlier(df, groupby, filter_column=None, window='7D',z_score_sensitivity=2) -> pd.DataFrame: #takes as input: dataframe, groupby - for eg. spaces_left, filter_columns-used to filter by parking_id, window- for eg. uses last 30D for calculating anomalies, returns updated dataframe

@GregW04 Feb 8, 2026


  1. Add type annotations, e.g. df: pd.DataFrame.
  2. Rename groupby to something more intuitive, like group_cols (a list of strings, so the function can ingest more than one column).
  3. Add a Google-style docstring.
  4. Add meaningful comments throughout the function.
  5. We are not using z_score_sensitivity or any of c=2, min_t, max_t (delete the unused code or add an n_sigma method). A sketch of points 1-4 follows this list.
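
For illustration only, a minimal sketch of how points 1-4 could look (group_cols and value_col are my assumptions, not names from this diff):

import pandas as pd

def is_outlier(
    df: pd.DataFrame,
    group_cols: list[str],
    value_col: str,
    window: str = "7D",
) -> pd.DataFrame:
    """Flag rolling IQR outliers per group.

    Args:
        df: Input dataframe with a DatetimeIndex.
        group_cols: Columns to group by, e.g. ["parking_id"].
        value_col: Column to test for outliers, e.g. "spaces_left".
        window: Rolling window used for the quantiles, e.g. "7D" or "30D".

    Returns:
        A copy of df with a boolean "is_outlier_iqr" column added.
    """
    ...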

df=df.copy()


Add a bool parameter to control whether we create a copy (copying costs memory); a small sketch follows.
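
A small sketch of that idea (the copy parameter name is an assumption):

def is_outlier(df: pd.DataFrame, group_cols: list[str], copy: bool = True) -> pd.DataFrame:
    # Let callers opt out of the defensive copy when memory is tight.
    df = df.copy() if copy else df
    ...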


if filter_column is None:
temp_dataframe=df[groupby]



The group_cols fix should sort this out.


Q1 = temp_dataframe.transform(lambda x: x.rolling(window, min_periods=1).quantile(0.25))
Q3 = temp_dataframe.transform(lambda x: x.rolling(window, min_periods=1).quantile(0.75))



groupby.transform(lambda x: x.rolling(...).quantile()) is inefficient: it computes the rolling window separately for each quantile, broadcasts the results awkwardly, and scales poorly (O(n log n) per group). Please propose a more scalable solution; see the sketch below.
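
One possible shape for a more scalable version, sketched under the assumption that df has a unique, sorted DatetimeIndex (the helper name rolling_iqr_bounds and the value_col/multiplier parameters are not in this PR):

import pandas as pd

def rolling_iqr_bounds(
    df: pd.DataFrame,
    value_col: str,
    group_cols: list[str],
    window: str = "7D",
    multiplier: float = 1.5,
) -> tuple[pd.Series, pd.Series]:
    # groupby(...).rolling(...) uses pandas' built-in rolling quantile,
    # avoiding a Python-level lambda call for every group.
    rolled = df.groupby(group_cols)[value_col].rolling(window, min_periods=1)
    q1 = rolled.quantile(0.25).droplevel(group_cols)
    q3 = rolled.quantile(0.75).droplevel(group_cols)
    iqr = q3 - q1
    # The returned bounds align back to df's original (assumed unique) index.
    return q1 - multiplier * iqr, q3 + multiplier * iqr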


lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR



Replace the hard-coded 1.5 with a multiplier parameter so the IQR method can be tuned, for example:
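
For illustration (iqr_multiplier is a suggested name, not existing code):

lower_bound = Q1 - iqr_multiplier * IQR
upper_bound = Q3 + iqr_multiplier * IQR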


df['is_event_iqr_outlier']=df['is_outlier_iqr'] & df['is_event']




I assume the whole function takes an already processed dataframe. We need to be able to run the feature_engineering functions in order, so either:

  • make this function take the raw dataframes it needs and transform them inside,
  • or add a new function that transforms the dataframes before calling this one (see the sketch after this list).
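
A minimal sketch of the second option, reusing the refactored is_outlier signature from the earlier sketch (the wrapper name and the prepare_features callable are assumptions, not existing code):

from typing import Callable

import pandas as pd

def add_iqr_outlier_flags(
    raw_df: pd.DataFrame,
    prepare_features: Callable[[pd.DataFrame], pd.DataFrame],
    group_cols: list[str],
    value_col: str,
) -> pd.DataFrame:
    # Run the upstream feature_engineering transform first, then flag outliers,
    # so the steps always execute in the required order.
    df = prepare_features(raw_df)
    return is_outlier(df, group_cols, value_col)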

