
Comments

Experiment/mateusz anomaly detection #17

Open

matis2303 wants to merge 4 commits into dev from experiment/mateusz_anomaly_detection

Conversation

@matis2303
Member

No description provided.

Removed z-score calculations and related outlier flags.
Implement is_outlier function to detect anomalies using IQR method.
@@ -0,0 +1,40 @@
def is_outlier(df, groupby, filter_column=None, window='7D',z_score_sensitivity=2) -> pd.DataFrame: #takes as input: dataframe, groupby - for eg. spaces_left, filter_columns-used to filter by parking_id, window- for eg. uses last 30D for calculating anomalies, returns updated dataframe

@GregW04 Feb 8, 2026


  1. Add type annotations, e.g. df: pd.DataFrame.
  2. Rename groupby to something more intuitive, like group_cols (a list of strings, so the function can ingest more than one column).
  3. Add a Google-style docstring.
  4. Add meaningful comments throughout the function.
  5. We are not using z_score_sensitivity or any of c=2, min_t, max_t (delete the unused code or add an n_sigma method). A sketch of points 1-4 follows this list.
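
For illustration only, a minimal sketch of how points 1-4 could look (group_cols and value_col are my assumptions, not names from this diff):

import pandas as pd

def is_outlier(
    df: pd.DataFrame,
    group_cols: list[str],
    value_col: str,
    window: str = "7D",
) -> pd.DataFrame:
    """Flag rolling IQR outliers per group.

    Args:
        df: Input dataframe with a DatetimeIndex.
        group_cols: Columns to group by, e.g. ["parking_id"].
        value_col: Column to test for outliers, e.g. "spaces_left".
        window: Rolling window used for the quantiles, e.g. "7D" or "30D".

    Returns:
        A copy of df with a boolean "is_outlier_iqr" column added.
    """
    ...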

df=df.copy()


Add a bool parameter to control whether we create a copy (copying costs memory); a small sketch follows.
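
A small sketch of that idea (the copy parameter name is an assumption):

def is_outlier(df: pd.DataFrame, group_cols: list[str], copy: bool = True) -> pd.DataFrame:
    # Let callers opt out of the defensive copy when memory is tight.
    df = df.copy() if copy else df
    ...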


if filter_column is None:
temp_dataframe=df[groupby]



The group_cols fix should sort this out.


Q1 = temp_dataframe.transform(lambda x: x.rolling(window, min_periods=1).quantile(0.25))
Q3 = temp_dataframe.transform(lambda x: x.rolling(window, min_periods=1).quantile(0.75))



groupby.transform(lambda x: x.rolling(...).quantile()) is inefficient: it computes the rolling window separately for each quantile, broadcasts the results awkwardly, and scales poorly (O(n log n) per group). Please propose a more scalable solution; see the sketch below.
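
One possible shape for a more scalable version, sketched under the assumption that df has a unique, sorted DatetimeIndex (the helper name rolling_iqr_bounds and the value_col/multiplier parameters are not in this PR):

import pandas as pd

def rolling_iqr_bounds(
    df: pd.DataFrame,
    value_col: str,
    group_cols: list[str],
    window: str = "7D",
    multiplier: float = 1.5,
) -> tuple[pd.Series, pd.Series]:
    # groupby(...).rolling(...) uses pandas' built-in rolling quantile,
    # avoiding a Python-level lambda call for every group.
    rolled = df.groupby(group_cols)[value_col].rolling(window, min_periods=1)
    q1 = rolled.quantile(0.25).droplevel(group_cols)
    q3 = rolled.quantile(0.75).droplevel(group_cols)
    iqr = q3 - q1
    # The returned bounds align back to df's original (assumed unique) index.
    return q1 - multiplier * iqr, q3 + multiplier * iqr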


lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR



Replace the hard-coded 1.5 with a multiplier parameter so the IQR method can be tuned, for example:
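
For illustration (iqr_multiplier is a suggested name, not existing code):

lower_bound = Q1 - iqr_multiplier * IQR
upper_bound = Q3 + iqr_multiplier * IQR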


df['is_event_iqr_outlier']=df['is_outlier_iqr'] & df['is_event']




I assume the whole function takes an already processed dataframe. We need to be able to run the feature_engineering functions in order, so either:

  • make this function take the raw dataframes it needs and transform them inside,
  • or add a new function that transforms the dataframes before calling this one (see the sketch after this list).
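
A minimal sketch of the second option, reusing the refactored is_outlier signature from the earlier sketch (the wrapper name and the prepare_features callable are assumptions, not existing code):

from typing import Callable

import pandas as pd

def add_iqr_outlier_flags(
    raw_df: pd.DataFrame,
    prepare_features: Callable[[pd.DataFrame], pd.DataFrame],
    group_cols: list[str],
    value_col: str,
) -> pd.DataFrame:
    # Run the upstream feature_engineering transform first, then flag outliers,
    # so the steps always execute in the required order.
    df = prepare_features(raw_df)
    return is_outlier(df, group_cols, value_col)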

