Removed z-score calculations and related outlier flags.
Implement is_outlier function to detect anomalies using IQR method.
GregW04
reviewed
Feb 8, 2026
| @@ -0,0 +1,40 @@ | |||
| def is_outlier(df, groupby, filter_column=None, window='7D',z_score_sensitivity=2) -> pd.DataFrame: #takes as input: dataframe, groupby - for eg. spaces_left, filter_columns-used to filter by parking_id, window- for eg. uses last 30D for calculating anomalies, returns updated dataframe | |||
- Add type annotations, e.g. `df: pd.DataFrame`.
- Rename `groupby` to something more intuitive, like `group_cols` (a list of strings, so it can ingest more than one column).
- Add a Google-style docstring.
- Add meaningful comments throughout the function.
- We are not using `z_score_sensitivity` or any of `c=2`, `min_t`, `max_t`: delete the unused code or add an n_sigma method.
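A minimal sketch of what the annotated signature and Google-style docstring could look like. The parameter descriptions are taken from the inline comment in the diff; the name `group_cols` is the rename proposed above, and the body is deliberately left out, so treat this as an illustration rather than the final API.

```python
from typing import List, Optional

import pandas as pd


def is_outlier(
    df: pd.DataFrame,
    group_cols: List[str],
    filter_column: Optional[str] = None,
    window: str = "7D",
) -> pd.DataFrame:
    """Flag anomalies in the given columns using the rolling IQR method.

    Args:
        df: Input dataframe.
        group_cols: Columns to check for outliers, e.g. ``["spaces_left"]``.
        filter_column: Optional column used to split the data,
            e.g. ``"parking_id"``. If None, the whole frame is used.
        window: Rolling window used to compute the quartiles, e.g. ``"30D"``.

    Returns:
        The dataframe with the outlier flag columns added.
    """
    # Signature and docstring only; the detection logic is omitted here.
    raise NotImplementedError
```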
GregW04
reviewed
Feb 8, 2026
| @@ -0,0 +1,40 @@ | |||
| def is_outlier(df, groupby, filter_column=None, window='7D',z_score_sensitivity=2) -> pd.DataFrame: #takes as input: dataframe, groupby - for eg. spaces_left, filter_columns-used to filter by parking_id, window- for eg. uses last 30D for calculating anomalies, returns updated dataframe | |||
| df=df.copy() | |||
Add a bool parameter to control whether we create a copy (copying costs memory).
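Something along these lines would do, as a sketch only. The helper name `maybe_copy` and the parameter name `copy` are placeholders, not names from the PR; in practice the flag would simply become an argument of `is_outlier`.

```python
import pandas as pd


def maybe_copy(df: pd.DataFrame, copy: bool = True) -> pd.DataFrame:
    """Return a frame that the caller is allowed to mutate.

    With copy=True the input is left untouched at the cost of extra memory.
    With copy=False the input itself is returned and will be modified in place.
    """
    return df.copy() if copy else df
```

Defaulting to `copy=True` keeps the current behaviour, so existing callers are not surprised by in-place changes.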
GregW04
reviewed
Feb 8, 2026
| if filter_column is None: | ||
| temp_dataframe=df[groupby] | ||
GregW04
reviewed
Feb 8, 2026
| Q1 = temp_dataframe.transform(lambda x: x.rolling(window, min_periods=1).quantile(0.25)) | ||
| Q3 = temp_dataframe.transform(lambda x: x.rolling(window, min_periods=1).quantile(0.75)) | ||
`groupby.transform(lambda x: x.rolling(...).quantile())` is inefficient: it rebuilds the rolling window for each quantile (so twice here), broadcasts the results awkwardly, and scales poorly (O(n log n) per group). Please propose a more scalable solution.
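One possible direction, as a sketch rather than a benchmarked fix: build a single grouped rolling object and call `quantile` on it for both quartiles, which avoids the Python-level lambda per group. The function name and the column names in the usage below (`parking_id`, `timestamp`, `spaces_left`) are assumptions for illustration. It also assumes the key column has no missing values and returns the rows sorted by key and time, not in their original order.

```python
import pandas as pd


def rolling_quartiles(
    df: pd.DataFrame,
    value_col: str,
    key_col: str,
    time_col: str,
    window: str = "7D",
) -> pd.DataFrame:
    """Add rolling Q1 and Q3 of value_col per key_col as columns q1 and q3."""
    # Sort so each group's rows are contiguous and time-ordered;
    # sort_values returns a new frame, so the input is not modified.
    ordered = df.sort_values([key_col, time_col])

    # One grouped rolling object, reused for both quantiles.
    roller = (
        ordered.set_index(time_col)
        .groupby(key_col)[value_col]
        .rolling(window, min_periods=1)
    )

    # The result is ordered group by group (sorted keys), which matches the
    # row order of `ordered`, so the values can be assigned positionally.
    ordered["q1"] = roller.quantile(0.25).to_numpy()
    ordered["q3"] = roller.quantile(0.75).to_numpy()
    return ordered
```

Usage would look like `rolling_quartiles(df, "spaces_left", "parking_id", "timestamp", window="7D")`. Whether this is fast enough on the real data is worth measuring before committing to it.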
GregW04
reviewed
Feb 8, 2026
| lower_bound = Q1 - 1.5 * IQR | ||
| upper_bound = Q3 + 1.5 * IQR | ||
Replace the hard-coded 1.5 with a `multiplier` parameter so we can tune the IQR method.
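For example, the bounds could be computed by a small helper like the one below. The helper name `iqr_bounds` is a placeholder; only the `multiplier` parameter is the actual request, and 1.5 stays as the default so current behaviour does not change.

```python
from typing import Tuple

import pandas as pd


def iqr_bounds(
    q1: pd.Series, q3: pd.Series, multiplier: float = 1.5
) -> Tuple[pd.Series, pd.Series]:
    """Return (lower_bound, upper_bound) for the IQR outlier rule."""
    iqr = q3 - q1
    lower_bound = q1 - multiplier * iqr
    upper_bound = q3 + multiplier * iqr
    return lower_bound, upper_bound
```

A larger multiplier widens the band and flags fewer points; a smaller one makes the detector more sensitive.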
GregW04
reviewed
Feb 8, 2026
| df['is_event_iqr_outlier']=df['is_outlier_iqr'] & df['is_event'] | ||
I assume the whole function takes an already processed dataframe. We need to be able to run the feature_engineering functions in order, so either:
- make this function take the raw dataframes it needs and transform them internally, or
- add a new function that transforms the dataframes before this function is called.
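The second option could be as simple as a runner that applies the preparation steps in a fixed order. This is a sketch under assumptions: `run_feature_engineering`, `parse_timestamps` and `sort_by_time` are hypothetical names, and the real preparation steps depend on what the raw dataframes look like.

```python
from typing import Callable, Iterable

import pandas as pd

Step = Callable[[pd.DataFrame], pd.DataFrame]


def run_feature_engineering(raw: pd.DataFrame, steps: Iterable[Step]) -> pd.DataFrame:
    """Apply each feature-engineering step to the frame, in the given order."""
    out = raw
    for step in steps:
        out = out.pipe(step)
    return out


def parse_timestamps(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: make sure the timestamp column is a datetime."""
    return df.assign(timestamp=pd.to_datetime(df["timestamp"]))


def sort_by_time(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: order rows by timestamp for rolling windows."""
    return df.sort_values("timestamp").reset_index(drop=True)
```

With this in place, `is_outlier` would just be the last entry in the list of steps, e.g. `run_feature_engineering(raw, [parse_timestamps, sort_by_time])` followed by the outlier step.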