Metrics update #48
Conversation
justinbt1 left a comment:
Quick review, as I know you are still working on this. Everything appears to be working correctly, but there are a few places where the code might need cleaning up; see the comments below.
```python
train_meta = pd.DataFrame(index=np.arange(len(train)))
test_meta = pd.DataFrame(index=np.arange(len(test)))
# ...
return train, test, train_meta, test_meta
```
This function is very long and does quite a few different tasks; could it be split into separate functions?
See https://ds.wellcome.data/python_standards/#functions for details and reasoning.
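For illustration, one possible shape of the decomposition might look like the sketch below. All the helper names here are hypothetical, and the real split logic would move into its own function:

```python
import numpy as np
import pandas as pd


def shuffle_with_row_id(df: pd.DataFrame, seed: int = 10) -> pd.DataFrame:
    """Shuffle the frame and attach a stable row id."""
    df = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    df["_row_id"] = np.arange(len(df), dtype=np.int64)
    return df


def build_meta(df: pd.DataFrame) -> pd.DataFrame:
    """Create an empty metadata frame aligned to a split."""
    return pd.DataFrame(index=np.arange(len(df)))


def split_frame(df: pd.DataFrame, test_frac: float = 0.2):
    """Split an already-shuffled frame into train and test partitions."""
    n_test = int(len(df) * test_frac)
    return df.iloc[n_test:], df.iloc[:n_test]


def train_test_split_data(df: pd.DataFrame, test_frac: float = 0.2):
    """Compose the small helpers while keeping the original return signature."""
    df = shuffle_with_row_id(df)
    train, test = split_frame(df, test_frac)
    return train, test, build_meta(train), build_meta(test)
```

Each piece is then small enough to test on its own.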
```python
df = df.sample(frac=1, random_state=10).reset_index(drop=True)
df["_row_id"] = np.arange(len(df), dtype=np.int64)
# ...
# Labels (multilabel binarized)
```
I think this may be obvious from the class and variable names below; consider removing this comment?
Think of comments as simply another line of code we have to maintain.
See https://ds.wellcome.data/python_standards/#comments
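For instance, in a hypothetical binarization step (the `tags` column is made up for the example), the comment just restates what the names already say:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Redundant: the class and variable names already say this.
# Labels (multilabel binarized)
labels = MultiLabelBinarizer().fit_transform(df["tags"])

# Clearer: drop the comment and let the names speak for themselves.
labels = MultiLabelBinarizer().fit_transform(df["tags"])
```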
```python
return all_metrics
# ...
return compute_metrics
```
This function is very long and will need to be refactored; more details on function sizing and why it's important can be found in the Python Language Rules section of our code standards.
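As a sketch of the direction only (the metric names and helpers below are illustrative, not the PR's actual code), the factory could delegate to small private functions:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score


def _binarize(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn per-label probabilities into 0/1 predictions."""
    return (probs >= threshold).astype(int)


def _score(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the headline multilabel scores."""
    return {
        "micro_f1": f1_score(y_true, y_pred, average="micro", zero_division=0),
        "micro_precision": precision_score(y_true, y_pred, average="micro", zero_division=0),
        "micro_recall": recall_score(y_true, y_pred, average="micro", zero_division=0),
    }


def build_compute_metrics(threshold: float = 0.5):
    """Factory that returns a small, testable compute_metrics callable."""
    def compute_metrics(eval_pred):
        probs, y_true = eval_pred
        return _score(y_true, _binarize(probs, threshold))
    return compute_metrics
```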
```python
}
return metrics
# ...
return compute_metrics
```
Moving metric calculation to another module is a really good idea!
```bash
python src/train_test_split.py \
    "config/train_config.yaml" \
    "data/clean/clean.parquet" \
    "data/preprocessed"
```
This change of target name also needs to be reflected in the associated documentation.
```python
    wandb.log({"wellcome_metrics_status": "funding_org_missing"})
elif len(meta_df) != labels.shape[0]:
    print("Info: test_meta length does not match eval set; skipping Wellcome metrics.")
    wandb.log({"wellcome_metrics_status": "length_mismatch"})
```
Out of interest, why do we have so many edge cases?
```python
num_tags_predicted = []
if config["training_settings"]["output_weighting"]:
    thresholds = [1] * labels.shape[1]
    output_thresholds = config["training_settings"].get("output_thresholds", [0.2, 0.5, 0.8, 0.95])
```
See above comment on line width.
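For example, the `.get(...)` call above could be wrapped to stay inside the line limit (same behaviour, just reflowed):

```python
output_thresholds = config["training_settings"].get(
    "output_thresholds",
    [0.2, 0.5, 0.8, 0.95],
)
```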
```python
    wandb.log({"wellcome_metrics_status": "computed"})
else:
    print("Info: No rows for FundingOrganisation == 'Wellcome Trust' in eval set")
    wandb.log({"wellcome_metrics_status": "no_wellcome_rows"})
```
Could the content of this try/except block be moved to another function? That would also avoid nesting a large block of code.
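One possible extraction, reusing the log calls from the diff (the function name and signature here are hypothetical):

```python
import wandb


def log_wellcome_metrics(meta_df, labels, preds):
    """Compute and log Wellcome Trust metrics, or log why they were skipped."""
    if "FundingOrganisation" not in meta_df.columns:
        wandb.log({"wellcome_metrics_status": "funding_org_missing"})
        return
    if len(meta_df) != labels.shape[0]:
        print("Info: test_meta length does not match eval set; skipping Wellcome metrics.")
        wandb.log({"wellcome_metrics_status": "length_mismatch"})
        return
    mask = meta_df["FundingOrganisation"] == "Wellcome Trust"
    if not mask.any():
        print("Info: No rows for FundingOrganisation == 'Wellcome Trust' in eval set")
        wandb.log({"wellcome_metrics_status": "no_wellcome_rows"})
        return
    # ... compute metrics on labels[mask.to_numpy()] and preds[mask.to_numpy()] ...
    wandb.log({"wellcome_metrics_status": "computed"})
```

Early returns keep the happy path unindented, which avoids the nesting the try/except currently forces.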
```python
    train (pd.DataFrame): Training data.
    test (pd.DataFrame): Test data.
    output_dir (str): Output directory.
    """
```
The train_meta and test_meta parameters are missing from the docstring.
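The docstring could be completed along these lines (the surrounding function name is hypothetical):

```python
def save_splits(train, test, train_meta, test_meta, output_dir):
    """Save the train/test splits and their metadata.

    Args:
        train (pd.DataFrame): Training data.
        test (pd.DataFrame): Test data.
        train_meta (pd.DataFrame): Metadata aligned row-for-row with train.
        test_meta (pd.DataFrame): Metadata aligned row-for-row with test.
        output_dir (str): Output directory.
    """
```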
```python
# ...
# Randomly shuffle dataframe and reset index; add a stable row id
df = df.sample(frac=1, random_state=10).reset_index(drop=True)
df["_row_id"] = np.arange(len(df), dtype=np.int64)
```
I might be missing something, but could the cleaned dataset's index be used as the values for the new stable row id column? That could allow easier merging with the original dataset for metric calculation, without needing to create intermediate meta datasets. It might reduce the number of datasets and the code needed to create, read, and handle them.
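A minimal sketch of that suggestion, assuming the cleaned parquet's index is stable:

```python
import numpy as np

# Capture the cleaned dataset's own index as the row id *before* shuffling,
# so rows can be joined straight back to data/clean/clean.parquet.
df["_row_id"] = df.index.to_numpy(dtype=np.int64)
df = df.sample(frac=1, random_state=10).reset_index(drop=True)
```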
What does this change?
This PR adds a few things:

- Renames `preprocess.py` to `train_test_split.py`. It felt to me that most of the preprocessing is now done in `src/data_processing/*`, and the main utility of the old `preprocess.py` was just to split the data into train and test.
- Adds a `metrics.py` file where all metrics are calculated.
- Maps sub-category labels to their top-level category, so that e.g. `"1.1: Normal biological development and functioning (underpinning)"` maps to `"1: Underpinning research"` (see the sketch after this list).
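An illustrative sketch of that label rollup, assuming sub-categories are prefixed with their top-level number (the real mapping table lives in the PR):

```python
TOP_LEVEL = {
    "1": "1: Underpinning research",
    # ... remaining top-level categories elided ...
}


def to_top_level(label: str) -> str:
    """Map a sub-category label like '1.1: ...' to its top-level category."""
    top = label.split(".", 1)[0]
    return TOP_LEVEL.get(top, label)


assert to_top_level(
    "1.1: Normal biological development and functioning (underpinning)"
) == "1: Underpinning research"
```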
How to test
Run `make train_test_split` and `make train_ra`.
How can we measure success?
Success in this case means the code does what it needs to do and the metrics we care about (listed above) are logged in wandb.
Have we considered potential risks?
The classifier might be considered worse for "Wellcome only" in the test set. The onus of explaining this is on us.