Optional final production model fitting procedure after CV #221
Merged
PascalIversen merged 57 commits into development on Jun 11, 2025
… type. I always do typos like induividual
…handle this in experiment.py
Pull Request Overview
This PR introduces an optional final production model fitting procedure after cross‐validation and enhances model persistence by adding save/load methods across multiple model implementations. Key changes include:
- Adding a new command-line flag and corresponding logic in the experiment pipeline to train a final model on the full dataset.
- Implementing and testing save/load functionality in various model modules.
- Introducing tissue mapping tests and minor type hint/documentation improvements.
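The persistence changes follow a common pattern: the base class declares abstract save/load methods and each concrete model implements them, with a round-trip test checking that a restored model predicts the same values. A minimal sketch of that contract, assuming a directory-based interface and JSON state; `PersistableModel`, `MeanBaseline`, and the method signatures are hypothetical and not drevalpy's actual API.

```python
from __future__ import annotations

import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class PersistableModel(ABC):
    """Hypothetical base class: every model must be able to persist and restore itself."""

    @abstractmethod
    def save(self, directory: str | Path) -> None:
        """Write everything needed to rebuild the model into `directory`."""

    @classmethod
    @abstractmethod
    def load(cls, directory: str | Path) -> PersistableModel:
        """Rebuild a model from the files written by `save`."""


class MeanBaseline(PersistableModel):
    """Toy model that predicts the training mean (stand-in for a real model)."""

    def __init__(self, mean: float | None = None, hyperparameters: dict | None = None):
        self.mean = mean
        self.hyperparameters = hyperparameters or {}

    def train(self, y: list[float]) -> None:
        self.mean = sum(y) / len(y)

    def predict(self, n: int) -> list[float]:
        return [self.mean] * n

    def save(self, directory: str | Path) -> None:
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        # Persist both fitted state and the hyperparameters used to obtain it.
        state = {"mean": self.mean, "hyperparameters": self.hyperparameters}
        (path / "model.json").write_text(json.dumps(state))

    @classmethod
    def load(cls, directory: str | Path) -> MeanBaseline:
        state = json.loads((Path(directory) / "model.json").read_text())
        return cls(mean=state["mean"], hyperparameters=state["hyperparameters"])


# Round-trip check in the spirit of the new persistence tests.
with tempfile.TemporaryDirectory() as tmp:
    model = MeanBaseline(hyperparameters={"alpha": 1.0})
    model.train([1.0, 2.0, 3.0])
    model.save(tmp)
    restored = MeanBaseline.load(tmp)
    assert restored.predict(2) == model.predict(2) == [2.0, 2.0]
```

Declaring `load` as an abstract classmethod means a subclass that forgets either method cannot be instantiated, so missing persistence support surfaces immediately rather than at save time.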
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_tissue_mapping.py | Added tests for tissue mapping functionality with dummy data. |
| tests/models/test_single_drug_models.py | Added save/load tests for single drug models to verify persistence. |
| tests/models/test_global_models.py | Extended global model tests with save/load functionality. |
| tests/models/test_baselines.py | Added persistence tests for baseline models. |
| drevalpy/utils.py | Introduced a new flag (--final_model_on_full_data) for final model training. |
| drevalpy/models/drp_model.py | Added abstract save/load methods in the base model class. |
| drevalpy/models/baselines/singledrug_elastic_net.py | Implemented save/load functionality for the ElasticNet baseline. |
| drevalpy/models/SimpleNeuralNetwork/*.py | Added save/load methods and updated hyperparameter defaults in neural network models. |
| drevalpy/models/SRMF/srmf.py | Added methods to save and load the SRMF model parameters/configurations. |
| drevalpy/models/DIPK/dipk.py | Updated DIPK training to use hyperparameters and added save/load support. |
| drevalpy/experiment.py | Extended experiment pipeline to include a final model training step and added a new splitting utility. |
| drevalpy/datasets/map_tissues.py | Refined type hints and ensured proper conversion of file path objects. |
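For illustration, an opt-in switch such as `--final_model_on_full_data` is typically declared with `argparse` as below. This is a sketch assuming the flag is a plain on/off boolean; the parser description and help text are hypothetical, and the real definition lives in drevalpy/utils.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser fragment; only the flag name is taken from the PR."""
    parser = argparse.ArgumentParser(description="Drug response experiment (sketch)")
    parser.add_argument(
        "--final_model_on_full_data",
        action="store_true",  # assumption: opt-in boolean, False unless the flag is given
        help="After cross-validation, retune and fit a final model on the full dataset.",
    )
    return parser


default_args = build_parser().parse_args([])
final_args = build_parser().parse_args(["--final_model_on_full_data"])
print(default_args.final_model_on_full_data)  # False
print(final_args.final_model_on_full_data)    # True
```

Keeping the flag off by default preserves the existing CV-only runtime for users who do not need a production model.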
Comments suppressed due to low confidence (1)
drevalpy/models/baselines/singledrug_elastic_net.py:1
- Typo in docstring: 'seperately' should be corrected to 'separately'.
"""SingleDrugElasticNet and SingleDrugProteomicsElasticNet classes. Fit an Elastic net for each drug seperately."""
Comment on lines +81 to +97

    print("\n=== tissue_mapping.csv ===")
    print(df_out)

    assert "tissue" in df_out.columns
    print("\n=== tissue for CVCL_TEST1 ===")
    print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"])

    print("\n=== tissue for CVCL_TEST2 ===")
    print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"])

    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"].values[0] == "Lung"
    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"].values[0] == "Blood"

    # Check the updated dataset has a tissue column
    updated = pd.read_csv(root / ds_name / f"{ds_name}.csv")
    print("\n=== Updated dataset ===")
    print(updated)
[nitpick] Consider using a logging mechanism instead of print statements in tests to reduce console clutter during automated runs.
Suggested change

    logging.info("\n=== tissue_mapping.csv ===")
    logging.info(df_out)
    assert "tissue" in df_out.columns
    logging.info("\n=== tissue for CVCL_TEST1 ===")
    logging.info(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"])
    logging.info("\n=== tissue for CVCL_TEST2 ===")
    logging.info(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"])
    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"].values[0] == "Lung"
    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"].values[0] == "Blood"
    # Check the updated dataset has a tissue column
    updated = pd.read_csv(root / ds_name / f"{ds_name}.csv")
    logging.info("\n=== Updated dataset ===")
    logging.info(updated)
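Copilot's suggestion calls `logging.info` on the root logger, which also needs `import logging` in the test module (not visible in lines +81 to +97). The root logger's default level is WARNING, so those INFO records are dropped unless the level is lowered, whereas `print` always writes to stdout. A minimal stdlib sketch of that behavior, assuming a module-level logger; the logger name and messages are illustrative only.

```python
import io
import logging

# Module-level logger: the idiomatic alternative to calling logging.info() on the root logger.
logger = logging.getLogger("test_tissue_mapping")  # name chosen for illustration
logger.propagate = False  # keep this sketch independent of any root-logger configuration


def capture(level: int) -> str:
    """Emit one INFO and one WARNING record with the logger set to `level`; return the output."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)  # default format is just the message
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        logger.info("=== tissue_mapping.csv ===")
        logger.warning("tissue column missing")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


quiet = capture(logging.WARNING)  # INFO suppressed, as in a default automated run
verbose = capture(logging.INFO)   # INFO visible only when explicitly requested
```

Under pytest, such records can be surfaced on demand, for example with `--log-cli-level=INFO`, instead of cluttering every run.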
closes #217
I decided on a proper retuning on the full dataset instead of any hacks (it will take approximately as long as one fold).
Also sneaking in the tissue mapping tests, which I forgot to add.
TODO:
testing
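The remark that full-data retuning takes about as long as one fold can be sketched as follows, assuming each outer CV fold already tunes hyperparameters on a validation split of its own training data: the production model simply repeats that tune-then-fit step once on all the data. The 1-D ridge regression, the regularization grid, and all function names are hypothetical stand-ins, not drevalpy code.

```python
import random
from statistics import mean

Data = list[tuple[float, float]]  # (feature, response) pairs
GRID = [0.01, 0.1, 1.0, 10.0]     # hypothetical regularization grid


def fit_ridge(data: Data, alpha: float) -> float:
    """Closed-form 1-D ridge regression without intercept: w = sum(x*y) / (sum(x^2) + alpha)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + alpha)


def mse(w: float, data: Data) -> float:
    return mean((y - w * x) ** 2 for x, y in data)


def tune(data: Data, val_fraction: float = 0.2) -> float:
    """Pick the alpha with the lowest validation MSE on a hold-out split of `data`."""
    n_val = max(1, int(len(data) * val_fraction))
    train, val = data[n_val:], data[:n_val]
    return min(GRID, key=lambda a: mse(fit_ridge(train, a), val))


def cross_validate(data: Data, n_folds: int = 5) -> list[float]:
    """Outer CV: each fold tunes on its own training part, then scores the held-out fold."""
    size = len(data) // n_folds
    scores = []
    for k in range(n_folds):
        test = data[k * size:(k + 1) * size]
        train = data[:k * size] + data[(k + 1) * size:]
        alpha = tune(train)  # the test fold is never seen during tuning
        scores.append(mse(fit_ridge(train, alpha), test))
    return scores


def final_model(data: Data) -> tuple[float, float]:
    """Production model: one extra tune-then-fit step, now on the full dataset."""
    alpha = tune(data)
    return alpha, fit_ridge(data, alpha)


rng = random.Random(0)
data: Data = []
for _ in range(100):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))

cv_scores = cross_validate(data)  # estimate of how well the procedure generalizes
alpha, w = final_model(data)      # the model that would be saved and deployed
```

`cv_scores` evaluates the tuning-plus-fitting procedure, while `w` is the artifact that gets persisted; the final model has no held-out score of its own, which is why it complements rather than replaces the CV results.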