Optional final production model fitting procedure after CV #221

Merged
PascalIversen merged 57 commits into development from final_model on Jun 11, 2025

Collaborator

@PascalIversen PascalIversen commented May 25, 2025

closes #217
I decided on a proper retuning on the full dataset instead of any hacks. (This will take approximately as long as one CV fold.)
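A minimal sketch of what "retuning on the full dataset" could look like, using only the standard library. Everything here is hypothetical and not taken from the PR: the toy shrunken-mean model, the names `fit`, `mse`, and `fit_final_model`, the 10% validation fraction, and the choice to fit the final model on the training part only are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit(train, shrinkage):
    """Fit a toy shrunken-mean predictor and return its single parameter."""
    ys = [y for _, y in train]
    return sum(ys) / (len(ys) + shrinkage)


def mse(param, data):
    """Mean squared error of a constant prediction on (x, y) pairs."""
    return mean((y - param) ** 2 for _, y in data)


def fit_final_model(dataset, grid, val_fraction=0.1, seed=0):
    """Tune one hyperparameter on a split of the FULL dataset, then fit.

    Assumption: there is no held-out test set at this stage, so the whole
    dataset is divided into a training part and a small validation part.
    """
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    val, train = shuffled[:n_val], shuffled[n_val:]
    # One tuning pass over the grid, comparable in cost to a single CV fold.
    best = min(grid, key=lambda s: mse(fit(train, s), val))
    return best, fit(train, best)


if __name__ == "__main__":
    data = [(i, 2.0) for i in range(50)]
    print(fit_final_model(data, grid=[0.0, 1.0, 10.0]))
```

Because the procedure runs exactly one tune-and-fit pass, its cost is roughly that of one fold of the cross-validation, which matches the estimate above.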

Also sneaking in the tissue mapping tests, which I forgot to add.

TODO:
testing

@PascalIversen PascalIversen added the enhancement New feature or request label May 25, 2025
@PascalIversen PascalIversen requested a review from Copilot May 27, 2025 14:05
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces an optional final production model fitting procedure after cross‐validation and enhances model persistence by adding save/load methods across multiple model implementations. Key changes include:

  • Adding a new command-line flag and corresponding logic in the experiment pipeline to train a final model on the full dataset.
  • Implementing and testing save/load functionality in various model modules.
  • Introducing tissue mapping tests and minor type hint/documentation improvements.
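The first two changes above can be illustrated with a short sketch. Only the flag name `--final_model_on_full_data` comes from this PR; the use of `argparse` with `store_true`, the classes `BaseModel` and `MeanModel`, the `model.json` file name, and the exact `save`/`load` signatures are assumptions for illustration and may differ from the actual drevalpy code.

```python
import argparse
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BaseModel(ABC):
    """Hypothetical base class that forces subclasses to implement persistence."""

    @abstractmethod
    def save(self, directory: str) -> None:
        """Write everything needed to restore the model into `directory`."""

    @classmethod
    @abstractmethod
    def load(cls, directory: str) -> "BaseModel":
        """Rebuild a model from files previously written by `save`."""


class MeanModel(BaseModel):
    """Toy model that predicts a single stored mean response."""

    def __init__(self, mean_response: float = 0.0):
        self.mean_response = mean_response

    def save(self, directory: str) -> None:
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        payload = {"mean_response": self.mean_response}
        (path / "model.json").write_text(json.dumps(payload))

    @classmethod
    def load(cls, directory: str) -> "MeanModel":
        payload = json.loads((Path(directory) / "model.json").read_text())
        return cls(**payload)


def build_parser() -> argparse.ArgumentParser:
    """Parser with an opt-in switch for the final full-data fit."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--final_model_on_full_data",
        action="store_true",
        help="After CV, also train and save a final model on the full dataset.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--final_model_on_full_data"])
    if args.final_model_on_full_data:
        with tempfile.TemporaryDirectory() as tmp:
            MeanModel(1.5).save(tmp)
            print(MeanModel.load(tmp).mean_response)
```

Declaring `save` and `load` as abstract in the base class means a model that lacks persistence fails at instantiation rather than at the end of a long training run.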

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_tissue_mapping.py | Added tests for tissue mapping functionality with dummy data. |
| tests/models/test_single_drug_models.py | Added save/load tests for single drug models to verify persistence. |
| tests/models/test_global_models.py | Extended global model tests with save/load functionality. |
| tests/models/test_baselines.py | Added persistence tests for baseline models. |
| drevalpy/utils.py | Introduced a new flag (--final_model_on_full_data) for final model training. |
| drevalpy/models/drp_model.py | Added abstract save/load methods in the base model class. |
| drevalpy/models/baselines/singledrug_elastic_net.py | Implemented save/load functionality for the ElasticNet baseline. |
| drevalpy/models/SimpleNeuralNetwork/*.py | Added save/load methods and updated hyperparameter defaults in neural network models. |
| drevalpy/models/SRMF/srmf.py | Added methods to save and load the SRMF model parameters/configurations. |
| drevalpy/models/DIPK/dipk.py | Updated DIPK training to use hyperparameters and added save/load support. |
| drevalpy/experiment.py | Extended experiment pipeline to include a final model training step and added a new splitting utility. |
| drevalpy/datasets/map_tissues.py | Refined type hints and ensured proper conversion of file path objects. |

Comments suppressed due to low confidence (1)

drevalpy/models/baselines/singledrug_elastic_net.py:1

  • Typo in docstring: 'seperately' should be corrected to 'separately'.
"""SingleDrugElasticNet and SingleDrugProteomicsElasticNet classes. Fit an Elastic net for each drug seperately."""

Comment on lines +81 to +97
print("\n=== tissue_mapping.csv ===")
print(df_out)

assert "tissue" in df_out.columns
print("\n=== tissue for CVCL_TEST1 ===")
print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"])

print("\n=== tissue for CVCL_TEST2 ===")
print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"])

assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"].values[0] == "Lung"
assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"].values[0] == "Blood"

# Check the updated dataset has a tissue column
updated = pd.read_csv(root / ds_name / f"{ds_name}.csv")
print("\n=== Updated dataset ===")
print(updated)

Copilot AI May 27, 2025

[nitpick] Consider using a logging mechanism instead of print statements in tests to reduce console clutter during automated runs.

Suggested change
- print("\n=== tissue_mapping.csv ===")
- print(df_out)
- assert "tissue" in df_out.columns
- print("\n=== tissue for CVCL_TEST1 ===")
- print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"])
- print("\n=== tissue for CVCL_TEST2 ===")
- print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"])
- assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"].values[0] == "Lung"
- assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"].values[0] == "Blood"
- # Check the updated dataset has a tissue column
- updated = pd.read_csv(root / ds_name / f"{ds_name}.csv")
- print("\n=== Updated dataset ===")
- print(updated)
+ logging.info("\n=== tissue_mapping.csv ===")
+ logging.info(df_out)
+ assert "tissue" in df_out.columns
+ logging.info("\n=== tissue for CVCL_TEST1 ===")
+ logging.info(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"])
+ logging.info("\n=== tissue for CVCL_TEST2 ===")
+ logging.info(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"])
+ assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"].values[0] == "Lung"
+ assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"].values[0] == "Blood"
+ # Check the updated dataset has a tissue column
+ updated = pd.read_csv(root / ds_name / f"{ds_name}.csv")
+ logging.info("\n=== Updated dataset ===")
+ logging.info(updated)
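
As a possible variation on this suggestion, the sketch below uses a named module-level logger with lazy `%s` formatting instead of calling `logging.info` on the root logger. The logger name, the helper `check_tissue`, and the list-of-dicts input are hypothetical stand-ins for the DataFrame used in the actual test, chosen so the example needs only the standard library.

```python
import logging

# Hypothetical logger name; a named logger can be filtered independently
# of the root logger.
logger = logging.getLogger("tests.tissue_mapping")


def check_tissue(rows, cell_line_id, expected):
    """Log the tissue found for one cell line, then assert it matches.

    `rows` is a list of dicts with "cellosaurus_id" and "tissue" keys,
    standing in for the rows of the tissue mapping table.
    """
    matches = [r["tissue"] for r in rows if r["cellosaurus_id"] == cell_line_id]
    # Lazy formatting: the message is only built if INFO is enabled.
    logger.info("tissue for %s: %s", cell_line_id, matches)
    assert matches, f"no row found for {cell_line_id}"
    assert matches[0] == expected
    return matches[0]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    rows = [
        {"cellosaurus_id": "CVCL_TEST1", "tissue": "Lung"},
        {"cellosaurus_id": "CVCL_TEST2", "tissue": "Blood"},
    ]
    check_tissue(rows, "CVCL_TEST1", "Lung")
    check_tissue(rows, "CVCL_TEST2", "Blood")
```

Note that INFO-level records are usually hidden unless the log level is lowered, for example with pytest's `--log-level=INFO` option, so this keeps automated runs quiet while leaving the diagnostics available on demand.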

@PascalIversen PascalIversen marked this pull request as ready for review May 27, 2025 14:10
@PascalIversen PascalIversen requested review from JudithBernett and removed request for JudithBernett May 28, 2025 14:55
Contributor

@JudithBernett JudithBernett left a comment

lgtm

@github-actions github-actions bot added the chore label Jun 10, 2025
@PascalIversen PascalIversen merged commit b7bde54 into development Jun 11, 2025
32 of 33 checks passed
@JudithBernett JudithBernett deleted the final_model branch June 11, 2025 11:36

Labels

chore enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Final Model Training
