Optional final production model fitting procedure after CV by PascalIversen · Pull Request #221 · daisybio/drevalpy

PascalIversen · 2025-05-25T19:23:19Z

closes #217
I decided on a proper retuning on the full dataset instead of any hacks (this will take approximately as long as one fold).
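As a rough illustration of what "retuning on the full dataset" can look like, here is a minimal sketch. It assumes a single hold-out validation split and a toy 1-D ridge model; the helper names (`fit_ridge_1d`, `fit_final_model`) are hypothetical and are not the drevalpy implementation.

```python
import random
from statistics import fmean


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(w, xs, ys):
    """Mean squared error of the predictions w * x."""
    return fmean((w * x - y) ** 2 for x, y in zip(xs, ys))


def fit_final_model(xs, ys, alphas, val_fraction=0.2, seed=0):
    """Tune alpha on one hold-out split of the full dataset, then fit the final model.

    Hypothetical helper for illustration only.
    """
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(indices) * val_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_val = [xs[i] for i in val_idx]
    y_val = [ys[i] for i in val_idx]

    # One tuning round over the candidates: comparable in cost to tuning in one CV fold.
    best_alpha = min(alphas, key=lambda a: mse(fit_ridge_1d(x_train, y_train, a), x_val, y_val))

    # Assumption: the final fit uses the training part only and keeps the
    # validation part held out (as one would for early stopping).
    final_w = fit_ridge_1d(x_train, y_train, best_alpha)
    return best_alpha, final_w


# Toy data: y = 2x plus a small deterministic perturbation.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
best_alpha, final_w = fit_final_model(xs, ys, alphas=[0.0, 1.0, 100.0])
```

The point of the sketch is only the shape of the procedure: no test set is held back, so the resulting model is meant for production use and its performance estimate still comes from the cross-validation run.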

Also sneaking in the tissue mapping tests, which I forgot to add.

TODO:
testing

Copilot

Pull Request Overview

This PR introduces an optional final production model fitting procedure after cross‐validation and enhances model persistence by adding save/load methods across multiple model implementations. Key changes include:

Adding a new command-line flag and corresponding logic in the experiment pipeline to train a final model on the full dataset.
Implementing and testing save/load functionality in various model modules.
Introducing tissue mapping tests and minor type hint/documentation improvements.
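The save/load pattern summarized above can be sketched as follows. This is a minimal illustration that assumes JSON files on disk and a toy linear model; the class names are hypothetical and not the drevalpy API. It follows two hints from the commit log of this PR: the base methods raise `NotImplementedError` instead of being abstract, and `load` is a class method.

```python
import json
import tempfile
from pathlib import Path


class BaseModel:
    """Hypothetical base class; names are illustrative only."""

    def save(self, directory):
        # Raising NotImplementedError (rather than declaring the method abstract)
        # lets new models be contributed without persistence support.
        raise NotImplementedError("save is not implemented for this model")

    @classmethod
    def load(cls, directory):
        raise NotImplementedError("load is not implemented for this model")


class LinearModel(BaseModel):
    """Toy model that persists its hyperparameters and weights as JSON."""

    def __init__(self, hyperparameters, weights=None):
        self.hyperparameters = dict(hyperparameters)
        self.weights = list(weights) if weights is not None else []

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features))

    def save(self, directory):
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        (path / "hyperparameters.json").write_text(json.dumps(self.hyperparameters))
        (path / "weights.json").write_text(json.dumps(self.weights))

    @classmethod
    def load(cls, directory):
        # A classmethod can rebuild the instance without an existing object.
        path = Path(directory)
        hyperparameters = json.loads((path / "hyperparameters.json").read_text())
        weights = json.loads((path / "weights.json").read_text())
        return cls(hyperparameters, weights)


with tempfile.TemporaryDirectory() as tmp:
    model = LinearModel({"alpha": 0.5}, weights=[1.0, -2.0])
    model.save(tmp)
    restored = LinearModel.load(tmp)

roundtrip_ok = restored.predict([3.0, 1.0]) == model.predict([3.0, 1.0])
```

A round-trip check like the last line is also the natural shape for a persistence test: save, load, and compare predictions.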

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

File	Description
tests/test_tissue_mapping.py	Added tests for tissue mapping functionality with dummy data.
tests/models/test_single_drug_models.py	Added save/load tests for single drug models to verify persistence.
tests/models/test_global_models.py	Extended global model tests with save/load functionality.
tests/models/test_baselines.py	Added persistence tests for baseline models.
drevalpy/utils.py	Introduced a new flag (--final_model_on_full_data) for final model training.
drevalpy/models/drp_model.py	Added abstract save/load methods in the base model class.
drevalpy/models/baselines/singledrug_elastic_net.py	Implemented save/load functionality for the ElasticNet baseline.
drevalpy/models/SimpleNeuralNetwork/*.py	Added save/load methods and updated hyperparameter defaults in neural network models.
drevalpy/models/SRMF/srmf.py	Added methods to save and load the SRMF model parameters/configurations.
drevalpy/models/DIPK/dipk.py	Updated DIPK training to use hyperparameters and added save/load support.
drevalpy/experiment.py	Extended experiment pipeline to include a final model training step and added a new splitting utility.
drevalpy/datasets/map_tissues.py	Refined type hints and ensured proper conversion of file path objects.
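For readers unfamiliar with how such a flag is typically wired up, here is a small sketch. Only the flag name `--final_model_on_full_data` comes from the table above; treating it as a boolean `store_true` switch, as well as the `build_parser` and `run` helpers and the step names, are assumptions for illustration.

```python
import argparse


def build_parser():
    """Illustrative parser, not the actual drevalpy command-line interface."""
    parser = argparse.ArgumentParser(description="Illustrative experiment runner.")
    parser.add_argument(
        "--final_model_on_full_data",
        action="store_true",  # assumption: boolean switch that is off by default
        help="After cross-validation, also fit a final model on the full dataset.",
    )
    return parser


def run(argv):
    """Return the pipeline steps that would be executed for the given arguments."""
    args = build_parser().parse_args(argv)
    steps = ["cross_validation"]
    if args.final_model_on_full_data:
        # The optional step only runs when the flag is passed explicitly.
        steps.append("final_model_fit")
    return steps


default_steps = run([])
with_final = run(["--final_model_on_full_data"])
```

Keeping the step opt-in means existing benchmark runs keep their runtime unless a production model is explicitly requested.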

Comments suppressed due to low confidence (1)

drevalpy/models/baselines/singledrug_elastic_net.py:1

Typo in docstring: 'seperately' should be corrected to 'separately'.

"""SingleDrugElasticNet and SingleDrugProteomicsElasticNet classes. Fit an Elastic net for each drug seperately."""

Copilot · 2025-05-27T14:05:48Z

tests/test_tissue_mapping.py

+    print("\n=== tissue_mapping.csv ===")
+    print(df_out)
+
+    assert "tissue" in df_out.columns
+    print("\n=== tissue for CVCL_TEST1 ===")
+    print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"])
+
+    print("\n=== tissue for CVCL_TEST2 ===")
+    print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"])
+
+    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"].values[0] == "Lung"
+    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"].values[0] == "Blood"
+
+    # Check the updated dataset has a tissue column
+    updated = pd.read_csv(root / ds_name / f"{ds_name}.csv")
+    print("\n=== Updated dataset ===")
+    print(updated)


[nitpick] Consider using a logging mechanism instead of print statements in tests to reduce console clutter during automated runs.

Suggested change

-    print("\n=== tissue_mapping.csv ===")
-    print(df_out)
-
-    assert "tissue" in df_out.columns
-    print("\n=== tissue for CVCL_TEST1 ===")
-    print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"])
-
-    print("\n=== tissue for CVCL_TEST2 ===")
-    print(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"])
-
-    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"].values[0] == "Lung"
-    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"].values[0] == "Blood"
-
-    # Check the updated dataset has a tissue column
-    updated = pd.read_csv(root / ds_name / f"{ds_name}.csv")
-    print("\n=== Updated dataset ===")
-    print(updated)
+    logging.info("\n=== tissue_mapping.csv ===")
+    logging.info(df_out)
+
+    assert "tissue" in df_out.columns
+    logging.info("\n=== tissue for CVCL_TEST1 ===")
+    logging.info(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"])
+
+    logging.info("\n=== tissue for CVCL_TEST2 ===")
+    logging.info(df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"])
+
+    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST1", "tissue"].values[0] == "Lung"
+    assert df_out.loc[df_out["cellosaurus_id"] == "CVCL_TEST2", "tissue"].values[0] == "Blood"
+
+    # Check the updated dataset has a tissue column
+    updated = pd.read_csv(root / ds_name / f"{ds_name}.csv")
+    logging.info("\n=== Updated dataset ===")
+    logging.info(updated)

JudithBernett

lgtm

first commit final fittign procedure

779d98b

PascalIversen added the enhancement label (New feature or request) May 25, 2025

PascalIversen added 16 commits May 26, 2025 10:36

model.save

1de6b2b

load save snn

54457cf

srmf save

9205b97

dipk save

6a5a355

from abstract to not implemented to make contributing easier

72c2cfa

mypy

faa4a51

cls method load

b3f110d

mypy fix

f11954d

change test folder name, because I do not like it and its annoying to…

6778bdc

… type. I always do typos like induividual

tests

91bbe42

sdproteomics enet

e2e4c9e

tests

3b3d8a9

some dipk refactoring

b8e933f

cant save for test case with empty train data. Probably also need to …

1a4ea76

…handle this in experiment.py

single drug models

b42ac13

path issues

8b025b6

PascalIversen requested a review from Copilot May 27, 2025 14:05

Copilot AI reviewed May 27, 2025

PascalIversen marked this pull request as ready for review May 27, 2025 14:10

PascalIversen added 4 commits May 27, 2025 16:52

only for baselines

5bad8ce

some fixe

3064021

some fixe

88eebeb

single d model fix and path adaptions

286a42b

PascalIversen requested review from JudithBernett and removed request for JudithBernett May 28, 2025 14:55

PascalIversen added 2 commits May 29, 2025 12:04

fix

a3313f1

smallest fix of the small fixes

6fdc886

PascalIversen requested a review from JudithBernett May 29, 2025 17:09

JudithBernett added 2 commits June 6, 2025 16:27

Updating to python 3.13

00dc7ee

updating documentation to python 3.13

7dc3622

JudithBernett approved these changes Jun 6, 2025

PascalIversen and others added 5 commits June 6, 2025 16:49

merged

fdc3082

floating point comparison

8928571

Merge branch 'final_model' of github.com:daisybio/drevalpy into final…

fb68252

…_model

trying hard to fix pre commit issue

0e466a2

trying hard to fix pre commit issue2

c5f4623

github-actions bot added the chore label Jun 10, 2025

PascalIversen and others added 19 commits June 10, 2025 14:41

prettier

4001394

Merge branch 'final_model' of github.com:daisybio/drevalpy into final…

ebfb5bf

…_model

deepseek lets go

f3e9540

show diffs of pre commit hooks

ec66382

Does the pre-commit version help?

b67aa58

Merge remote-tracking branch 'origin/final_model' into final_model

e6009e6

meaningless chagne

9eae648

merge

ab7431b

Merge branch 'development' into final_model

f5c13a6

Update workflow from development branch

3f6cdf5

merge

438e17d

restore pipeline

20ea648

verbose test

966966d

trying to fix the windows test on mac

f5dabe5

test verbose

d34ad21

Merge branch 'final_model' of github.com:daisybio/drevalpy into final…

78ad2b9

…_model

Removing the pre-commit version again

2f5a696

debugging online :o

913b786

the math was not mathing

7848a36

PascalIversen merged commit b7bde54 into development Jun 11, 2025
32 of 33 checks passed

JudithBernett deleted the final_model branch June 11, 2025 11:36

PascalIversen merged 57 commits into development from final_model

3 participants
