Commit 27ca39b — Merge pull request #238 from daisybio/development (v.1.3.4)

2 parents: 1d26871 + c4df4ac

17 files changed: +704 −586 lines

.github/pull_request_template.md

Lines changed: 23 additions & 1 deletion

@@ -7,7 +7,29 @@
 - [ ] This comment contains a description of changes (with reason)
 - [ ] Referenced issue is linked
 - [ ] If you've fixed a bug or added code that should be tested, add tests!
-- [ ] Documentation in `docs` is updated
+- [ ] Documentation in `docs` is updated. If you've created a new file, add it to the API documentation pages.
+
+<!-- Only applies to PRs for a new version release, delete the lines that don't apply -->
+
+**Version release checklist**
+
+- [ ] Update the version in pyproject.toml
+- [ ] Update version/release in docs/conf.py
+- [ ] Run `poetry update` to get the latest package versions. This will update the poetry.lock file.
+- [ ] Run `poetry export --without-hashes --without development -f requirements.txt -o requirements.txt` to update the requirements.txt file.
+- [ ] (If one of the sphinx packages has been updated, you also need to update docs/requirements.txt.)
+- [ ] (If poetry itself was updated, update that in the Dockerfile.)
+- [ ] If you updated the python version:
+  - [ ] Update the Dockerfile so that it always runs on the latest python version. Watch out: the `builder` is the full python, the `runtime` is a slim python build.
+  - [ ] Update the python version in .github/workflows/: run_tests.yml, build_package.yml, publish_docs.yml, python-package.yml
+  - [ ] Update the python version in noxfile.py
+  - [ ] Update the documentation: contributing.rst, installation.rst
+
+Then,
+
+1. Open a PR from development to main with these changes.
+2. Wait for a review and merge.
+3. Create a new release on GitHub with the version number. Update the release notes with the changes made in this version.

 **Description of changes**

.github/workflows/build_package.yml

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ jobs:
     strategy:
       matrix:
         os: [macos-latest, ubuntu-latest, windows-latest]
-        python: ["3.11", "3.12"]
+        python: ["3.11", "3.12", "3.13"]

     steps:
       - uses: actions/checkout@v4

.github/workflows/run_tests.yml

Lines changed: 6 additions & 6 deletions

@@ -16,12 +16,12 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          - { python-version: "3.12", os: ubuntu-latest, session: "pre-commit" }
-          - { python-version: "3.12", os: ubuntu-latest, session: "mypy" }
-          - { python-version: "3.12", os: ubuntu-latest, session: "tests" }
-          - { python-version: "3.12", os: windows-latest, session: "typeguard" }
-          - { python-version: "3.12", os: ubuntu-latest, session: "xdoctest" }
-          - { python-version: "3.12", os: ubuntu-latest, session: "docs-build" }
+          - { python-version: "3.13", os: ubuntu-latest, session: "pre-commit" }
+          - { python-version: "3.13", os: ubuntu-latest, session: "mypy" }
+          - { python-version: "3.13", os: ubuntu-latest, session: "tests" }
+          - { python-version: "3.13", os: windows-latest, session: "typeguard" }
+          - { python-version: "3.13", os: ubuntu-latest, session: "xdoctest" }
+          - { python-version: "3.13", os: ubuntu-latest, session: "docs-build" }

     env:
       NOXSESSION: ${{ matrix.session }}

docs/conf.py

Lines changed: 2 additions & 2 deletions

@@ -56,9 +56,9 @@
 # the built documents.
 #
 # The short X.Y version.
-version = "1.3.3"
+version = "1.3.4"
 # The full version, including alpha/beta/rc tags.
-release = "1.3.3"
+release = "1.3.4"

 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

docs/drevalpy.datasets.rst

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ Loaders
 CurveCurator
 ------------

-.. automodule:: drevalpy.datasets.curvec
+.. automodule:: drevalpy.datasets.curvecurator
    :members:
    :undoc-members:
    :show-inheritance:

docs/drevalpy.models.baselines.rst

Lines changed: 8 additions & 0 deletions

@@ -17,6 +17,14 @@ Sklearn Models
    :undoc-members:
    :show-inheritance:

+Single-Drug Elastic Net
+-----------------------------------------------------------
+
+.. automodule:: drevalpy.models.baselines.singledrug_elastic_net
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 Single-Drug Random Forest
 -----------------------------------------------------------

docs/drevalpy.visualization.rst

Lines changed: 4 additions & 4 deletions

@@ -9,10 +9,10 @@ Outplot
    :undoc-members:
    :show-inheritance:

-Correlation comparison scatter plot
+Comparison scatter plot
 -------------------------------------------------

-.. automodule:: drevalpy.visualization.corr_comp_scatter
+.. automodule:: drevalpy.visualization.comp_scatter
    :members:
    :undoc-members:
    :show-inheritance:

@@ -25,10 +25,10 @@ Critical difference plot
    :undoc-members:
    :show-inheritance:

-HTML tables
+Cross study tables
 ------------------------------------------

-.. automodule:: drevalpy.visualization.html_tables
+.. automodule:: drevalpy.visualization.cross_study_tables
    :members:
    :undoc-members:
    :show-inheritance:

drevalpy/datasets/curvecurator.py

Lines changed: 11 additions & 2 deletions

@@ -123,13 +123,22 @@ def _exec_curvecurator(output_dir: Path, batched: bool = True):
     of configs specified in <output_dir>/configlist.txt and consecutively executing each
     CurveCurator run. If False, run a single CurveCurator run (this can be used for
     parallelisation).
+    :raises RuntimeError: If CurveCurator fails to execute, the error message is printed to stdout and stderr.
     """
     if batched:
         command = ["CurveCurator", str(output_dir / "configlist.txt"), "--mad", "--batch"]
     else:
         command = ["CurveCurator", str(output_dir / "config.toml"), "--mad"]
-    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
-    process.communicate()
+    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
+    stdout, stderr = process.communicate()
+
+    if process.returncode != 0:
+        print("CurveCurator stdout:")
+        print(stdout)
+        print("CurveCurator stderr:")
+        print(stderr)
+
+        raise RuntimeError(f"CurveCurator failed with exit code {process.returncode}")


 def _calc_ic50(model_params_df: pd.DataFrame):
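The change above stops failures from being silently swallowed: `communicate()` alone discards the exit status. A minimal standalone sketch of the same pattern, using a hypothetical `run_checked` helper (not part of the codebase):

```python
import subprocess
import sys


def run_checked(command: list[str]) -> str:
    """Run a command, echo its output on failure, and raise RuntimeError."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    stdout, stderr = process.communicate()
    if process.returncode != 0:
        print("stdout:", stdout)
        print("stderr:", stderr)
        raise RuntimeError(f"Command failed with exit code {process.returncode}")
    return stdout


# A failing subprocess now surfaces its exit code instead of being ignored.
out = run_checked([sys.executable, "-c", "print('ok')"])
```

`text=True` makes `communicate()` return `str` instead of `bytes`, so the captured output can be printed directly.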

drevalpy/datasets/dataset.py

Lines changed: 25 additions & 13 deletions

@@ -817,6 +817,7 @@ def from_csv(
         view_name: str,
         drop_columns: list[str] | None = None,
         transpose: bool = False,
+        extract_meta_info: bool = True,
     ):
         """Load a one-view feature dataset from a csv file.

@@ -830,44 +831,55 @@
         :param id_column: name of the column containing the identifiers
         :param drop_columns: list of columns to drop (e.g. other identifier columns)
         :param transpose: if True, the csv is transposed, i.e. the rows become columns and vice versa
+        :param extract_meta_info: if True, extracts meta information from the dataset, e.g. gene names for gene expression
         :returns: FeatureDataset object containing data from provided csv file.
         """
         data = pd.read_csv(path_to_csv).T if transpose else pd.read_csv(path_to_csv)
         data[id_column] = data[id_column].astype(str)
         ids = data[id_column].values
         data_features = data.drop(columns=(drop_columns or []))
         data_features = data_features.set_index(id_column)
-        # remove duplicate feature rows (rows with the same index)
         data_features = data_features[~data_features.index.duplicated(keep="first")]
         features = {}

         for identifier in ids:
             features_for_instance = data_features.loc[identifier].values
             features[identifier] = {view_name: features_for_instance}

-        return cls(features=features)
+        meta_info = {}
+        if extract_meta_info:
+            meta_info = {view_name: list(data_features.columns)}
+
+        return cls(features=features, meta_info=meta_info)

     def to_csv(self, path: str | Path, id_column: str, view_name: str):
         """
-        Save the feature dataset to a CSV file.
+        Save the feature dataset to a CSV file. If meta_info is available for the view and valid,
+        it will be written as column names.

         :param path: Path to the CSV file.
         :param id_column: Name of the column containing the identifiers.
-        :param view_name: Name of the view (e.g., gene_expression).
-
-        :raises ValueError: If the view is not found for an identifier.
+        :param view_name: Name of the view.
         """
         data = []
+        feature_names = None
+
         for identifier, feature_dict in self.features.items():
-            # Get the feature vector for the specified view
-            if view_name in feature_dict:
-                row = {id_column: identifier}
-                row.update({f"feature_{i}": value for i, value in enumerate(feature_dict[view_name])})
-                data.append(row)
-            else:
+            vector = feature_dict.get(view_name)
+            if vector is None:
                 raise ValueError(f"View {view_name!r} not found for identifier {identifier!r}.")

-        # Convert to DataFrame and save to CSV
+            if feature_names is None:
+                meta_names = self.meta_info.get(view_name)
+                if isinstance(meta_names, list) and len(meta_names) == len(vector):
+                    feature_names = meta_names
+                else:
+                    feature_names = [f"feature_{i}" for i in range(len(vector))]
+
+            row = {id_column: identifier}
+            row.update({name: value for name, value in zip(feature_names, vector)})
+            data.append(row)

         df = pd.DataFrame(data)
         df.to_csv(path, index=False)
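The column-naming logic of the new `to_csv` can be exercised in isolation. A sketch with a hypothetical `features_to_frame` helper (the real method lives on `FeatureDataset`): `meta_info` names are used when they match the vector length, otherwise it falls back to generic `feature_<i>` names.

```python
import pandas as pd


def features_to_frame(features: dict, meta_info: dict, view_name: str, id_column: str) -> pd.DataFrame:
    """Build a wide table, preferring meta_info column names over feature_<i>."""
    data = []
    feature_names = None
    for identifier, feature_dict in features.items():
        vector = feature_dict.get(view_name)
        if vector is None:
            raise ValueError(f"View {view_name!r} not found for identifier {identifier!r}.")
        if feature_names is None:
            meta_names = meta_info.get(view_name)
            if isinstance(meta_names, list) and len(meta_names) == len(vector):
                feature_names = meta_names  # e.g. gene names for gene expression
            else:
                feature_names = [f"feature_{i}" for i in range(len(vector))]
        row = {id_column: identifier}
        row.update(dict(zip(feature_names, vector)))
        data.append(row)
    return pd.DataFrame(data)


features = {"CL1": {"gene_expression": [1.0, 2.0]}, "CL2": {"gene_expression": [3.0, 4.0]}}
meta_info = {"gene_expression": ["TP53", "BRCA1"]}
df = features_to_frame(features, meta_info, "gene_expression", "cell_line_id")
```

With `meta_info` present, the CSV round-trips through `from_csv` with meaningful column names instead of `feature_0`, `feature_1`, ….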

drevalpy/experiment.py

Lines changed: 22 additions & 10 deletions

@@ -302,7 +302,7 @@ def drug_response_experiment(
             path_data=path_data,
             model_checkpoint_dir=model_checkpoint_dir,
             metric=metric,
-            result_path=final_model_path,
+            final_model_path=final_model_path,
             test_mode=test_mode,
             val_ratio=0.1,
             hyperparameter_tuning=hyperparameter_tuning,

@@ -585,7 +585,7 @@ def cross_study_prediction(
            drug_input=drug_input,
        )
        if response_transformation:
-            dataset._response = response_transformation.inverse_transform(dataset.response)
+            dataset.inverse_transform(response_transformation)
    else:
        dataset._predictions = np.array([])
    dataset.to_csv(

@@ -993,18 +993,23 @@ def train_and_predict(
    )

    if len(prediction_dataset) > 0:
+        drug_input = drug_features.copy() if drug_features is not None else None
        prediction_dataset._predictions = model.predict(
            cell_line_ids=prediction_dataset.cell_line_ids,
            drug_ids=prediction_dataset.drug_ids,
            cell_line_input=cl_features.copy(),
            drug_input=drug_input,
        )

-        if response_transformation:
-            prediction_dataset.inverse_transform(response_transformation)
    else:
        prediction_dataset._predictions = np.array([])

+    if response_transformation:
+        train_dataset.inverse_transform(response_transformation)
+        prediction_dataset.inverse_transform(response_transformation)
+        if early_stopping_dataset is not None:
+            early_stopping_dataset.inverse_transform(response_transformation)
+
    return prediction_dataset

@@ -1016,7 +1021,7 @@ def train_and_evaluate(
    validation_dataset: DrugResponseDataset,
    early_stopping_dataset: DrugResponseDataset | None = None,
    response_transformation: TransformerMixin | None = None,
-    metric: str = "rmse",
+    metric: str = "RMSE",
    model_checkpoint_dir: str = "TEMPORARY",
) -> dict[str, float]:
    """

@@ -1283,15 +1288,14 @@ def generate_data_saving_path(model_name, drug_id, result_path, suffix) -> str:
    return model_path


-@pipeline_function
def train_final_model(
    model_class: type[DRPModel],
    full_dataset: DrugResponseDataset,
    response_transformation: TransformerMixin,
    path_data: str,
    model_checkpoint_dir: str,
    metric: str,
-    result_path: str,
+    final_model_path: str,
    test_mode: str = "LCO",
    val_ratio: float = 0.1,
    hyperparameter_tuning: bool = True,

@@ -1314,7 +1318,7 @@
    :param path_data: path to data directory
    :param model_checkpoint_dir: checkpoint dir for intermediate tuning models
    :param metric: metric for tuning, e.g., "RMSE"
-    :param result_path: path to results
+    :param final_model_path: path to final_model save directory
    :param test_mode: split logic for validation (LCO, LDO, LTO, LPO)
    :param val_ratio: validation size ratio
    :param hyperparameter_tuning: whether to perform hyperparameter tuning

@@ -1356,17 +1360,25 @@
    print(f"Best hyperparameters for final model: {best_hpams}")
    train_dataset.add_rows(validation_dataset)
    train_dataset.shuffle(random_state=42)
+    if response_transformation:
+        train_dataset.fit_transform(response_transformation)
+        if early_stopping_dataset is not None:
+            early_stopping_dataset.transform(response_transformation)

    model.build_model(hyperparameters=best_hpams)
+    drug_features = drug_features.copy() if drug_features is not None else None
    model.train(
        output=train_dataset,
        output_earlystopping=early_stopping_dataset,
-        cell_line_input=cl_features,
+        cell_line_input=cl_features.copy(),
        drug_input=drug_features,
        model_checkpoint_dir=model_checkpoint_dir,
    )
+    if response_transformation:
+        train_dataset.inverse_transform(response_transformation)
+        if early_stopping_dataset is not None:
+            early_stopping_dataset.inverse_transform(response_transformation)

-    final_model_path = os.path.join(result_path, "final_model")
    os.makedirs(final_model_path, exist_ok=True)
    model.save(final_model_path)
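The moved transform calls follow the usual scikit-learn round-trip: the `TransformerMixin` is fit on the training responses, every dataset it touched is scaled before training, and all of them are mapped back afterwards so results live on the original response scale. A standalone sketch with `StandardScaler` as a stand-in (any `TransformerMixin` with `inverse_transform` works):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit on the training responses, train/predict on the scaled values,
# then map everything back to the original response scale.
scaler = StandardScaler()
train_response = np.array([[0.5], [1.5], [2.5]])
scaled = scaler.fit_transform(train_response)      # what the model trains on
restored = scaler.inverse_transform(scaled)        # what gets reported/saved
```

Forgetting the `inverse_transform` on the training or early-stopping datasets (as the old code did) would leave them on the transformed scale after `train_and_predict` returns.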
