Leave-tissue-out cross-validation#192

Merged
PascalIversen merged 50 commits into development from lto on Apr 25, 2025

Conversation

@PascalIversen
Collaborator

Datasets are now required to have a "tissue" column. This will break everything until we update the datasets on Zenodo, since our datasets don't have this column yet 🗡️

The column is optional for custom datasets.

This may also break viz.

@PascalIversen PascalIversen requested a review from Copilot April 17, 2025 17:16
@PascalIversen PascalIversen marked this pull request as draft April 17, 2025 17:17
@PascalIversen PascalIversen self-assigned this Apr 17, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for a "tissue" column and introduces leave-tissue-out (LTO) cross-validation.

  • Updated documentation and function signatures to include "LTO" as a valid test mode.
  • Added a TISSUE_IDENTIFIER constant in utils and passed tissue-related information through dataset loading and cross-validation methods.
  • Extended DrugResponseDataset to store tissue data and enforce its usage in LTO CV.
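To illustrate the idea behind leave-tissue-out splitting, here is a minimal sketch in plain Python. It is not the implementation from this PR: the function name `leave_tissue_out_folds` and the greedy balancing strategy are assumptions made for illustration. The one property it demonstrates is that all samples of a given tissue land in the same fold, so no tissue appears in both the training and the test set.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def leave_tissue_out_folds(
    tissues: Sequence[Hashable], n_folds: int
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, test_idx) pairs where no tissue spans both sets.

    Hypothetical helper for illustration; not part of drevalpy.
    """
    # Group sample indices by their tissue label.
    by_tissue: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, tissue in enumerate(tissues):
        by_tissue[tissue].append(idx)

    if n_folds < 2 or n_folds > len(by_tissue):
        raise ValueError("need 2 <= n_folds <= number of distinct tissues")

    # Greedy balancing: place the largest tissue first, always into the
    # fold that currently holds the fewest samples.
    fold_members: List[List[int]] = [[] for _ in range(n_folds)]
    for tissue in sorted(by_tissue, key=lambda t: (-len(by_tissue[t]), str(t))):
        smallest = min(range(n_folds), key=lambda k: len(fold_members[k]))
        fold_members[smallest].extend(by_tissue[tissue])

    # Each fold serves once as the test set; the rest form the training set.
    folds = []
    for k in range(n_folds):
        test_idx = sorted(fold_members[k])
        train_idx = sorted(
            i for j in range(n_folds) if j != k for i in fold_members[j]
        )
        folds.append((train_idx, test_idx))
    return folds


# Toy example with three tissues and one fold per tissue.
tissues = ["lung", "lung", "skin", "breast", "skin", "lung", "breast"]
folds = leave_tissue_out_folds(tissues, n_folds=3)
for train_idx, test_idx in folds:
    held_out = sorted({tissues[i] for i in test_idx})
    print(f"test tissues: {held_out}, n_train={len(train_idx)}")
```

scikit-learn's `GroupKFold` and `LeaveOneGroupOut` provide the same kind of group-aware splitting when the tissue labels are passed as `groups`.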

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| drevalpy/experiment.py | Updated docstring for test_mode to include LTO. |
| drevalpy/datasets/utils.py | Added constant TISSUE_IDENTIFIER to handle tissue column. |
| drevalpy/datasets/loader.py | Updated function signatures and data loading to support tissue info. |
| drevalpy/datasets/dataset.py | Extended DrugResponseDataset to store tissues and updated CV logic. |

bug found by bot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@PascalIversen
Collaborator Author

PascalIversen commented Apr 22, 2025

Other tissues should always be in test set?

Edit: all tissues are mapped now :)

response=response_data[measure].values,
cell_line_ids=response_data[CELL_LINE_IDENTIFIER].values,
drug_ids=response_data[DRUG_IDENTIFIER].values,
tissues=response_data[TISSUE_IDENTIFIER].values,
Contributor

Won't this cause a problem if response_data[tissue] doesn't exist?

Collaborator Author

Yes, but it now always exists in our default datasets :)
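For custom datasets, where the tissue column is optional, the concern raised in this thread could be handled with a guard along the following lines. This is a sketch under assumptions, not the code merged in this PR: the helper name `get_tissues` is hypothetical, and the value `"tissue"` for `TISSUE_IDENTIFIER` is assumed from the column name mentioned in the PR description.

```python
from typing import Any, Mapping, Optional, Sequence

# Assumed value; the PR only states that a TISSUE_IDENTIFIER constant exists.
TISSUE_IDENTIFIER = "tissue"


def get_tissues(
    response_data: Mapping[str, Sequence[Any]],
    test_mode: str,
    tissue_column: str = TISSUE_IDENTIFIER,
) -> Optional[Sequence[Any]]:
    """Return the tissue column if present, or None when it is not needed.

    Hypothetical helper for illustration; not part of drevalpy.
    """
    if tissue_column in response_data:
        return response_data[tissue_column]
    if test_mode == "LTO":
        # Leave-tissue-out cannot work without tissue labels, so fail early
        # with a clear message instead of a bare KeyError.
        raise ValueError(
            f"test_mode='LTO' requires a '{tissue_column}' column in the dataset"
        )
    # Other test modes can proceed without tissue information.
    return None


# A custom dataset without tissue labels still loads for a non-LTO test mode.
custom = {"response": [0.1, 0.7], "cell_line": ["A", "B"], "drug": ["X", "Y"]}
print(get_tissues(custom, test_mode="LCO"))
```

The function only relies on `in` and `[]`, so a column-keyed dict works as shown; a pandas DataFrame supports both operations on its column labels as well.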

@quirinmanz quirinmanz removed their assignment Apr 24, 2025
@quirinmanz

Wishing you much success!

@JudithBernett JudithBernett marked this pull request as ready for review April 25, 2025 13:07
Contributor

@JudithBernett JudithBernett left a comment

lgtm!!

@PascalIversen PascalIversen merged commit 8cd811d into development Apr 25, 2025
25 checks passed
@PascalIversen PascalIversen deleted the lto branch April 25, 2025 13:18

4 participants