Feature/curvecurator #84
Merged
Commits (17):
1e09922 added CurveCurator logic (picciama)
1f01a14 Merge branch 'development' into feature/curvecurator (picciama)
a813b66 annotated pipeline defs, sep curvecurator logic (picciama)
6b9c28c fixed curvecuration, and added unittest (picciama)
b49c6df fixed mypy (picciama)
4c8ad80 updated versions to 3.11 (picciama)
2e8c796 updated version to 3.11 (picciama)
6e29e73 cleaned up imports (picciama)
bbc6026 fixed malformed sphinx config (picciama)
c0ae3d6 fixed typeguard issues with load_dataset call (picciama)
4bd219d docstring for curvecurator, calc ic50, measure arg (picciama)
8c65e2a added curve_curator option to load_dataset (picciama)
56ead0d added cores/measures args, path_data simplified (picciama)
c91a747 fix typo (picciama)
47c08b9 fix missing args in test_suite (picciama)
6b200c3 explicitly setting measure for toy_data (picciama)
6c84254 fix missing string conversion (picciama)
This file was deleted.
Changed file (Sphinx docs requirements):

```diff
@@ -1,5 +1,4 @@
-sphinx-autobuild==2024.10.3 ; python_version >= "3.9" and python_full_version <= "3.13.0"
-sphinx-autodoc-typehints==2.3.0 ; python_version >= "3.9" and python_full_version <= "3.13.0"
-sphinx-click==6.0.0 ; python_version >= "3.9" and python_full_version <= "3.13.0"
-sphinx-rtd-theme==3.0.2 ; python_version >= "3.9" and python_full_version <= "3.13.0"
--e .
+sphinx-autobuild==2024.10.3 ; python_version >= "3.11" and python_version < "3.13"
+sphinx-autodoc-typehints==2.5.0 ; python_version >= "3.11" and python_version < "3.13"
+sphinx-click==6.0.0 ; python_version >= "3.11" and python_version < "3.13"
+sphinx-rtd-theme==3.0.2 ; python_version >= "3.11" and python_version < "3.13"
```
New file added (186 lines):

```python
"""Contains all functions required for CurveCurator fitting."""

import subprocess
from pathlib import Path

import numpy as np
import pandas as pd
import toml

from ..pipeline_function import pipeline_function


def _prepare_raw_data(curve_df: pd.DataFrame, output_dir: str | Path):
    required_columns = ["dose", "response", "sample", "drug"]
    if not all(col in curve_df.columns for col in required_columns):
        raise ValueError(f"Missing columns in viability data. Required columns are {required_columns}.")
    if "replicate" in curve_df.columns:
        required_columns.append("replicate")
    curve_df = curve_df[required_columns]
    n_replicates = 1
    conc_columns = ["dose"]
    has_multicol_index = False
    if "replicate" in curve_df.columns:
        n_replicates = curve_df["replicate"].nunique()
        conc_columns.append("replicate")
        has_multicol_index = True

    # One row per (sample, drug) curve, one column per dose (and replicate).
    df = curve_df.pivot(index=["sample", "drug"], columns=conc_columns, values="response")

    # Insert a control experiment (dose 0, response normalized to 1.0) per replicate.
    for i in range(n_replicates):
        df.insert(0, (0.0, n_replicates - i), 1.0)

    concentrations = df.columns.sort_values()
    df = df[concentrations]

    experiments = np.arange(df.shape[1])
    df.insert(0, "Name", df.index.map(lambda x: f"{x[0]}|{x[1]}"))
    df.columns = ["Name"] + [f"Raw {i}" for i in experiments]

    curvecurator_folder = Path(output_dir)
    curvecurator_folder.mkdir(exist_ok=True, parents=True)
    df.to_csv(curvecurator_folder / "curvecurator_input.tsv", sep="\t", index=False)

    if has_multicol_index:
        doses = [pair[0] for pair in concentrations]
    else:
        doses = concentrations.to_list()
    return len(experiments), doses, n_replicates, len(df)


def _prepare_toml(filename: str, n_exp: int, n_replicates: int, doses: list[float], dataset_name: str, cores: int):
    config = {
        "Meta": {
            "id": filename,
            "description": dataset_name,
            "condition": "drug",
            "treatment_time": "72 h",
        },
        "Experiment": {
            "experiments": list(range(n_exp)),
            "doses": doses,
            "dose_scale": "1e-06",
            "dose_unit": "M",
            "control_experiment": list(range(n_replicates)),
            "measurement_type": "OTHER",
            "data_type": "OTHER",
            "search_engine": "OTHER",
            "search_engine_version": "0",
        },
        "Paths": {
            "input_file": "curvecurator_input.tsv",
            "curves_file": "curves.txt",
            "normalization_file": "norm.txt",
            "mad_file": "mad.txt",
            "dashboard": "dashboard.html",
        },
        "Processing": {
            "available_cores": cores,
            "max_missing": max(len(doses) - 5, 0),
            "imputation": False,
            "normalization": False,
        },
        "Curve Fit": {
            "type": "OLS",
            "speed": "exhaustive",
            "max_iterations": 1000,
            "interpolation": False,
            "control_fold_change": True,
        },
        "F Statistic": {
            "optimized_dofs": True,
            "alpha": 0.05,
            "fc_lim": 0.45,
        },
    }
    return config


def _exec_curvecurator(output_dir: Path):
    command = ["CurveCurator", str(output_dir / "config.toml"), "--mad"]
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    process.communicate()


@pipeline_function
def preprocess(input_file: str | Path, output_dir: str | Path, dataset_name: str, cores: int):
    """
    Preprocess raw viability data and create config.toml for use with CurveCurator.

    :param input_file: Path to the csv file containing the raw viability data
    :param output_dir: Path to store all files in, including the preprocessed data, the config.toml
        for CurveCurator, CurveCurator's output files, and the postprocessed data
    :param dataset_name: Name of the dataset
    :param cores: The number of cores to be used for fitting the curves with CurveCurator.
        This parameter is written into the config.toml, capped at the number of curves to fit
        (min(n_curves, cores))
    """
    input_file = Path(input_file)
    output_dir = Path(output_dir)
    curve_df = pd.read_csv(input_file)

    n_exp, doses, n_replicates, n_curves_to_fit = _prepare_raw_data(curve_df, output_dir)
    cores = min(n_curves_to_fit, cores)

    config = _prepare_toml(input_file.name, n_exp, n_replicates, doses, dataset_name, cores)
    with open(output_dir / "config.toml", "w") as f:
        toml.dump(config, f)


@pipeline_function
def postprocess(output_folder: str | Path, dataset_name: str):
    """
    Postprocess the CurveCurator output file.

    This function reads the curves.txt file created by CurveCurator, which contains the
    fitted curve parameters, and postprocesses it for use by drevalpy.

    :param output_folder: Path to the output folder of CurveCurator containing the curves.txt file.
    :param dataset_name: The name of the dataset, used to name the postprocessed <dataset_name>.csv file
    """
    output_folder = Path(output_folder)
    required_columns = {
        "Name": "Name",
        "pEC50": "response",
        "pEC50 Error": "pEC50Error",
        "Curve Slope": "Slope",
        "Curve Front": "Front",
        "Curve Back": "Back",
        "Curve Fold Change": "FoldChange",
        "Curve AUC": "AUC",
        "Curve R2": "R2",
        "Curve P_Value": "pValue",
        "Curve Relevance Score": "RelevanceScore",
        "Curve F_Value": "fValue",
        "Curve Log P_Value": "negLog10pValue",
        "Signal Quality": "SignalQuality",
        "Curve RMSE": "RMSE",
        "Curve F_Value SAM Corrected": "fValueSAMCorrected",
        "Curve Regulation": "Regulation",
    }
    fitted_curve_data = pd.read_csv(output_folder / "curves.txt", sep="\t", usecols=required_columns).rename(
        columns=required_columns
    )
    fitted_curve_data[["cell_line_id", "drug_id"]] = fitted_curve_data.Name.str.split("|", expand=True)
    fitted_curve_data.to_csv(output_folder / f"{dataset_name}.csv", index=False)


def fit_curves(input_file: str | Path, output_dir: str | Path, dataset_name: str, cores: int):
    """
    Fit curves for the provided raw viability data.

    This function reads viability data in a predefined input format, preprocesses the data
    to be readable by CurveCurator, fits curves to the data using CurveCurator, and postprocesses
    the fitted data into the format required by drevalpy.

    :param input_file: Path to the file containing the raw viability data
    :param output_dir: Path to store all files in, including the preprocessed data, the config.toml
        for CurveCurator, CurveCurator's output files, and the postprocessed data
    :param dataset_name: The name of the dataset, used to name the postprocessed <dataset_name>.csv file
    :param cores: The number of cores to be used for fitting the curves with CurveCurator.
        This parameter is written into the config.toml, capped at the number of curves to fit
        (min(n_curves, cores))
    """
    preprocess(input_file, output_dir, dataset_name, cores)
    _exec_curvecurator(Path(output_dir))
    postprocess(output_dir, dataset_name)
```
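To illustrate the expected input format and the wide layout the preprocessing step builds, here is a minimal self-contained sketch. The toy sample/drug/dose values are invented for illustration, and it mirrors the pivot logic of `_prepare_raw_data` in simplified form (single replicate, no file output):

```python
import pandas as pd

# Toy viability data in the long format expected by preprocess/fit_curves:
# one row per (sample, drug, dose) measurement.
curve_df = pd.DataFrame({
    "dose":     [0.1, 1.0, 10.0, 0.1, 1.0, 10.0],
    "response": [0.9, 0.6, 0.2, 0.95, 0.7, 0.3],
    "sample":   ["cellA"] * 3 + ["cellB"] * 3,
    "drug":     ["drugX"] * 6,
})

# Wide layout: one row per (sample, drug) curve, one column per dose,
# plus a control column at dose 0 with response normalized to 1.0.
df = curve_df.pivot(index=["sample", "drug"], columns="dose", values="response")
df.insert(0, 0.0, 1.0)  # control experiment
df = df[df.columns.sort_values()]
df.insert(0, "Name", df.index.map(lambda x: f"{x[0]}|{x[1]}"))
df.columns = ["Name"] + [f"Raw {i}" for i in range(df.shape[1] - 1)]
print(df)
```

Each `Raw i` column corresponds to one experiment in the generated config.toml, and the `Name` column encodes the `cell_line_id|drug_id` pair that postprocess later splits back apart.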