> Diagram illustrating the steps taken in this study. We first downloaded the cell-injury dataset from GitHub and the Image Data Resource (IDR) to obtain the raw morphological features and associated metadata. Using Pycytominer, we processed these features to generate feature-selected profiles, which were then split into training, testing, and holdout sets for model training. Finally, we applied our trained model to the JUMP dataset to predict cellular injuries for previously unseen compounds.
The goal of this project was to use [Pycytominer](https://github.com/cytomining/pycytominer) to generate feature-selected profiles from image-based data and train a multi-class logistic regression model to predict cellular injury.
We sourced the cell-injury dataset from the [IDR](https://idr.openmicroscopy.org/webclient/?show=screen-3151) and its corresponding [GitHub repository](https://github.com/IDR/idr0133-dahlin-cellpainting).
Using [Pycytominer](https://github.com/cytomining/pycytominer), we processed these datasets to prepare them for model training.
We then trained our model to predict 15 different types of injuries using the cell-injury dataset and applied the trained model to the JUMP dataset to identify cellular injuries.
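
As a concrete illustration of the model type, a multi-class logistic regression can be fit with scikit-learn as sketched below. This is only a sketch: the file name, the `injury_type` column name, and the hyperparameters are assumptions, not the study's exact configuration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training split with morphological features and an injury label.
train = pd.read_csv("train_split.csv")
X_train = train.drop(columns=["injury_type"])  # feature columns
y_train = train["injury_type"]                 # one of the 15 injury classes

# scikit-learn's LogisticRegression supports multi-class problems natively
# (multinomial loss with the default lbfgs solver).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```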
## Data sources
We obtained the cell-injury dataset from the [IDR](https://idr.openmicroscopy.org/webclient/?show=screen-3151) and its associated [GitHub repository](https://github.com/IDR/idr0133-dahlin-cellpainting).
After processing these datasets with [Pycytominer](https://github.com/cytomining/pycytominer) to prepare them for model training, we trained a model to predict 15 different types of injuries using the cell-injury dataset.
We then applied this trained model to the JUMP dataset to predict cellular injuries.
| Data Source | Description |
|-------------|-------------|
| [IDR repository](https://github.com/IDR/idr0133-dahlin-cellpainting/tree/main/screenA) | Repository containing annotated screen data |

Below are all the notebook modules used in our study.

| Notebook Module | Description |
|-----------------|-------------|
| [3.jump_analysis](./notebooks/3.jump_analysis/) | Applies our model to the JUMP dataset to predict cellular injuries |
| [4.visualizations](./notebooks/4.visualizations/) | Contains a notebook responsible for generating our figures |
## Installing repository and dependencies
This installation guide assumes that you have Conda installed.
If you do not have Conda installed, please follow the documentation [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html).
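
As a rough sketch of the typical workflow, assuming the repository provides Conda environment files (the file name and environment name below are hypothetical; use the actual `.yaml` files shipped in the repository):

```bash
# Create the Conda environment(s) from the provided environment file
# (file name is hypothetical; check the repository for the actual files)
conda env create -f environment.yml

# Activate the newly created environment (name is hypothetical)
conda activate cell-injury
```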

That's it! Your Conda environments should now be set up with the specified packages.
## How to use the notebook modules
The notebooks are meant to be run in sequence, with each module representing a distinct step in the process.
Each module also includes a `.sh` script to automate the execution of its corresponding notebooks.
All notebook results are stored in the `./results` directory, which is organized into subfolders numbered according to the module that generated the results.
For example, if you want to run the `1.data_splits` module (assuming that you have already completed the previous module `0.feature_selection_and_data/`), you can follow these steps:
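
In outline, the steps look like the sketch below (the script name is an assumption; use the `.sh` file actually found inside the module folder):

```bash
# Navigate into the module's folder
cd notebooks/1.data_splits/

# Execute the module's shell script, which runs its notebooks in order
# (script name assumed; check the folder for the actual .sh file)
bash 1.data_splits.sh
```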
### Feature selection
Before performing feature selection, we first labeled the wells associated with each injury.
We did this using datasets from the [cell-injury study](https://www.nature.com/articles/s41467-023-36829-x), which provided details on treatments linked to specific injuries.
After mapping the injury labels to the wells based on their treatments, we proceeded with feature alignment.
We identified the features shared between the `cell-injury` dataset and the JUMP dataset, focusing only on these "shared" morphological features.
We then applied feature selection using [Pycytominer](https://github.com/cytomining/pycytominer) to extract the most informative and least redundant features.
This process produced our `feature-selected` profiles, which were subsequently used to train the multi-class logistic regression model.
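
As a sketch of this step, Pycytominer's `feature_select` API can apply several selection operations in one call. The input file name and the exact operations and parameters used in this study are assumptions here, not a record of the actual configuration:

```python
import pandas as pd
from pycytominer import feature_select

# Load the injury-labeled, feature-aligned profiles (hypothetical file name).
profiles = pd.read_csv("cell_injury_aligned_profiles.csv")

# Apply common Pycytominer feature-selection operations to keep informative,
# non-redundant morphological features.
feature_selected = feature_select(
    profiles=profiles,
    features="infer",  # infer CellProfiler feature columns automatically
    operation=[
        "variance_threshold",     # drop near-constant features
        "correlation_threshold",  # drop highly correlated, redundant features
        "drop_na_columns",        # drop features with too many missing values
        "blocklist",              # drop known-problematic CellProfiler features
    ],
)
feature_selected.to_csv("feature_selected_profiles.csv", index=False)
```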
### Data splitting
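
As noted in the overview, the feature-selected profiles were split into training, testing, and holdout sets. A minimal sketch of such a split follows; the file name, the `injury_type` label column, and the split fractions are assumptions rather than the study's exact settings:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Feature-selected profiles produced in the previous module (hypothetical name).
profiles = pd.read_csv("feature_selected_profiles.csv")

# Carve out a holdout set first, then split the remainder into training and
# testing sets, stratifying on the injury label to preserve class balance.
rest, holdout = train_test_split(
    profiles, test_size=0.10, stratify=profiles["injury_type"], random_state=0
)
train, test = train_test_split(
    rest, test_size=0.20, stratify=rest["injury_type"], random_state=0
)
```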

Below is a list of the primary technologies and linters used:

- [**pycln**](https://github.com/hadialqattan/pycln): A tool to automatically remove unused imports from Python files, keeping the codebase clean and optimized.
- [**isort**](https://github.com/PyCQA/isort): An import sorting tool that organizes imports according to a specific style (in this case, aligned with Black's formatting rules). This helps maintain consistency in the order of imports throughout the codebase.
- [**ruff-pre-commit**](https://github.com/astral-sh/ruff-pre-commit): A fast Python linter and formatter that checks code style and can automatically fix formatting issues.
- [**blacken-docs**](https://github.com/adamchainz/blacken-docs): A utility that formats Python code within documentation blocks, ensuring that example code snippets in docstrings and markdown files adhere to the same standards as the main codebase.
- [**pre-commit-hooks**](https://github.com/pre-commit/pre-commit-hooks): A collection of various hooks, such as removing trailing whitespace, fixing end-of-line issues, and formatting JSON files.
**Note:** To see the pre-commit configurations, please refer to the `./.pre-commit-config.yaml` file.
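
To apply these hooks locally, the standard pre-commit workflow is:

```bash
# Register the git hooks defined in ./.pre-commit-config.yaml
pre-commit install

# Optionally run every hook against the entire codebase
pre-commit run --all-files
```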