
Commit 8cfc8e1

Updating Readme: Fixing typos and clarity (#29)
1 parent 48d5429 commit 8cfc8e1

1 file changed (+22, -23 lines)


README.md

Lines changed: 22 additions & 23 deletions
@@ -1,21 +1,22 @@
-# Predicting cellular injury using Pyctominer
+# Predicting cellular injury
 
 [![DOI](https://zenodo.org/badge/744169074.svg)](https://zenodo.org/doi/10.5281/zenodo.12514972)
 
 ![workflow](./notebooks/4.visualization/figures/workflow_fig.png)
-> Diagram protraying taken to conduct this study.
+> Diagram illustrating the steps taken in this study. We first downloaded the cell-injury dataset from GitHub and the Image Data Resource (IDR) to obtain the raw morphological features and associated metadata. Using Pycytominer, we processed these features to generate feature-selected profiles, which were then split into training, testing, and holdout sets for model training. Finally, we applied our trained model to the JUMP dataset to predict cellular injuries for previously unseen compounds.
 
-The objective of this project was to utilize [Pycytominer](https://github.com/cytomining/pycytominer) for generating feature-selected profiles from image-based data, aiming to train a multi-class logistic regression model for predicting cellular injury.
+The goal of this project was to use [Pycytominer](https://github.com/cytomining/pycytominer) to generate feature-selected profiles from image-based data and train a multi-class logistic regression model to predict cellular injury.
 
-We obtained the cell-injury dataset from [IDR](https://idr.openmicroscopy.org/webclient/?show=screen-3151) and its corresponding [GitHub repository](https://github.com/IDR/idr0133-dahlin-cellpainting).
-Using [Pycytominer](https://github.com/cytomining/pycytominer), we processed these datasets to prepare them for subsequent model training.
-We trained our model on the cell-injury dataset to predict 15 different types of injuries and our trained model to the JUMP dataset to predict cellular injuries.
+We sourced the cell-injury dataset from the [IDR](https://idr.openmicroscopy.org/webclient/?show=screen-3151) and its corresponding [GitHub repository](https://github.com/IDR/idr0133-dahlin-cellpainting).
+Using [Pycytominer](https://github.com/cytomining/pycytominer), we processed these datasets to prepare them for model training.
+We then trained our model to predict 15 different types of injuries using the cell-injury dataset and applied the trained model to the JUMP dataset to identify cellular injuries.
 
 ## Data sources
 
-We obtained the cell-injury dataset from [IDR](https://idr.openmicroscopy.org/webclient/?show=screen-3151) and its corresponding [GitHub repository](https://github.com/IDR/idr0133-dahlin-cellpainting).
-Using Pycytominer, we processed these datasets to prepare them for subsequent model training.
-We trained our model on the cell-injury dataset to predict 15 different types of injuries and our trained model to the JUMP dataset to predict cellular injuries.
+We obtained the cell-injury dataset from the [IDR](https://idr.openmicroscopy.org/webclient/?show=screen-3151) and its associated [GitHub repository](https://github.com/IDR/idr0133-dahlin-cellpainting).
+After processing these datasets with [Pycytominer](https://github.com/cytomining/pycytominer) to prepare them for model training, we trained a model to predict 15 different types of injuries using the cell-injury dataset.
+We then applied this trained model to the JUMP dataset to predict cellular injuries.
+
 | Data Source | Description |
 |-------------|-------------|
 | [IDR repository](https://github.com/IDR/idr0133-dahlin-cellpainting/tree/main/screenA) | Repository containing annotated screen data |
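The overview updated in this hunk describes training a multi-class logistic regression model on feature-selected profiles and applying it to JUMP. As a rough, hypothetical sketch of that modeling step (not the repository's actual code; the file paths, formats, and the `Metadata_injury` label column are assumptions):

```python
# Hypothetical sketch only: paths, file formats, and the "Metadata_injury"
# label column are assumptions, not this repository's actual code.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Feature-selected profiles produced with Pycytominer (assumed path)
profiles = pd.read_csv("results/0.feature_selection/cell_injury_profiles.csv.gz")

meta_cols = [c for c in profiles.columns if c.startswith("Metadata_")]
X = profiles.drop(columns=meta_cols)  # morphological features only
y = profiles["Metadata_injury"]       # one of the 15 injury classes

# scikit-learn treats a multi-class target as a multinomial problem by default
model = LogisticRegression(max_iter=5000)
model.fit(X, y)

# Apply the trained model to JUMP profiles restricted to the same features
jump = pd.read_csv("results/3.jump_analysis/jump_profiles.csv.gz")
predicted_injury = model.predict(jump[X.columns])
```

Per the notebook table later in this diff, the actual JUMP predictions are produced by the `3.jump_analysis` module.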
@@ -45,7 +46,7 @@ Below are all the notebook modules used in our study.
 | [3.jump_analysis](./notebooks/3.jump_analysis/) | Applies our model to the JUMP dataset to predict cellular injuries |
 | [4.visualizations](./notebooks/4.visualizations/) | Contains a notebook responsible for generating our figures |
 
-## Installing respoitory and dependencies
+## Installing repository and dependencies
 
 This installation guide assumes that you have Conda installed.
 If you do not have Conda installed, please follow the documentation [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html).
@@ -78,11 +79,10 @@ That's it! Your Conda environments should now be set up with the specified packa
 
 ## How to use the notebook modules
 
-The notebooks are designed to be executed sequentially, with each module corresponding to a specific step in the process.
-Each module includes a `.sh` script that automates the execution of the notebooks within each module.
+The notebooks are meant to be run in sequence, with each module representing a distinct step in the process.
+Each module also includes a `.sh` script to automate the execution of its corresponding notebooks.
 
-All results generated by the notebooks are saved in the `./results` directory.
-This directory is organized with subfolders that are numbered according to the module from which the results were produced.
+All notebook results are stored in the `./results` directory, which is organized into subfolders numbered according to the module that generated the results.
 
 For example, if you want to run the `1.data_splits` module (assuming that you have already completed the previous module `0.feature_selection_and_data/`), you can follow these steps:
 
@@ -102,15 +102,14 @@ For example, if you want to run the `1.data_splits` module (assuming that you ha
 
 ### Feature selection
 
-Before conducting any feature selection processes, we first labeled wells associated with an injury.
-We achieved this using the datasets downloaded from the [cell-injury](https://www.nature.com/articles/s41467-023-36829-x) study, which provided information on which treatments were associated with which injuries.
-After mapping injury labels onto the wells based on their treatments, we applied feature alignment.
+Before performing feature selection, we first labeled the wells associated with each injury.
+We did this using datasets from the [cell-injury study](https://www.nature.com/articles/s41467-023-36829-x), which provided details on treatments linked to specific injuries.
+After mapping the injury labels to the wells based on their treatments, we proceeded with feature alignment.
 
-We identified which features in the `cell-injury` dataset were present in the JUMP dataset.
-Once identified, we used only the "shared" features—morphological features common to both the JUMP and cell-injury datasets.
+We identified the features shared between the `cell-injury` dataset and the JUMP dataset, focusing only on these "shared" morphological features.
 
-Next, we applied feature selection using [Pycytominer](https://github.com/cytomining/pycytominer) to obtain informative features.
-This process generated our `feature-selected` profiles, which will be used to train our multi-class logistic regression model.
+We then applied feature selection using [Pycytominer](https://github.com/cytomining/pycytominer) to extract the most informative and least redundant features.
+This process produced our `feature-selected` profiles, which were subsequently used to train the multi-class logistic regression model.
 
 ### Data splitting
 
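The text updated in this hunk describes aligning on features shared with JUMP, running Pycytominer feature selection, and (per the following section) splitting the data. A minimal sketch under assumed file names, metadata columns, and split proportions; the `feature_select` operations shown are standard Pycytominer options, not necessarily the exact set this repository uses:

```python
# Hedged sketch of feature alignment, Pycytominer feature selection, and
# train/test/holdout splitting; paths, column names, and split sizes are
# illustrative assumptions.
import pandas as pd
from pycytominer import feature_select
from sklearn.model_selection import train_test_split

cell_injury = pd.read_csv("data/cell_injury_labeled.csv.gz")
jump = pd.read_csv("data/jump.csv.gz")

# Feature alignment: keep only morphological features present in both datasets
meta_cols = [c for c in cell_injury.columns if c.startswith("Metadata_")]
shared_features = sorted(
    (set(cell_injury.columns) & set(jump.columns)) - set(meta_cols)
)
aligned = cell_injury[meta_cols + shared_features]

# Standard Pycytominer operations for dropping uninformative/redundant features
selected = feature_select(
    profiles=aligned,
    features=shared_features,
    operation=["variance_threshold", "correlation_threshold", "drop_na_columns"],
)

# Split into training, testing, and holdout sets, stratified by injury label
train_df, rest = train_test_split(
    selected, test_size=0.30, stratify=selected["Metadata_injury"], random_state=0
)
test_df, holdout_df = train_test_split(
    rest, test_size=0.50, stratify=rest["Metadata_injury"], random_state=0
)
```

The repository's actual selection operations and split proportions live in the `0.feature_selection_and_data/` and `1.data_splits` modules referenced above.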
@@ -170,10 +169,10 @@ Below is a list of the primary technologies and linters used:
 - [**pycln**](https://github.com/hadialqattan/pycln): A tool to automatically remove unused imports from Python files, keeping the codebase clean and optimized.
 - [**isort**](https://github.com/PyCQA/isort): An import sorting tool that organizes imports according to a specific style (in this case, aligned with Black's formatting rules). This helps maintain consistency in the order of imports throughout the codebase.
 - [**ruff-pre-commit**](https://github.com/astral-sh/ruff-pre-commit): A fast Python linter and formatter that checks code style and can automatically fix formatting issues.
-- [**blacken-docs**](https://github.com/adamchainz/blacken-docs): A utility that formats Python code within documentation blocks, ensuring that example code snippets in docstrings and markdown files adhere to the same standards as the main codebase.
+- [**blacken-docs**](https://github.com/adamchainz/blacken-docs): A utility that formats Python code within documentation blocks, ensuring that example code snippets in docstring and markdown files adhere to the same standards as the main codebase.
 - [**pre-commit-hooks**](https://github.com/pre-commit/pre-commit-hooks): A collection of various hooks, such as removing trailing whitespace, fixing end-of-line issues, and formatting JSON files.
 
-**Note:** to see the pre-commit configurations please refere to the `./.pre-commit-config.yaml` file
+**Note:** to see the pre-commit configurations please refer to the `./.pre-commit-config.yaml` file
 
 For machine learning, we used:
 