> Diagram illustrating the steps taken in this study. We first downloaded the cell-injury dataset from GitHub and the Image Data Resource (IDR) to obtain the raw morphological features and associated metadata. Using Pycytominer, we processed these features to generate feature-selected profiles, which were then split into training, testing, and holdout sets for model training. Finally, we applied our trained model to the JUMP dataset to predict cellular injuries for previously unseen compounds.
The goal of this project was to use [Pycytominer](https://github.com/cytomining/pycytominer) to generate feature-selected profiles from image-based data and train a multi-class logistic regression model to predict cellular injury.
We sourced the cell-injury dataset from the [IDR](https://idr.openmicroscopy.org/webclient/?show=screen-3151) and its corresponding [GitHub repository](https://github.com/IDR/idr0133-dahlin-cellpainting).
Using [Pycytominer](https://github.com/cytomining/pycytominer), we processed these datasets to prepare them for model training.
We then trained our model to predict 15 different types of injuries using the cell-injury dataset and applied the trained model to the JUMP dataset to identify cellular injuries.
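
As a concrete illustration of the model type, a multi-class logistic regression can be fit with scikit-learn as sketched below. This is only a sketch: the file name, the `injury_type` column name, and the hyperparameters are assumptions, not the study's exact configuration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training split with morphological features and an injury label.
train = pd.read_csv("train_split.csv")
X_train = train.drop(columns=["injury_type"])  # feature columns
y_train = train["injury_type"]                 # one of the 15 injury classes

# scikit-learn's LogisticRegression supports multi-class problems natively
# (multinomial loss with the default lbfgs solver).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```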
## Data sources
We obtained the cell-injury dataset from the [IDR](https://idr.openmicroscopy.org/webclient/?show=screen-3151) and its associated [GitHub repository](https://github.com/IDR/idr0133-dahlin-cellpainting).
After processing these datasets with [Pycytominer](https://github.com/cytomining/pycytominer) to prepare them for model training, we trained a model to predict 15 different types of injuries using the cell-injury dataset.
We then applied this trained model to the JUMP dataset to predict cellular injuries.
| Data Source | Description |
|-------------|-------------|
| [IDR repository](https://github.com/IDR/idr0133-dahlin-cellpainting/tree/main/screenA) | Repository containing annotated screen data |

Below are all the notebook modules used in our study.

| Notebook Module | Description |
|-----------------|-------------|
| [3.jump_analysis](./notebooks/3.jump_analysis/) | Applies our model to the JUMP dataset to predict cellular injuries |
| [4.visualizations](./notebooks/4.visualizations/) | Contains a notebook responsible for generating our figures |
## Installing repository and dependencies
This installation guide assumes that you have Conda installed.
If you do not have Conda installed, please follow the documentation [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html).
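
As a rough sketch of the typical workflow, assuming the repository provides Conda environment files (the file name and environment name below are hypothetical; use the actual `.yaml` files shipped in the repository):

```bash
# Create the Conda environment(s) from the provided environment file
# (file name is hypothetical; check the repository for the actual files)
conda env create -f environment.yml

# Activate the newly created environment (name is hypothetical)
conda activate cell-injury
```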

That's it! Your Conda environments should now be set up with the specified packages.
## How to use the notebook modules
The notebooks are meant to be run in sequence, with each module representing a distinct step in the process.
Each module also includes a `.sh` script to automate the execution of its corresponding notebooks.
All notebook results are stored in the `./results` directory, which is organized into subfolders numbered according to the module that generated the results.
For example, if you want to run the `1.data_splits` module (assuming that you have already completed the previous module `0.feature_selection_and_data/`), you can follow these steps:
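
In outline, the steps look like the sketch below (the script name is an assumption; use the `.sh` file actually found inside the module folder):

```bash
# Navigate into the module's folder
cd notebooks/1.data_splits/

# Execute the module's shell script, which runs its notebooks in order
# (script name assumed; check the folder for the actual .sh file)
bash 1.data_splits.sh
```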
### Feature selection
Before performing feature selection, we first labeled the wells associated with each injury.
We did this using datasets from the [cell-injury study](https://www.nature.com/articles/s41467-023-36829-x), which provided details on treatments linked to specific injuries.
After mapping the injury labels to the wells based on their treatments, we proceeded with feature alignment.
We identified the features shared between the `cell-injury` dataset and the JUMP dataset, focusing only on these "shared" morphological features.
We then applied feature selection using [Pycytominer](https://github.com/cytomining/pycytominer) to extract the most informative and least redundant features.
This process produced our `feature-selected` profiles, which were subsequently used to train the multi-class logistic regression model.
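
As a sketch of this step, Pycytominer's `feature_select` API can apply several selection operations in one call. The input file name and the exact operations and parameters used in this study are assumptions here, not a record of the actual configuration:

```python
import pandas as pd
from pycytominer import feature_select

# Load the injury-labeled, feature-aligned profiles (hypothetical file name).
profiles = pd.read_csv("cell_injury_aligned_profiles.csv")

# Apply common Pycytominer feature-selection operations to keep informative,
# non-redundant morphological features.
feature_selected = feature_select(
    profiles=profiles,
    features="infer",  # infer CellProfiler feature columns automatically
    operation=[
        "variance_threshold",     # drop near-constant features
        "correlation_threshold",  # drop highly correlated, redundant features
        "drop_na_columns",        # drop features with too many missing values
        "blocklist",              # drop known-problematic CellProfiler features
    ],
)
feature_selected.to_csv("feature_selected_profiles.csv", index=False)
```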
### Data splitting
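
As noted in the overview, the feature-selected profiles were split into training, testing, and holdout sets. A minimal sketch of such a split follows; the file name, the `injury_type` label column, and the split fractions are assumptions rather than the study's exact settings:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Feature-selected profiles produced in the previous module (hypothetical name).
profiles = pd.read_csv("feature_selected_profiles.csv")

# Carve out a holdout set first, then split the remainder into training and
# testing sets, stratifying on the injury label to preserve class balance.
rest, holdout = train_test_split(
    profiles, test_size=0.10, stratify=profiles["injury_type"], random_state=0
)
train, test = train_test_split(
    rest, test_size=0.20, stratify=rest["injury_type"], random_state=0
)
```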

Below is a list of the primary technologies and linters used:

- [**pycln**](https://github.com/hadialqattan/pycln): A tool to automatically remove unused imports from Python files, keeping the codebase clean and optimized.
- [**isort**](https://github.com/PyCQA/isort): An import sorting tool that organizes imports according to a specific style (in this case, aligned with Black's formatting rules). This helps maintain consistency in the order of imports throughout the codebase.
- [**ruff-pre-commit**](https://github.com/astral-sh/ruff-pre-commit): A fast Python linter and formatter that checks code style and can automatically fix formatting issues.
- [**blacken-docs**](https://github.com/adamchainz/blacken-docs): A utility that formats Python code within documentation blocks, ensuring that example code snippets in docstrings and markdown files adhere to the same standards as the main codebase.
- [**pre-commit-hooks**](https://github.com/pre-commit/pre-commit-hooks): A collection of various hooks, such as removing trailing whitespace, fixing end-of-line issues, and formatting JSON files.
**Note:** To see the pre-commit configurations, please refer to the `./.pre-commit-config.yaml` file.
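
To apply these hooks locally, the standard pre-commit workflow is:

```bash
# Register the git hooks defined in ./.pre-commit-config.yaml
pre-commit install

# Optionally run every hook against the entire codebase
pre-commit run --all-files
```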