README.md: 10 additions & 11 deletions
@@ -3,32 +3,31 @@
This repository contains the code used to perform the [single-cell proteogenomic analysis of the human cell cycle](https://www.nature.com/articles/s41586-021-03232-9). This study was based on immunofluorescence staining of ~200k cells for single-cell analysis of proteomic heterogeneity and ~1k cells for analysis of single-cell RNA variability. These analyses were integrated with other proteomic studies and databases to investigate the functional importance of transcript-regulated and non-transcript regulated variability.
## Structure of repository
-The code is listed in order of execution, e.g. "1_", "2_" etc. The output of each script is used in the subsequent script.
+The code is listed in order of execution, e.g. "1_", "2_" etc. The output of each script is used in the subsequent script. This workflow can also be run using snakemake (see below).
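Because the numeric prefixes define the execution order, a plain alphabetical listing can mislead once a prefix reaches two digits ("10_" sorts before "2_"). The following is a minimal sketch, not part of the repository, that lists scripts by numeric prefix. It assumes the numbered scripts are `.py` files in the base directory; the function name `scripts_in_execution_order` is hypothetical.

```python
import re
from pathlib import Path


def scripts_in_execution_order(repo_dir="."):
    """Return names of scripts like '1_*.py', '2_*.py', ... sorted by numeric prefix.

    Assumption: the numbered scripts are .py files directly inside repo_dir.
    """
    numbered = []
    for path in Path(repo_dir).glob("*.py"):
        match = re.match(r"(\d+)_", path.name)
        if match:
            # Sort on the integer value so that 10 comes after 2.
            numbered.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(numbered)]
```

Calling `scripts_in_execution_order()` from the base directory of the repository returns the script names in the order they are meant to be run; files without a numeric prefix are ignored.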
The logic for these analyses is contained in the `SingleCellProteogenomics` folder.
-The input files are contained in the "input" folder. This folder is linked [here](https://drive.google.com/file/d/1mdQbYcDPqiTOHeiYbv_4RtrxrmlhYMNl/view?usp=sharing) as a zip file, `input.zip`. Expand this folder within the base directory of this repository. If you are looking for the raw imaging proteomic dataset produced after filtering artifacts and such, that is located [here](https://drive.google.com/file/d/11vjsZV-nmzPpFmA7ShbfHzmbrk057b1V/view?usp=sharing).
+The input files are contained in the "input" folder. This folder is linked [here](https://drive.google.com/file/d/1G4i115FCH8XNyiEHCkBXMSO_9pwGflTq/view?usp=sharing) as a zip file, `input.zip`. Expand this folder within the base directory of this repository. If you are looking for the raw imaging proteomic dataset produced after filtering artifacts and such, that is located [here](https://drive.google.com/file/d/11vjsZV-nmzPpFmA7ShbfHzmbrk057b1V/view?usp=sharing).
The output files are added to a folder "output" during the analysis, and figures are added to a folder "figures."
An R-script used to analyze skewness and kurtosis (noted in the Methods of the manuscript) is contained in the other_scripts folder. The `other_scripts/ProteinDisorder.py` script utilizes [IUPRED2A](https://iupred2a.elte.hu/) and a [human UniProt](https://www.uniprot.org/proteomes/UP000005640) database.
-## Prerequisites
+## Running the workflow using snakemake

-Prerequisites are listed in `enviro.yaml` file. They can be installed using `conda`:
+This workflow can be run using `snakemake`:
-1. Install Miniconda from https://docs.conda.io/en/latest/miniconda.html
+1. Install Miniconda from https://docs.conda.io/en/latest/miniconda.html.

-2. From this directory run `conda env create -n fucci -f enviro.yaml; conda activate fucci` to install and activate these prerequisites.
+2. Install snakemake using `conda install -c conda-forge snakemake-minimal`.
+
+3. Within this directory, run `snakemake -j 1 --use-conda --snakefile workflow/Snakefile`.
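The snakemake steps above can be wrapped in a small pre-flight check. This is a minimal sketch, not part of the repository: the command is the one quoted in step 3, while the helper names `missing_prerequisites` and `run_workflow` are hypothetical, and the check for an `input` folder assumes the workflow expects `input.zip` to have been expanded in the base directory beforehand.

```python
import shutil
import subprocess
from pathlib import Path

# The command quoted in step 3 of the README.
SNAKEMAKE_CMD = ["snakemake", "-j", "1", "--use-conda", "--snakefile", "workflow/Snakefile"]


def missing_prerequisites(repo_dir="."):
    """Return a list of human-readable problems; an empty list means ready to run."""
    repo = Path(repo_dir)
    problems = []
    if shutil.which("snakemake") is None:
        problems.append("snakemake not found on PATH")
    if not (repo / "workflow" / "Snakefile").is_file():
        problems.append("workflow/Snakefile not found")
    # Assumption: the workflow reads from an already expanded input/ folder.
    if not (repo / "input").is_dir():
        problems.append("input/ folder not found (expand input.zip here)")
    return problems


def run_workflow(repo_dir="."):
    """Run the workflow from repo_dir, failing early if something is missing."""
    problems = missing_prerequisites(repo_dir)
    if problems:
        raise RuntimeError("; ".join(problems))
    subprocess.run(SNAKEMAKE_CMD, cwd=repo_dir, check=True)
```

Calling `run_workflow()` from the base directory is then equivalent to typing the command by hand, except that a missing Snakefile or input folder is reported before snakemake starts.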
## Single-cell RNA-Seq analysis
-The single-cell RNA-Seq data is available at GEO SRA under project number [GSE146773](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE146773).
+The single-cell RNA-Seq data is available at GEO SRA under project number [GSE146773](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE146773).
-The cell cycle phase and FACS intensity data for these ~1,000 cells are contained in the [input folder](https://drive.google.com/file/d/1mdQbYcDPqiTOHeiYbv_4RtrxrmlhYMNl/view?usp=sharing) within the file `ProteinData/WellPlatePhasesLogNormIntensities.csv`:
-* The column "Well_Plate" (e.g., A10_355) corresponds to the sample title within GEO SRA (e.g., "Single U2OS cell A10_355").
-* The column "Stage" corresponds to the phase assigned by FACS gating.
-* The "Green530" and "Red585" columns correspond to the log-intensities for the red (CDT1) and green (GMNN) FUCCI markers for the individual cells.
+The cell cycle phase and FACS intensity information for these ~1,000 cells are contained in the [input folder](https://drive.google.com/file/d/1G4i115FCH8XNyiEHCkBXMSO_9pwGflTq/view?usp=sharing) within three files, one per plate, starting with `RNAData/180911_Fucci_single cell seq_ss2-18-*.csv`.
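The per-plate file names contain spaces, which makes them awkward to handle with unquoted shell globs. The following is a minimal sketch, not part of the repository, that locates them with `pathlib` and reads only the header row. The pattern is taken from the sentence above; the helper names `find_plate_files` and `read_header` are hypothetical, and no column names are assumed.

```python
import csv
from pathlib import Path

# Pattern quoted in the README; note the spaces inside the file name.
PLATE_FILE_PATTERN = "180911_Fucci_single cell seq_ss2-18-*.csv"


def find_plate_files(input_dir="input"):
    """Return the per-plate CSV paths under <input_dir>/RNAData, sorted by name."""
    return sorted((Path(input_dir) / "RNAData").glob(PLATE_FILE_PATTERN))


def read_header(path):
    """Return the first row of a CSV file, or an empty list if the file is empty."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])
```

With `input.zip` expanded in the base directory, `find_plate_files()` should return three paths, and `read_header(path)` shows which columns each plate file provides.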
The `snakemake` workflow used to analyze the scRNA-Seq dataset, including RNA velocity calculations and louvain unsupervised clustering, can be found in this repository: https://github.com/CellProfiling/FucciSingleCellSeqPipeline.