Commit fb3b705

Merge pull request #11 from SPARC-FAIR-Codeathon/develop
README
2 parents 1f9ea6e + 7b7db13 commit fb3b705

2 files changed

+250
-39
lines changed

PIPELINE.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
# Pipeline Information
2+
This pipeline is the backbone of sPARcRNA_Viz and provides the coordinates required to create the scRNA-seq visualizations.
3+
4+
## Input
5+
The pipeline takes the barcodes, features, and matrix files as input. The files must be either in .csv/.tsv and .mtx format or in an R data format.
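
As an illustration of what the three input files contain, here is a minimal Python sketch (the pipeline itself is written in R). It assumes the common layout in which the .mtx file is a Matrix Market coordinate file with genes as rows and cells as columns; the file contents and the helper names `read_mtx` and `read_names` are hypothetical.

```python
import io

def read_mtx(handle):
    """Parse a Matrix Market coordinate file into (n_rows, n_cols, entries)."""
    # Drop blank lines and the banner/comment lines, which start with '%'.
    lines = (line.strip() for line in handle)
    data_lines = [line for line in lines if line and not line.startswith("%")]
    # The first remaining line is the size line: rows, columns, stored entries.
    n_rows, n_cols, n_entries = (int(x) for x in data_lines[0].split())
    entries = {}
    for line in data_lines[1:]:
        row, col, value = line.split()
        # Matrix Market indices are 1-based; convert to 0-based.
        entries[(int(row) - 1, int(col) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries

def read_names(handle):
    """Return the first tab-separated column of each non-empty line."""
    return [line.split("\t")[0].strip() for line in handle if line.strip()]

# Tiny in-memory stand-ins for the three input files (hypothetical contents).
matrix_text = """%%MatrixMarket matrix coordinate integer general
3 2 4
1 1 5
2 1 1
2 2 7
3 2 2
"""
features = read_names(io.StringIO("GeneA\tGeneA\nGeneB\tGeneB\nGeneC\tGeneC\n"))
barcodes = read_names(io.StringIO("AAAC-1\nTTTG-1\n"))
n_genes, n_cells, counts = read_mtx(io.StringIO(matrix_text))

# Rows must line up with the features and columns with the barcodes.
assert (n_genes, n_cells) == (len(features), len(barcodes))
```

Only non-zero counts are stored, which is why the matrix is kept as a sparse mapping from (gene, cell) to count.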
6+
7+
## Output
8+
A JSON file containing the t-SNE coordinates of every point, which the frontend uses to render an interactive visualization.
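
The exact schema of the output file is not described here. The Python sketch below only illustrates the general idea of per-point coordinates serialized as JSON; every field name (`points`, `barcode`, `tsne_1`, `tsne_2`, `cluster`) and every value is an assumption made for illustration.

```python
import json

# Hypothetical per-cell records: barcode, t-SNE coordinates, cluster label.
points = [
    {"barcode": "AAAC-1", "tsne_1": -12.4, "tsne_2": 3.1, "cluster": "0"},
    {"barcode": "TTTG-1", "tsne_1": 8.7, "tsne_2": -5.9, "cluster": "1"},
]

# Serialize the records the way an export step might.
payload = json.dumps({"points": points}, indent=2)

# A consumer such as a frontend can read the same structure back.
restored = json.loads(payload)
xs = [p["tsne_1"] for p in restored["points"]]
```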
9+
10+
## Workflow
11+
### 1. Setup
12+
Load libraries, set options, validate and prepare the directories, find and read the raw data files, and configure the run based on the inputs.
13+
### 2. Create Seurat object
14+
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
15+
- Seurat was chosen because the gene expression data analyzed through this pipeline is single-cell RNA-seq, and it provides ways to normalize, scale, and visualize this data.
16+
### 3. Normalize and preprocess the data
17+
Normalize the data (so that it reflects true biological differences), find variable features, scale the data (to standardize it), perform PCA (principal component analysis, to reduce dimensionality), and cluster cells with similar expression profiles together.
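
To make the normalization and scaling steps concrete, here is a small Python sketch of the arithmetic. It assumes log-normalization with a scale factor of 10,000 (the default method of Seurat's `NormalizeData()`) and plain z-scoring per gene; the pipeline's actual parameters are not stated here, and the counts are hypothetical.

```python
import math
import statistics

def log_normalize(cell_counts, scale_factor=10_000):
    """Divide by the cell's total count, multiply by scale_factor, take log1p."""
    total = sum(cell_counts)
    return [math.log1p(c / total * scale_factor) for c in cell_counts]

def scale_gene(values):
    """Center one gene's values across cells to mean 0 and scale to unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical counts: two cells with the same expression profile,
# but cell_b was sequenced ten times deeper.
cell_a = [1, 3, 6]
cell_b = [10, 30, 60]
norm_a = log_normalize(cell_a)
norm_b = log_normalize(cell_b)

# Scaling works per gene, across cells (here: one gene measured in four cells).
scaled = scale_gene([0.0, 1.0, 2.0, 5.0])
```

After normalization the two cells have the same values even though their raw counts differ tenfold, which is the sense in which normalization removes sequencing-depth effects.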
18+
### 4. t-SNE
19+
t-SNE (t-distributed stochastic neighbor embedding) projects the cells into two dimensions so that the clusters, and the genes that distinguish them, can be visualized. From these, researchers can determine potential gene ontologies arising from their sample(s).
20+
### 5. Differential Gene Expression Analysis
21+
Differential gene expression analysis takes the normalized gene read counts and allows researchers to quantify changes in gene expression between clusters.
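
One quantity such an analysis typically reports is the log2 fold change of a gene in one cluster relative to all other cells. The Python sketch below computes it with a pseudocount of 1, which is an assumption, and it omits the statistical test that a real analysis also runs. The expression values and the helper name `log2_fold_change` are hypothetical.

```python
import math
from statistics import fmean

def log2_fold_change(in_cluster, other_cells, pseudocount=1.0):
    """log2 ratio of mean expression inside a cluster versus all other cells."""
    inside = math.log2(fmean(in_cluster) + pseudocount)
    outside = math.log2(fmean(other_cells) + pseudocount)
    return inside - outside

# Hypothetical expression values per gene: (cells in the cluster, all other cells).
expression = {
    "GeneA": ([7.0, 7.0, 7.0], [1.0, 1.0, 1.0]),
    "GeneB": ([2.0, 2.0], [2.0, 2.0]),
    "GeneC": ([0.0, 0.0], [3.0, 3.0]),
}
fold_changes = {gene: log2_fold_change(a, b) for gene, (a, b) in expression.items()}

# Rank genes from most up-regulated to most down-regulated in the cluster.
ranked = sorted(fold_changes, key=fold_changes.get, reverse=True)
```

A positive value means the gene is more highly expressed inside the cluster, a negative value means it is lower, and zero means no difference.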
22+
### 6. GSEA
23+
GSEA (gene set enrichment analysis) identifies the gene sets, such as pathways, that are over-represented in the data.
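
The core idea behind an enrichment score can be sketched as a running sum over a ranked gene list. The Python below is a simplified, unweighted version; full GSEA additionally weights hits by the ranking metric and estimates significance by permutation, and the library the pipeline uses for this step is not named here. The gene names and sets are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score of gene_set in ranked_genes."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene in the set, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        # Keep the point of largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking, from most up-regulated to most down-regulated.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
es_top = enrichment_score(ranked, {"G1", "G2"})     # set concentrated at the top
es_bottom = enrichment_score(ranked, {"G5", "G6"})  # set concentrated at the bottom
es_mixed = enrichment_score(ranked, {"G1", "G6"})   # set split across both ends
```

A set whose genes cluster at one end of the ranking gets a score near +1 or -1, while a set scattered through the list stays closer to zero.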
24+
### 7. Combine t-SNE and GSEA results
25+
The per-cluster GSEA results are saved together with the top pathways.
26+
### 8. Export and Display Results
27+
All values from the previous steps, including the top clusters and pathways, are saved in a JSON file that the frontend later visualizes.
28+
29+
## Overview of Functions
30+
- `make_option()`: defines the command-line options, which let the user supply data files from the local machine
31+
- `ReadMtx()`: reads the data from the barcodes, features, and matrix files once they have been located by pattern matching among the input files given by the user
32+
- `CreateSeuratObject()`: creates the Seurat object from the loaded data and the user inputs for the name, cells, and features
33+
- Cleaning and standardizing the data so that it can be used for t-SNE and GSEA:
34+
  - `NormalizeData()`
35+
  - `ScaleData()`
36+
- Reducing the dimensionality, clustering, and running and saving the t-SNE:
37+
  - `RunPCA()`
38+
  - `FindNeighbors()`
39+
  - `FindClusters()`
40+
  - `RunTSNE()`
41+
  - `DimPlot()`
42+
  - `ggsave()`
43+
- `FindAllMarkers()`: performs the differential expression analysis
44+
- `GetAssayData()`: retrieves the normalized gene expression data; normalization reduces the influence of technical biases on the values
45+
- The t-SNE coordinates, top 10 markers, top pathways, cluster results, cluster centroids, cluster average expression data, and more are exported as a JSON file
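
As an example of one exported quantity, a cluster centroid can be taken as the mean t-SNE position of the cells in that cluster. The Python sketch below assumes that definition; the field names (`tsne_1`, `tsne_2`, `cluster`), the helper name `cluster_centroids`, and the coordinates are hypothetical rather than the pipeline's actual schema.

```python
from collections import defaultdict
from statistics import fmean

def cluster_centroids(points):
    """Average the t-SNE coordinates of the cells in each cluster."""
    grouped = defaultdict(list)
    for point in points:
        grouped[point["cluster"]].append((point["tsne_1"], point["tsne_2"]))
    return {
        cluster: {
            "tsne_1": fmean([x for x, _ in coords]),
            "tsne_2": fmean([y for _, y in coords]),
        }
        for cluster, coords in grouped.items()
    }

# Hypothetical cells: two in cluster "0", one in cluster "1".
points = [
    {"cluster": "0", "tsne_1": 1.0, "tsne_2": 2.0},
    {"cluster": "0", "tsne_1": 3.0, "tsne_2": 4.0},
    {"cluster": "1", "tsne_1": -2.0, "tsne_2": 0.5},
]
centroids = cluster_centroids(points)
```

A frontend could use such centroids to place cluster labels on the plot.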
