# Pipeline Information
This pipeline is the backbone of sPARcRNA_Viz and provides the coordinates required to create the scRNA-seq visualizations.

## Input
The pipeline takes the barcodes, features, and matrix data as input. These must be provided either as .csv/.tsv files (barcodes and features) plus a Matrix Market .mtx file (matrix), or in an R data format.
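
The matrix file uses the Matrix Market coordinate format: a `%%MatrixMarket` header line, optional `%` comment lines, a size line (`rows columns nonzeros`), then one `row column value` line per nonzero entry with 1-based indices. In 10x Genomics-style output, rows correspond to the lines of the features file and columns to the lines of the barcodes file. The pipeline reads these files in R; the Python sketch below only illustrates the format. `read_mtx_coordinate` is a hypothetical name, and the sketch assumes an uncompressed `general` file with numeric values.

```python
import io
from typing import Dict, TextIO, Tuple


def read_mtx_coordinate(handle: TextIO) -> Tuple[int, int, Dict[Tuple[int, int], float]]:
    """Parse a Matrix Market 'coordinate ... general' file.

    Returns (n_rows, n_cols, entries), where entries maps 0-based (row, col)
    to the stored nonzero value. Gzip, 'pattern' and 'symmetric' files are not handled.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header")
    tokens = header.lower().split()
    # Expected: %%MatrixMarket matrix coordinate <integer|real> general
    if (len(tokens) < 5 or tokens[2] != "coordinate"
            or tokens[3] == "pattern" or tokens[4] != "general"):
        raise ValueError(f"unsupported Matrix Market layout: {header.strip()}")

    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_nonzero = (int(x) for x in line.split())

    entries: Dict[Tuple[int, int], float] = {}
    for _ in range(n_nonzero):
        i, j, value = handle.readline().split()
        # Matrix Market indices are 1-based; convert to 0-based
        entries[(int(i) - 1, int(j) - 1)] = float(value)
    return n_rows, n_cols, entries


# Rows are features (genes), columns are barcodes (cells)
example = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "% 3 genes x 2 cells, 3 nonzero counts\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
n_genes, n_cells, counts = read_mtx_coordinate(example)
```

Here `counts[(2, 0)]` is the count of the third gene in the first cell.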

## Output
A JSON file containing the t-SNE coordinates of every cell, together with the cluster, marker, and pathway results, which the frontend uses to render an interactive visualization.

## Workflow
### 1. Setup
Load the libraries, set the options, validate and prepare the directories, find and read the raw data files, and configure the run based on the inputs.
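
Finding the input files amounts to matching filenames in the input directory against expected patterns. The Python sketch below shows one way to do this; the patterns (including `genes` as an alternative to `features`, optional `.gz` compression, and `.rds`/`.rda`/`.rdata` for R data files) and the function `find_input_files` are assumptions for illustration, not the pipeline's actual rules.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical filename patterns; the pipeline's real patterns may differ.
PATTERNS = {
    "barcodes": re.compile(r"barcodes.*\.(tsv|csv)(\.gz)?$", re.IGNORECASE),
    "features": re.compile(r"(features|genes).*\.(tsv|csv)(\.gz)?$", re.IGNORECASE),
    "matrix": re.compile(r"matrix.*\.mtx(\.gz)?$", re.IGNORECASE),
}
R_DATA = re.compile(r"\.(rds|rda|rdata)$", re.IGNORECASE)


def find_input_files(directory: Path) -> Dict[str, Path]:
    """Return the barcodes/features/matrix trio, or a single R data file as a fallback."""
    files = sorted(p for p in directory.iterdir() if p.is_file())
    found: Dict[str, Path] = {}
    for role, pattern in PATTERNS.items():
        matches = [p for p in files if pattern.search(p.name)]
        if len(matches) > 1:
            raise ValueError(f"ambiguous {role} files: {[m.name for m in matches]}")
        if matches:
            found[role] = matches[0]
    if len(found) == len(PATTERNS):
        return found
    # Fall back to exactly one R data file if the trio is incomplete
    r_files = [p for p in files if R_DATA.search(p.name)]
    if len(r_files) == 1:
        return {"rdata": r_files[0]}
    raise FileNotFoundError(f"no complete set of input files in {directory}")


with tempfile.TemporaryDirectory() as tmp:
    for name in ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz", "notes.txt"):
        (Path(tmp) / name).touch()
    found_names = {role: path.name for role, path in find_input_files(Path(tmp)).items()}
```

Unrelated files such as `notes.txt` are ignored, while two files matching the same role raise an error instead of being picked arbitrarily.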
### 2. Create Seurat object
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
- Seurat was chosen because the data analyzed by this pipeline is single-cell RNA-seq gene expression data, and Seurat provides methods to normalize, scale, and visualize it.
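
Seurat's `CreateSeuratObject()` accepts `min.cells` and `min.features` thresholds that drop rarely detected genes and nearly empty cells while the object is built. The Python sketch below mimics that filtering on a small dense matrix; the function name is hypothetical, and applying the cell filter before the gene filter is an assumption about the order.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = genes (features), columns = cells (barcodes)


def filter_counts(counts: Matrix, genes: List[str], cells: List[str],
                  min_cells: int = 0, min_features: int = 0
                  ) -> Tuple[Matrix, List[str], List[str]]:
    """Drop cells with too few detected genes, then genes detected in too few cells."""
    # Keep cells (columns) with at least `min_features` genes that have a nonzero count
    keep_cols = [j for j in range(len(cells))
                 if sum(1 for row in counts if row[j] > 0) >= min_features]
    counts = [[row[j] for j in keep_cols] for row in counts]
    cells = [cells[j] for j in keep_cols]
    # Keep genes (rows) with a nonzero count in at least `min_cells` of the remaining cells
    keep_rows = [i for i, row in enumerate(counts)
                 if sum(1 for value in row if value > 0) >= min_cells]
    return [counts[i] for i in keep_rows], [genes[i] for i in keep_rows], cells


genes = ["GeneA", "GeneB", "GeneC"]
cells = ["cell1", "cell2", "cell3"]
raw = [
    [3, 0, 1],  # GeneA
    [0, 0, 4],  # GeneB
    [2, 0, 0],  # GeneC
]
kept_counts, kept_genes, kept_cells = filter_counts(raw, genes, cells,
                                                    min_cells=2, min_features=1)
```

`cell2` has no detected genes and is removed; GeneB and GeneC are each detected in only one remaining cell, so only GeneA survives `min_cells=2`.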
### 3. Normalize and preprocess the data
Normalize the counts (so that differences between cells reflect biology rather than technical factors such as sequencing depth), find the most variable features (genes), scale the data (so that each gene is centered and standardized across cells), run PCA (Principal Component Analysis) to reduce dimensionality, and cluster cells with similar expression profiles together.
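
Seurat's default `LogNormalize` method divides each count by the cell's total count, multiplies by a scale factor (10,000 by default), and takes the natural log of one plus the result; `ScaleData()` then centers and scales each gene, clipping values at 10 by default. The Python sketch below reproduces that arithmetic on a tiny genes-by-cells matrix. The function names are hypothetical, and using the sample standard deviation (dividing by n - 1) is an assumption that may not match Seurat exactly.

```python
import math
import statistics
from typing import List

Matrix = List[List[float]]  # rows = genes, columns = cells


def log_normalize(counts: Matrix, scale_factor: float = 10_000.0) -> Matrix:
    """Divide each count by its cell's total, multiply by scale_factor, then apply ln(1 + x)."""
    n_cells = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [[math.log1p(row[j] / totals[j] * scale_factor) if totals[j] > 0 else 0.0
             for j in range(n_cells)]
            for row in counts]


def scale_genes(data: Matrix, clip: float = 10.0) -> Matrix:
    """Z-score each gene (row) across cells using the sample SD, clipped to +/- clip."""
    scaled = []
    for row in data:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row) if len(row) > 1 else 0.0
        if sd == 0:
            scaled.append([0.0] * len(row))  # constant gene: nothing to scale
            continue
        scaled.append([max(-clip, min(clip, (v - mean) / sd)) for v in row])
    return scaled


raw = [[1, 0, 5],
       [3, 4, 5]]
normalized = log_normalize(raw)
scaled = scale_genes(normalized)
```

After scaling, each gene has mean 0 and standard deviation 1 across cells, so highly expressed genes do not dominate the PCA.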
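### 4. t-SNE
t-SNE (t-distributed Stochastic Neighbor Embedding) projects the cells into two dimensions, starting from the principal components computed in step 3, so that cells with similar expression profiles appear close together and the clusters can be inspected visually. Together with the marker genes and pathways identified in the next steps, this helps researchers determine potential gene ontologies arising from their sample(s).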
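
Before embedding, t-SNE turns distances between cells into neighbor probabilities, choosing a separate Gaussian bandwidth for each cell so that its neighbor distribution reaches a user-chosen *perplexity* (roughly, the effective number of neighbors). Implementations such as the Rtsne package expose this as a `perplexity` parameter. The Python sketch below shows only this calibration step for a single cell, using a binary search on the bandwidth; it is not a full t-SNE, and the function names are hypothetical.

```python
import math
from typing import List


def conditional_probs(sq_dists: List[float], sigma: float) -> List[float]:
    """p_{j|i} for one cell, given squared distances to every *other* cell."""
    # Subtract the smallest distance before exponentiating for numerical stability
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs: List[float]) -> float:
    """2 ** H(P), with the Shannon entropy H measured in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


def calibrate_sigma(sq_dists: List[float], target: float,
                    tol: float = 1e-5, max_iter: int = 200) -> float:
    """Binary-search the Gaussian bandwidth so the perplexity matches `target`."""
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)  # bisect on a log scale
        perp = perplexity(conditional_probs(sq_dists, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # distribution too flat: shrink the bandwidth
        else:
            lo = sigma  # distribution too peaked: widen the bandwidth
    return sigma


sq_dists = [1.0, 2.0, 4.0, 8.0, 16.0]  # squared distances from one cell to five others
sigma = calibrate_sigma(sq_dists, target=3.0)
probs = conditional_probs(sq_dists, sigma)
```

With five neighbors the perplexity ranges between 1 and 5, so a target of 3 is reachable; a target at or above the number of neighbors cannot be met.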
### 5. Differential Gene Expression Analysis
Differential gene expression analysis compares the normalized expression of each gene between a cluster and all other cells, letting researchers quantify which genes change and by how much. The genes most strongly up-regulated in a cluster are its marker genes.
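
In Seurat, `FindAllMarkers()` by default tests each cluster against all other cells with a Wilcoxon rank-sum test and reports an average log2 fold change per gene. The Python sketch below covers only the fold-change ranking, not the significance test: it averages `expm1` of the log-normalized values in each group, adds a pseudocount of 1, and takes the log2 ratio. Seurat's exact fold-change formula differs between versions, so treat this as an approximation; the function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def avg_log2fc(in_cluster: Sequence[float], rest: Sequence[float],
               pseudocount: float = 1.0) -> float:
    """log2 fold change of mean expression, after undoing the ln(1 + x) normalization."""
    def log2_mean(values: Sequence[float]) -> float:
        return math.log2(fmean([math.expm1(v) for v in values]) + pseudocount)
    return log2_mean(in_cluster) - log2_mean(rest)


def top_markers(data: Dict[str, List[float]], labels: List[str], cluster: str,
                n: int = 10) -> List[Tuple[str, float]]:
    """Rank genes by log2 fold change of `cluster` versus all other cells."""
    inside = [j for j, label in enumerate(labels) if label == cluster]
    outside = [j for j, label in enumerate(labels) if label != cluster]
    scores = [
        (gene, avg_log2fc([values[j] for j in inside], [values[j] for j in outside]))
        for gene, values in data.items()
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:n]


# Log-normalized values, ln(1 + x), for four cells in two clusters
data = {
    "GeneA": [math.log1p(7), math.log1p(7), 0.0, 0.0],
    "GeneB": [0.0, 0.0, math.log1p(3), math.log1p(3)],
}
labels = ["0", "0", "1", "1"]
markers = top_markers(data, labels, cluster="0", n=2)
```

GeneA scores about 3 (log2 of 8 versus log2 of 1) and tops cluster "0"; GeneB scores about -2 because it is expressed only outside the cluster.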
### 6. GSEA
GSEA (Gene Set Enrichment Analysis) determines which predefined gene sets, such as pathways, are over-represented among the most up- or down-regulated genes in the data.
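
GSEA scores a gene set by walking down a ranked gene list and keeping a running sum that steps up at genes in the set (weighted by their ranking statistic) and down at genes outside it; the enrichment score is the running sum's largest deviation from zero. A positive score means the set is concentrated among the top-ranked genes. The Python sketch below computes that score with the standard weighting exponent of 1. It omits the permutation test GSEA uses for significance, and since the GSEA implementation the pipeline calls is not specified here, it illustrates the method only; the function name is hypothetical.

```python
from typing import List, Set, Tuple


def enrichment_score(ranked: List[Tuple[str, float]], gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """GSEA running-sum enrichment score.

    `ranked` holds (gene, statistic) pairs sorted from most up- to most down-regulated.
    Returns the running sum's largest deviation from zero, keeping its sign.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_misses = len(ranked) - sum(hits)
    hit_norm = sum(abs(stat) ** weight for (_, stat), hit in zip(ranked, hits) if hit)
    if hit_norm == 0 or n_misses == 0:
        raise ValueError("gene set must hit nonzero-scored genes without covering the whole list")
    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked, hits):
        if hit:
            running += abs(stat) ** weight / hit_norm  # step up at a gene in the set
        else:
            running -= 1.0 / n_misses                  # step down at a gene outside it
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0), ("G4", -1.0), ("G5", -2.0)]
es_top = enrichment_score(ranked, {"G1", "G2"})  # set concentrated at the top
es_bottom = enrichment_score(ranked, {"G5"})     # set at the bottom
```

Here `es_top` is 1.0 because both set members sit at the top of the list, and `es_bottom` is -1.0 because the running sum falls through every non-member before reaching G5.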
### 7. Combine t-SNE and GSEA results
The GSEA results for every cluster are saved, along with the top pathways.
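
Choosing the top pathways typically involves correcting the GSEA p-values for multiple testing and ranking by the adjusted value. The Python sketch below applies the Benjamini-Hochberg procedure and keeps up to `n` pathways below a cutoff; the 0.05 cutoff, the value of `n`, and ranking by adjusted p-value (rather than, say, by enrichment score) are assumptions, not the pipeline's documented criteria. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def top_pathways(results: Dict[str, float], n: int = 5, alpha: float = 0.05
                 ) -> List[Tuple[str, float]]:
    """Return up to `n` (pathway, adjusted p) pairs with adjusted p < alpha, best first."""
    names = list(results)
    padj = benjamini_hochberg([results[name] for name in names])
    significant = [(name, q) for name, q in zip(names, padj) if q < alpha]
    return sorted(significant, key=lambda item: item[1])[:n]


results = {"PathA": 0.001, "PathB": 0.01, "PathC": 0.03, "PathD": 0.5}  # raw p-values
best = top_pathways(results, n=2)
```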
### 8. Export and Display Results
All values from the previous steps, including the top clusters and pathways, are saved to a JSON file that the frontend later visualizes.
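
The JSON layout the frontend expects is not documented here, so the Python sketch below uses a hypothetical schema: one record per cell with its barcode, t-SNE `x`/`y` coordinates, and cluster label, plus a per-cluster summary of top markers and pathways. The field names and the `build_export` function are illustrative assumptions.

```python
import json
from typing import Dict, List, Sequence


def build_export(barcodes: List[str], tsne: List[Sequence[float]], clusters: List[str],
                 markers: Dict[str, List[str]], pathways: Dict[str, List[str]]) -> str:
    """Serialize per-cell t-SNE coordinates and per-cluster summaries as JSON text."""
    if not (len(barcodes) == len(tsne) == len(clusters)):
        raise ValueError("barcodes, coordinates and cluster labels must have equal length")
    payload = {
        # One point per cell, ready for a scatter plot
        "cells": [
            {"barcode": b, "x": xy[0], "y": xy[1], "cluster": c}
            for b, xy, c in zip(barcodes, tsne, clusters)
        ],
        # Clusters without results get empty lists rather than missing keys
        "clusters": {
            c: {"top_markers": markers.get(c, []), "top_pathways": pathways.get(c, [])}
            for c in sorted(set(clusters))
        },
    }
    return json.dumps(payload, indent=2)


text = build_export(
    barcodes=["cell1", "cell2"],
    tsne=[(-12.5, 3.1), (8.0, -4.2)],
    clusters=["0", "1"],
    markers={"0": ["GeneA"], "1": ["GeneB"]},
    pathways={"0": ["PathA"]},
)
parsed = json.loads(text)
```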

## Overview of Functions
- `make_options()`: defines the command-line options, letting the user supply data files from their local machine
- `Read_MTX()`: reads the barcodes, features, and matrix files after the user's input files have been matched against the expected filename patterns
- `CreateSeuratObject()`: creates the Seurat object from the loaded data and the user's inputs for the project name, cells, and features
- Normalizing and standardizing the data so that it can be used for t-SNE and GSEA:
  - `NormalizeData()`
  - `ScaleData()`
- Reducing dimensionality, clustering the cells, and running and saving the t-SNE:
  - `RunPCA()`
  - `FindNeighbors()`
  - `FindClusters()`
  - `RunTSNE()`
  - `DimPlot()`
  - `ggsave()`
- `FindAllMarkers()`: performs the differential expression analysis, comparing each cluster against all other cells
- `GetAssayData()`: retrieves the normalized gene expression data so it can be saved (the normalization itself, done by `NormalizeData()`, is what corrects for technical biases)
- The t-SNE coordinates, top 10 markers, top pathways, cluster results, cluster centroids, average expression per cluster, and more are exported as a JSON file
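
Cluster centroids and per-cluster average expression are simple group means. The Python sketch below computes both with hypothetical function names and takes a plain arithmetic mean of whatever values it is given. Seurat's own `AverageExpression()` may transform the data before averaging (for example, out of log space), so results from the pipeline could differ.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Tuple


def cluster_centroids(coords: List[Tuple[float, float]], labels: List[str]
                      ) -> Dict[str, Tuple[float, float]]:
    """Mean t-SNE (x, y) position of the cells in each cluster."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for xy, label in zip(coords, labels):
        groups[label].append(xy)
    return {label: (fmean([x for x, _ in pts]), fmean([y for _, y in pts]))
            for label, pts in groups.items()}


def cluster_average_expression(expr: Dict[str, List[float]], labels: List[str]
                               ) -> Dict[str, Dict[str, float]]:
    """Arithmetic mean of each gene's values within each cluster: {gene: {cluster: mean}}."""
    clusters = sorted(set(labels))
    return {
        gene: {c: fmean([v for v, label in zip(values, labels) if label == c])
               for c in clusters}
        for gene, values in expr.items()
    }


coords = [(0.0, 0.0), (2.0, 2.0), (10.0, 10.0)]
labels = ["0", "0", "1"]
centroids = cluster_centroids(coords, labels)
averages = cluster_average_expression({"GeneA": [1.0, 3.0, 0.0]}, labels)
```

A centroid is a convenient anchor for drawing a cluster's label on the interactive t-SNE plot.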