---
output: github_document
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>")
```

[![Build Status](https://travis-ci.org/NCBI-Hackathons/clustifyR.svg?branch=master)](https://travis-ci.org/NCBI-Hackathons/clustifyR)

<p align="center">
  <img src="https://raw.githubusercontent.com/NCBI-Hackathons/clustifyR/master/inst/logo/logo_transparent.png" width = 405 height = 174>
</p>

### ClustifyR is an [R](https://www.r-project.org/) package that classifies cells and clusters in single-cell RNA sequencing experiments using reference bulk RNA-seq data sets, gene signatures or marker genes.

Single-cell transcriptomes are difficult to annotate without extensive knowledge of the underlying biology of the system in question. Even with this knowledge, accurate identification can be challenging because common marker genes defined by bulk RNA-seq, flow cytometry, etc. often lack detectable expression in single cells. `ClustifyR` addresses this problem by providing functions to automatically annotate single cells or clusters using bulk RNA-seq data or marker gene lists (ranked or unranked). Additional functions allow for exploratory analysis of similarities between single-cell RNA-seq datasets and reference data. Put another way:

**C**lustifyR **L**everages **U**ser **S**upplied **T**ranscripts to **I**dentify **F**eatures in **Y**our sc**R**NA-seq

## Installation
Installing from GitHub in R is a two-step process:

### Step 1:
```r
# Install devtools
install.packages("devtools")
```

### Step 2:
```r
# Install clustifyR from GitHub
devtools::install_github("NCBI-Hackathons/clustifyR")
```

## Usage

### Super-quickstart with sample data (included in package):

Generate a correlation matrix from a matrix of single-cell RNA-seq data (`pbmc4k_matrix`), a metadata table describing the single-cell data (`pbmc4k_meta`), a list of variable genes in the single-cell data (`pbmc4k_vargenes`), and a matrix of bulk RNA-seq read counts (`pbmc_bulk_matrix`):

```r
# run correlation (Pearson by default)
res <- run_cor(expr_mat = pbmc4k_matrix,
               metadata = pbmc4k_meta,
               bulk_mat = pbmc_bulk_matrix,
               query_gene_list = pbmc4k_vargenes,
               compute_method = corr_coef)
```
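To make the idea behind this step concrete, here is a minimal base-R sketch of correlating cluster-averaged single-cell expression with bulk reference profiles. It is an illustration under stated assumptions, not the package's actual implementation: the toy matrices, cluster labels, and gene list (`toy_expr`, `toy_clusters`, `toy_bulk`, `toy_vargenes`) are all made up, and `run_cor` may process the data differently.

```r
# Toy single-cell matrix: 5 genes x 6 cells (all values are made up).
# matrix() fills column by column, so each row of numbers below is one cell.
genes <- paste0("gene", 1:5)
toy_expr <- matrix(c(
   9, 8, 1,  0, 1,   # cell1
   8, 9, 2,  1, 0,   # cell2
  10, 7, 1,  1, 1,   # cell3
   1, 0, 2,  9, 8,   # cell4
   0, 1, 1,  8, 9,   # cell5
   1, 1, 2, 10, 7    # cell6
), nrow = 5, dimnames = list(genes, paste0("cell", 1:6)))

# Hypothetical cluster assignment for the six cells
toy_clusters <- c("a", "a", "a", "b", "b", "b")

# Toy bulk reference: same genes, two hypothetical reference cell types
toy_bulk <- matrix(c(
  20, 18, 3,  1,  2,   # ref1
   2,  1, 4, 19, 17    # ref2
), nrow = 5, dimnames = list(genes, c("ref1", "ref2")))

# Hypothetical list of variable genes used to restrict the comparison
toy_vargenes <- c("gene1", "gene2", "gene4", "gene5")

# Average expression per cluster (result: genes x clusters)
cluster_avg <- sapply(
  split(seq_len(ncol(toy_expr)), toy_clusters),
  function(idx) rowMeans(toy_expr[, idx, drop = FALSE])
)

# Pearson correlation of each cluster profile with each reference profile,
# using only the selected genes (result: clusters x references)
toy_res <- cor(cluster_avg[toy_vargenes, ],
               toy_bulk[toy_vargenes, ],
               method = "pearson")
toy_res
```

In this toy example, cluster `a` correlates more strongly with `ref1` and cluster `b` with `ref2`, which is the kind of pattern the correlation matrix is meant to expose.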

Plot the correlation coefficients on a pre-calculated tSNE projection (stored in `pbmc4k_meta`):

```r
# plot correlation coefficients on tSNE for each identity class
plot_cor(res,
         pbmc4k_meta,
         colnames(res)[c(1, 5)],
         cluster_col = "classified")
```
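Besides plotting, a correlation matrix can be summarized by picking the best-matching reference for each cluster. The sketch below uses a made-up matrix (`toy_res`) and assumes clusters are in rows and reference cell types are in columns; the orientation of the real `res` object, the `min_cor` cutoff, and the `"unassigned"` label are assumptions for illustration only and should be checked against your own output.

```r
# Made-up correlation matrix: clusters in rows, reference cell types in columns
toy_res <- matrix(c(0.91, 0.12,
                    0.20, 0.85,
                    0.35, 0.40),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("cluster0", "cluster1", "cluster2"),
                                  c("B cell", "T cell")))

# For each cluster, find the reference with the highest correlation
best_ref <- colnames(toy_res)[max.col(toy_res, ties.method = "first")]
best_cor <- apply(toy_res, 1, max)

calls <- data.frame(cluster = rownames(toy_res),
                    best_ref = best_ref,
                    best_cor = best_cor,
                    stringsAsFactors = FALSE)

# Arbitrary cutoff (hypothetical): leave weakly correlated clusters unassigned
min_cor <- 0.5
calls$call <- ifelse(calls$best_cor >= min_cor, calls$best_ref, "unassigned")
calls
```

A cutoff like this keeps clusters with no convincing match from being forced into the nearest reference type; a suitable value depends on the data and the correlation method used.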

### For more detail, see the list of [command references](https://ncbi-hackathons.github.io/clustifyR/reference/index.html) or [browse the vignettes](https://ncbi-hackathons.github.io/clustifyR/articles/).