Skip to content

Commit 81d9719

Browse files
committed
Add solutions file
1 parent 71dcd05 commit 81d9719

File tree

1 file changed

+196
-0
lines changed

1 file changed

+196
-0
lines changed

vignettes/solutions.Rmd

Lines changed: 196 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,196 @@
1+
---
2+
title: "Bioc 2020 Tidytranscriptomics Solutions"
3+
output: rmarkdown::html_vignette
4+
vignette: >
5+
%\VignetteIndexEntry{Solutions}
6+
%\VignetteEngine{knitr::rmarkdown}
7+
%\VignetteEncoding{UTF-8}
8+
---
9+
10+
Questions:
11+
1. What is the Fraction of Variance for PC1 and PC2? What do PC1 and PC2 represent?
12+
2. How many DE genes are there for treated vs untreated? What is the top DE gene by P value?
13+
3. What code can generate a heatmap of variable genes (starting from count_scaled)?
14+
4. What code can you use to visualise expression of the pasilla gene (gene id: FBgn0261552)
15+
5. What code can generate an interactive volcano plot that has gene symbols showing on hover?
16+
6. What code can generate a heatmap of the top 100 DE genes?
17+
18+
Suggested answers are below. You might have some different code e.g. to customise the volcano plot as you like. Feel free to comment on any of these solutions in the workshop website as described [here](https://github.com/stemangiola/bioc_2020_tidytranscriptomics/blob/master/CONTRIBUTING.md).
19+
20+
```{r out.width = "40%", message=FALSE, warning=FALSE}
21+
# load libraries
22+
23+
# tidyverse core packages
24+
library(tibble)
25+
library(dplyr)
26+
library(tidyr)
27+
library(readr)
28+
library(stringr)
29+
library(ggplot2)
30+
31+
# tidyverse-friendly packages
32+
library(tidyHeatmap)
33+
library(tidybulk)
34+
library(ggrepel)
35+
library(plotly)
36+
37+
# load data
38+
data("pasilla", package = "bioc2020tidytranscriptomics")
39+
40+
# create tidybulk tibble
41+
counts_tt <- pasilla %>%
42+
tidybulk()
43+
44+
# scale counts
45+
counts_scaled <- counts_tt %>% scale_abundance(factor_of_interest = condition)
46+
47+
# create density plots
48+
counts_scaled %>%
49+
filter(!lowly_abundant) %>%
50+
pivot_longer(cols = c("counts", "counts_scaled"), names_to = "source", values_to = "abundance") %>%
51+
ggplot(aes(x=abundance + 1, group=sample, color=condition)) +
52+
geom_density() +
53+
facet_wrap(~source) +
54+
scale_x_log10() +
55+
theme_bw()
56+
```
57+
58+
1. What is the Fraction of Variance for PC1 and PC2?
59+
60+
```{r}
61+
counts_scal_PCA <-
62+
counts_scaled %>%
63+
reduce_dimensions(method="PCA")
64+
```
65+
66+
Answer: PC1: 47%, PC2: 25%
67+
68+
What do PC1 and PC2 represent?
69+
70+
```{r out.width = "40%"}
71+
counts_scal_PCA %>%
72+
pivot_sample() %>%
73+
ggplot(aes(x=PC1, y=PC2, colour=condition, shape=type)) +
74+
geom_point() +
75+
geom_text_repel(aes(label=sample), show.legend = FALSE) +
76+
theme_bw()
77+
```
78+
79+
Answer: PC1 represents variance due to treatment effect(treated vs untreated). PC2 represents variance due to sequencing type single vs paired.
80+
81+
82+
```{r}
83+
counts_de <-
84+
counts_tt %>%
85+
test_differential_abundance(.formula = ~ 0 + condition + type,
86+
.contrasts = c("conditiontreated - conditionuntreated"),
87+
omit_contrast_in_colnames = TRUE)
88+
```
89+
90+
2. How many DE genes are there for treated vs untreated (FDR < 0.05)?
91+
92+
```{r}
93+
counts_de %>%
94+
filter(significant == TRUE) %>%
95+
summarise(num_de = n_distinct(feature))
96+
```
97+
98+
Answer: 1128
99+
100+
What is the top DE gene by P value?
101+
102+
```{r}
103+
topgenes <- counts_de %>%
104+
pivot_transcript() %>%
105+
arrange(PValue) %>%
106+
head(6)
107+
108+
topgenes
109+
```
110+
111+
Answer: FBgn0025111
112+
113+
114+
3. What code can generate a heatmap of variable genes (starting from count_scaled)?
115+
116+
```{r out.width = "40%"}
117+
counts_scaled %>%
118+
119+
# filter lowly abundant
120+
filter(!lowly_abundant) %>%
121+
122+
# extract 500 most variable genes
123+
keep_variable( .abundance = counts_scaled, top = 500) %>%
124+
125+
# create heatmap
126+
heatmap(
127+
.column = sample,
128+
.row = feature,
129+
.value = counts_scaled,
130+
annotation = c(condition, type),
131+
transform = log1p
132+
)
133+
```
134+
135+
4. What code can you use to visualise expression of the pasilla gene (gene id: FBgn0261552)
136+
137+
```{r out.width = "40%"}
138+
counts_scaled %>%
139+
140+
# extract counts for pasilla gene
141+
filter(feature == "FBgn0261552") %>%
142+
143+
# make stripchart
144+
ggplot(aes(x = condition, y = counts_scaled + 1, fill =condition, label = sample)) +
145+
geom_boxplot() +
146+
geom_jitter() +
147+
scale_y_log10()+
148+
theme_bw()
149+
```
150+
151+
5. What code can generate an interactive volcano plot that has gene ids showing on hover?
152+
153+
```{r out.width = "40%"}
154+
p <- counts_de %>%
155+
pivot_transcript() %>%
156+
157+
# Subset data
158+
filter(!lowly_abundant) %>%
159+
mutate(significant = FDR<0.05 & abs(logFC) >=2) %>%
160+
161+
# Plot
162+
ggplot(aes(x = logFC, y = PValue, label=feature)) +
163+
geom_point(aes(color = significant, size = significant, alpha=significant)) +
164+
geom_text_repel() +
165+
166+
# Custom scales
167+
scale_y_continuous(trans = "log10_reverse") +
168+
scale_color_manual(values=c("black", "#e11f28")) +
169+
scale_size_discrete(range = c(0, 2)) +
170+
theme_bw()
171+
172+
ggplotly(p, tooltip = c("text"))
173+
```
174+
Tip: You can use "text" instead of "label" if you don't want the column name to show up in the hover e.g. above will give "FBgn0261552" rather than "feature:FBgn0261552".
175+
176+
177+
178+
6. What code can generate a heatmap of the top 100 DE genes?
179+
180+
```{r out.width = "40%"}
181+
top100 <-
182+
counts_de %>%
183+
pivot_transcript() %>%
184+
arrange(PValue) %>%
185+
head(100)
186+
187+
counts_scaled %>%
188+
filter(feature %in% top100$feature) %>%
189+
heatmap(
190+
.column = sample,
191+
.row = feature,
192+
.value = counts_scaled,
193+
annotation = c(condition, type),
194+
transform = log1p
195+
)
196+
```

0 commit comments

Comments
 (0)