Skip to content

Commit ed99085

Browse files
committed
Some updates to text to account for o-t-m relationships in annot. MA exercise
1 parent 9a96c1b commit ed99085

6 files changed

+120
-69
lines changed

Markdowns/11_Annotation_and_Visualisation.Rmd

Lines changed: 34 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -3,10 +3,10 @@ title: "Introduction to Bulk RNAseq data analysis"
33
subtitle: Annotation and Visualisation of Differential Expression Results
44
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
55
output:
6-
html_document:
7-
toc: yes
86
pdf_document:
97
toc: yes
8+
html_document:
9+
toc: yes
1010
bibliography: ref.bib
1111
---
1212

@@ -131,7 +131,7 @@ annot <- annotations %>%
131131
```
132132

133133

134-
### One-to-many relationships
134+
### Missing annotations
135135

136136
Let's inspect the annotation.
137137

@@ -141,7 +141,6 @@ head(annot)
141141
length(annot$entrezid)
142142
length(unique(annot$entrezid))
143143
sum(is.na(annot$entrezid)) # Why are there NAs in the ENTREZID column?
144-
145144
```
146145

147146
Gene/transcript/protein IDs mapping between different databases not always
@@ -160,23 +159,24 @@ Ensembl and Entrez databases don't match on a 1:1 level although they have
160159
started taking steps towards consolidating
161160
[in recent years](https://m.ensembl.org/info/genome/genebuild/mane.html).
162161

163-
For example [Prkcg](https://www.genecards.org/cgi-bin/carddisp.pl?gene=PRKCG)
164-
gene has two Entrez IDs but have one gene name and one EntrezID.
162+
## One we prepared earlier and one-to-many relationships
165163

166-
The is another set of databases within `AnnotationHub` which you can call
167-
instead called `OrgDb` which give you the 'latest' version and are more similar
168-
to the bioconductor packages if you are more familiar with those. They contain
169-
slightly more information than the `EnsDb` records.
170164

171-
## A curated annotation - one we prepared earlier
165+
To ensure everyone is working with same annotation, we have created an annotation table.
172166

173-
Dealing with all the one-to-many annotation mappings requires some manual
174-
curation of your annotation table.
167+
In this case we used the `biomaRt` package to download annotations directly from
168+
Ensembl. In this cases we can get additional columns, but will also sometimes get
169+
one-to-many relationships, where one Ensembl ID maps to multiple Entrez IDs. This
170+
sort of problem is common when mapping between annotation sources, but they have
171+
already been dealt with for us in AnnotationHub. If we wanted more control over
172+
this we would need to manually curate these one-to-many relationships ourselves.
175173

176-
To save time we have created an annotation table in which we have modified the
177-
column names and dealt with the one-to-many/missing issues for Entrez IDs.
174+
In annotation table below we have modified the column names and dealt with the
175+
one-to-many/missing issues for Entrez IDs. The code we used for doing this is
176+
available in the [extended materials section](S6_Annotation_With_BioMart.html).
178177

179-
The code we used for doing this is available in the extended materials section.
178+
We will load out pre-created annotation table, and then combine it with our
179+
results table.
180180

181181
```{r addAnnotation, message=FALSE}
182182
ensemblAnnot <- readRDS("RObjects/Ensembl_annotations.rds")
@@ -301,11 +301,11 @@ ggplot(volcanoTab.11, aes(x = logFC, y=`-log10(pvalue)`)) +
301301

302302
## Exercise 1 - Volcano plot for 33 days
303303

304-
Now it's your turn! We just made the volcano plot for the 11 days contrast, you
305-
will make the one for the 33 days contrast.
304+
> We just made the volcano plot for the 11 days contrast, you will make the one
305+
> for the 33 days contrast.
306306
307-
If you haven't already make sure you load in our data and annotation. You can
308-
copy and paste the code below.
307+
> If you haven't already make sure you load in our data and annotation. You can
308+
> copy and paste the code below.
309309
310310
```{r eval=FALSE}
311311
# First load data and annotations
@@ -336,18 +336,28 @@ shrinkTab.33 <- as.data.frame(ddsShrink.33) %>%
336336
> Create a plot with points coloured by FDR < 0.05 similar to how we did in
337337
> the first volcano plot
338338
339-
```{r echo=FALSE}
339+
```{r echo=FALSE, eval=FALSE}
340340
volcanoTab.33 <- shrinkTab.33 %>%
341341
mutate(`-log10(pvalue)` = -log10(pvalue))
342342
343343
ggplot(volcanoTab.33, aes(x = logFC, y=`-log10(pvalue)`)) +
344344
geom_point(aes(colour=FDR < 0.05), size=1)
345-
346345
```
347346

348347
> (d)
349348
> Compare these two volcano plots, what differences can you see between the two contrasts?
350349
350+
351+
## Exercise 2 - MA plot for day 33 with ggplot2
352+
353+
> For this exercise create an MA plot for day 33 like the ones we plotted with
354+
> `plotMA` from **DESeq2** but this time using ggplot2.
355+
>
356+
> The x-axis should be the log2 of the mean gene expression across all
357+
> samples, and the y-axis should be the log2 of the fold change between Infected
358+
> and Uninfected.
359+
360+
351361
## Venn Diagram
352362

353363
In the paper you may notice they have presented a Venn diagram of the results.
@@ -484,8 +494,8 @@ colours of the bars at the top of the heatmap. This is shown below.
484494

485495
```{r ColouredsplitHeatmap, fig.width=5, fig.height=8}
486496
ha1 = HeatmapAnnotation(df = colData(ddsObj.interaction)[,c("Status", "TimePoint")],
487-
col = list(Status = c("Uninfected" = "hotpink",
488-
"Infected" = "purple"),
497+
col = list(Status = c("Uninfected" = "darkgreen",
498+
"Infected" = "palegreen"),
489499
TimePoint = c("d11" = "lightblue",
490500
"d33" = "darkblue")))
491501
@@ -498,7 +508,6 @@ Heatmap(z.mat, name = "z-score",
498508
```
499509

500510

501-
502511
```{r saveEnvironment, eval=FALSE}
503512
saveRDS(annot.interaction.11, file="results/Annotated_Results.d11.rds")
504513
saveRDS(shrinkTab.11, file="results/Shrunk_Results.d11.rds")

Markdowns/11_Annotation_and_Visualisation.html

Lines changed: 47 additions & 38 deletions
Large diffs are not rendered by default.
-985 KB
Binary file not shown.

Markdowns/11_Annotation_and_Visualisation_Solutions.Rmd

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -59,11 +59,30 @@ shrinkTab.33 <- as.data.frame(ddsShrink.33) %>%
5959
> Create a plot with points coloured by P-value < 0.05 similar to how we did in
6060
> the first volcano plot
6161
62-
```{r plot}
62+
```{r plotVol}
6363
volcanoTab.33 <- shrinkTab.33 %>%
6464
mutate(`-log10(pvalue)` = -log10(pvalue))
6565
6666
ggplot(volcanoTab.33, aes(x = logFC, y=`-log10(pvalue)`)) +
67-
geom_point(aes(colour=pvalue < 0.05), size=1)
67+
geom_point(aes(colour=FDR < 0.05), size=1)
6868
69+
```
70+
71+
72+
## Exercise 2 - MA plot for day 33 with ggplot2
73+
74+
> For this exercise create an MA plot for day 33 like the ones we plotted with
75+
> `plotMA` from **DESeq2** but this time using ggplot2.
76+
>
77+
> The x-axis (M) should be the log2 of the mean gene expression across all
78+
> samples, and the y-axis should be the log2 of the fold change between Infected
79+
> and Uninfected.
80+
81+
```{r plotMA}
82+
maTab.33 <- shrinkTab.33 %>%
83+
mutate(`M` = log2(baseMean))
84+
85+
ggplot(maTab.33, aes(x = M, y = logFC)) +
86+
geom_point(aes(colour=FDR < 0.05), size=1) +
87+
scale_y_continuous(limit=c(-4,4), oob = scales::squish)
6988
```

Markdowns/11_Annotation_and_Visualisation_Solutions.html

Lines changed: 18 additions & 4 deletions
Large diffs are not rendered by default.
1.14 MB
Binary file not shown.

0 commit comments

Comments
 (0)