@@ -3,10 +3,10 @@ title: "Introduction to Bulk RNAseq data analysis"
33subtitle : Annotation and Visualisation of Differential Expression Results
44date : ' `r format(Sys.time(), "Last modified: %d %b %Y")`'
55output :
6- html_document :
7- toc : yes
86 pdf_document :
97 toc : yes
8+ html_document :
9+ toc : yes
1010bibliography : ref.bib
1111---
1212
@@ -131,7 +131,7 @@ annot <- annotations %>%
131131```
132132
133133
134- ### One-to-many relationships
134+ ### Missing annotations
135135
136136Let's inspect the annotation.
137137
@@ -141,7 +141,6 @@ head(annot)
141141length(annot$entrezid)
142142length(unique(annot$entrezid))
143143sum(is.na(annot$entrezid)) # Why are there NAs in the ENTREZID column?
144-
145144```
146145
147146Gene/transcript/protein IDs mapping between different databases not always
@@ -160,23 +159,24 @@ Ensembl and Entrez databases don't match on a 1:1 level although they have
160159started taking steps towards consolidating
161160[ in recent years] ( https://m.ensembl.org/info/genome/genebuild/mane.html ) .
162161
163- For example [ Prkcg] ( https://www.genecards.org/cgi-bin/carddisp.pl?gene=PRKCG )
164- gene has two Entrez IDs but have one gene name and one EntrezID.
162+ ## One we prepared earlier and one-to-many relationships
165163
166- The is another set of databases within ` AnnotationHub ` which you can call
167- instead called ` OrgDb ` which give you the 'latest' version and are more similar
168- to the bioconductor packages if you are more familiar with those. They contain
169- slightly more information than the ` EnsDb ` records.
170164
171- ## A curated annotation - one we prepared earlier
165+ To ensure everyone is working with same annotation, we have created an annotation table.
172166
173- Dealing with all the one-to-many annotation mappings requires some manual
174- curation of your annotation table.
167+ In this case we used the ` biomaRt ` package to download annotations directly from
168+ Ensembl. In this cases we can get additional columns, but will also sometimes get
169+ one-to-many relationships, where one Ensembl ID maps to multiple Entrez IDs. This
170+ sort of problem is common when mapping between annotation sources, but they have
171+ already been dealt with for us in AnnotationHub. If we wanted more control over
172+ this we would need to manually curate these one-to-many relationships ourselves.
175173
176- To save time we have created an annotation table in which we have modified the
177- column names and dealt with the one-to-many/missing issues for Entrez IDs.
174+ In annotation table below we have modified the column names and dealt with the
175+ one-to-many/missing issues for Entrez IDs. The code we used for doing this is
176+ available in the [ extended materials section] ( S6_Annotation_With_BioMart.html ) .
178177
179- The code we used for doing this is available in the extended materials section.
178+ We will load out pre-created annotation table, and then combine it with our
179+ results table.
180180
181181``` {r addAnnotation, message=FALSE}
182182ensemblAnnot <- readRDS("RObjects/Ensembl_annotations.rds")
@@ -301,11 +301,11 @@ ggplot(volcanoTab.11, aes(x = logFC, y=`-log10(pvalue)`)) +
301301
302302## Exercise 1 - Volcano plot for 33 days
303303
304- Now it's your turn! We just made the volcano plot for the 11 days contrast, you
305- will make the one for the 33 days contrast.
304+ > We just made the volcano plot for the 11 days contrast, you will make the one
305+ > for the 33 days contrast.
306306
307- If you haven't already make sure you load in our data and annotation. You can
308- copy and paste the code below.
307+ > If you haven't already make sure you load in our data and annotation. You can
308+ > copy and paste the code below.
309309
310310``` {r eval=FALSE}
311311# First load data and annotations
@@ -336,18 +336,28 @@ shrinkTab.33 <- as.data.frame(ddsShrink.33) %>%
336336> Create a plot with points coloured by FDR < 0.05 similar to how we did in
337337> the first volcano plot
338338
339- ``` {r echo=FALSE}
339+ ``` {r echo=FALSE, eval=FALSE }
340340volcanoTab.33 <- shrinkTab.33 %>%
341341 mutate(`-log10(pvalue)` = -log10(pvalue))
342342
343343ggplot(volcanoTab.33, aes(x = logFC, y=`-log10(pvalue)`)) +
344344 geom_point(aes(colour=FDR < 0.05), size=1)
345-
346345```
347346
348347> (d)
349348> Compare these two volcano plots, what differences can you see between the two contrasts?
350349
350+
351+ ## Exercise 2 - MA plot for day 33 with ggplot2
352+
353+ > For this exercise create an MA plot for day 33 like the ones we plotted with
354+ > ` plotMA ` from ** DESeq2** but this time using ggplot2.
355+ >
356+ > The x-axis should be the log2 of the mean gene expression across all
357+ > samples, and the y-axis should be the log2 of the fold change between Infected
358+ > and Uninfected.
359+
360+
351361## Venn Diagram
352362
353363In the paper you may notice they have presented a Venn diagram of the results.
@@ -484,8 +494,8 @@ colours of the bars at the top of the heatmap. This is shown below.
484494
485495``` {r ColouredsplitHeatmap, fig.width=5, fig.height=8}
486496ha1 = HeatmapAnnotation(df = colData(ddsObj.interaction)[,c("Status", "TimePoint")],
487- col = list(Status = c("Uninfected" = "hotpink ",
488- "Infected" = "purple "),
497+ col = list(Status = c("Uninfected" = "darkgreen ",
498+ "Infected" = "palegreen "),
489499 TimePoint = c("d11" = "lightblue",
490500 "d33" = "darkblue")))
491501
@@ -498,7 +508,6 @@ Heatmap(z.mat, name = "z-score",
498508```
499509
500510
501-
502511``` {r saveEnvironment, eval=FALSE}
503512saveRDS(annot.interaction.11, file="results/Annotated_Results.d11.rds")
504513saveRDS(shrinkTab.11, file="results/Shrunk_Results.d11.rds")
0 commit comments