Skip to content

Commit 336e781

Browse files
committed
Update MA plot instructions, update corresponding figure, add script to regenerate plotting dataframe
1 parent 69b6c43 commit 336e781

File tree

2 files changed

+58
-13
lines changed

2 files changed

+58
-13
lines changed

img/maplot_solution.png

62.4 KB
Loading

lessons/wk7_lesson04_visualizing_results.md

Lines changed: 58 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -326,7 +326,7 @@ res_OE_df_plotting[res_OE_df_plotting$symbol == "HOXA3", "genelabels"] = "HOXA3"
326326

327327
``` r
328328
ggplot(res_OE_df_plotting, aes(x = log2FoldChange, y = -log10(padj))) +
329-
geom_point(aes(colour = threshold_OE)) +
329+
geom_point(aes(colour = significant_OE)) +
330330
geom_text_repel(aes(label = my_genelabels)) +
331331
ggtitle("Mov10 overexpression") +
332332
xlab("log2 fold change") +
@@ -341,7 +341,7 @@ But where is our gene?? Let's actually increase that overlaps parameter...
341341
options(ggrepel.max.overlaps = Inf)
342342

343343
ggplot(res_OE_df_plotting, aes(x = log2FoldChange, y = -log10(padj))) +
344-
geom_point(aes(colour = threshold_OE)) +
344+
geom_point(aes(colour = significant_OE)) +
345345
geom_text_repel(aes(label = my_genelabels)) +
346346
ggtitle("Mov10 overexpression") +
347347
xlab("log2 fold change") +
@@ -360,19 +360,35 @@ Switching from a volcano plot to an MA plot is pretty easy! Let's fill in this s
360360

361361
``` r
362362
ggplot(res_tableOE_plotting, aes(x = , y = )) +
363-
geom_point(aes(colour = threshold_OE)) +
363+
geom_point(aes(colour = significant_OE)) +
364364
geom_text_repel(aes(label = genelabels)) +
365-
ggtitle("Mov10 overexpression") +
365+
ggtitle("MA Plot:Mov10 overexpression") +
366366
xlab("") +
367367
ylab("") +
368-
theme(legend.position = "none",
369-
plot.title = element_text(size = rel(1.5), hjust = 0.5),
368+
theme(plot.title = element_text(size = rel(1.5), hjust = 0.5),
370369
axis.title = element_text(size = rel(1.25)))
371370
```
372371

373372
If we do this correctly, we should have a something like the following - depending on what you have saved as your `genelabels` variable and how big your plot window is:
374373

375-
<img src="../img/mov10_ggplot_maplot.png" width="450"/>
374+
<img src="../img/maplot_solution.png" alt="MA plot with top 10 DE genes labeled for overexpression comparison" width="600"/>
375+
376+
<details>
377+
378+
<summary>Code Solution:</summary>
379+
380+
``` r
381+
ggplot(res_OE_df_plotting, aes(x = log(baseMean) , y = log2FoldChange)) +
382+
geom_point(aes(colour = significant_OE)) +
383+
geom_text_repel(aes(label = genelabels)) +
384+
ggtitle("MA Plot for Mov10 overexpression contrast") +
385+
xlab("log of meant counts") +
386+
ylab("log2FoldChange") +
387+
theme(plot.title = element_text(size = rel(1.5), hjust = 0.5),
388+
axis.title = element_text(size = rel(1.25)))
389+
```
390+
391+
</details>
376392

377393
## **ASSIGNMENT OPTION 2**:
378394

@@ -382,12 +398,12 @@ Take the above steps (starting with finding the Ensembl Gene ID) for a gene symb
382398

383399
------------------------------------------------------------------------
384400

385-
> ### An R package for visualization of DGE results
386-
>
387-
> The Bioconductor package [`DEGreport`](https://bioconductor.org/packages/release/bioc/html/DEGreport.html) can use the DESeq2 results output to make the top20 genes and the volcano plots generated above by writing much fewer lines of code. The caveat of these functions is you lose the ability to customize plots as we have demonstrated above.
388-
>
389-
> If you are interested, the example code below shows how you can use DEGreport to create similar plots. **Note that this is example code, do not run.**
390-
>
401+
### An R package for visualization of DGE results
402+
403+
The Bioconductor package [`DEGreport`](https://bioconductor.org/packages/release/bioc/html/DEGreport.html) can use the DESeq2 results output to make the top20 genes and the volcano plots generated above by writing much fewer lines of code. The caveat of these functions is you lose the ability to customize plots as we have demonstrated above.
404+
405+
If you are interested, the example code below shows how you can use DEGreport to create similar plots. **Note that this is example code, do not run.**
406+
391407
> ``` r
392408
> ## load degreport
393409
> library(degreport)
@@ -403,6 +419,35 @@ Take the above steps (starting with finding the Ensembl Gene ID) for a gene symb
403419
> DEGreport::degPlotWide(dds = dds, genes = row.names(res)[1:5], group = "condition")
404420
> ```
405421
422+
## Summary script from this lesson
423+
424+
Today, we didn't create any new data objects required for the next lesson. However, if you want to save the commands we used to generate the custom data frames for plotting, you can save relevant lines from below:
425+
426+
``` r
427+
## Obtain logical vector where TRUE values denote padj values < 0.05 and fold change > 1.5 in either direction (meaning log2FC >= 0.58)
428+
significant_OE <- res_OE_df$padj < 0.1 & abs(res_OE_df$log2FoldChange) >= 0.58
429+
430+
## Add this vector as a new column to create version of res_OE_df for custom plots
431+
res_OE_df_plotting <- cbind(res_OE_df, significant_OE)
432+
433+
# Create an empty column
434+
res_OE_df_plotting$genelabels <- ""
435+
436+
# Sort by padj values
437+
res_OE_df_plotting <- res_OE_df_plotting[order(res_OE_df_plotting$padj), ]
438+
439+
## Populate the genelabels column with contents of the gene symbols column for the first 10 rows, i.e. the top 10 most significantly expressed genes
440+
res_OE_df_plotting$genelabels[1:10] <- as.character(res_OE_df_plotting$symbol[1:10])
441+
442+
# Create a new empty column
443+
res_OE_df_plotting$my_genelabels <- ""
444+
445+
# Assign value just to that one empty genelabel slot where symbol == HOXA3
446+
res_OE_df_plotting[res_OE_df_plotting$symbol == "HOXA3", "genelabels"] = "HOXA3"
447+
```
448+
449+
You may also want to save the commands for creating the plots themselves!
450+
406451
------------------------------------------------------------------------
407452
408453
*This lesson has been developed by members of the teaching team at the [Harvard Chan Bioinformatics Core (HBC)](http://bioinformatics.sph.harvard.edu/). These are open access materials distributed under the terms of the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.*

0 commit comments

Comments
 (0)