Skip to content

Commit 42d3de0

Browse files
authored
Merge pull request #120 from NICHD-BSPC/sally-edits-may2025
Wk 7 Lesson 04: Update plots, add catch-up script for plotting
2 parents 5a118bb + 336e781 commit 42d3de0

File tree

3 files changed

+75
-30
lines changed

3 files changed

+75
-30
lines changed

img/maplot_solution.png

62.4 KB
Loading

img/volcano_top10_labeled.png

104 KB
Loading

lessons/wk7_lesson04_visualizing_results.md

Lines changed: 75 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -306,31 +306,32 @@ ggplot(res_OE_df_plotting, aes(x = log2FoldChange, y = -log10(padj))) +
306306

307307
### Selecting Your Own Gene
308308

309-
What if you want to select your own gene of interest? One way is creating a new data frame where you have added a value in the `genelabels` column just for that one symbol.
309+
What if you want to select your own gene of interest? One way is creating a new data frame where you have added a value in the `genelabels` column just for that one symbol. In this example, I am using a gene symbol called `HOX3A`.
310310

311-
``` r
312-
res_tableOE_df[res_tableOE_tb$symbol == "Mygene", "genelabels"]
311+
Preparing for this will take a few steps in Base R:
312+
313+
- Add an additional empty column called `my_genelabels` to our current plotting dataframe, to put those gene names we want to use to label the plot
313314

314-
res_tableOE_mygene <- res_tableOE_df %>%
315-
dplyr::mutate(threshold_OE = padj < 0.1 & abs(log2FoldChange) >= 0.58)
315+
- Assign value just to the one relevant empty `my_genelabel` slot! You can use a strategy like this in other data frames and with other genes - or whole custom lists of genes.
316316

317-
res_tableOE_mygene <- res_tableOE_mygene %>% dplyr::mutate(genelabels = "")
317+
``` r
318+
# Create a new empty column
319+
res_OE_df_plotting$my_genelabels <- ""
318320

319-
# Assign value just to that one empty genelabel slot! You can use a strategy like this in other data frames and with other genes - or whole custom lists of genes.
320-
res_tableOE_mygene[res_tableOE_mygene$symbol == "HOXA3", "genelabels"] = "HOXA3"
321+
# Assign value just to that one empty genelabel slot where symbol == HOXA3
322+
res_OE_df_plotting[res_OE_df_plotting$symbol == "HOXA3", "genelabels"] = "HOXA3"
321323
```
322324

323325
**Now let's check this out in a volcano plot:**
324326

325327
``` r
326-
ggplot(res_tableOE_mygene, aes(x = log2FoldChange, y = -log10(padj))) +
327-
geom_point(aes(colour = threshold_OE)) +
328-
geom_text_repel(aes(label = genelabels)) +
328+
ggplot(res_OE_df_plotting, aes(x = log2FoldChange, y = -log10(padj))) +
329+
geom_point(aes(colour = significant_OE)) +
330+
geom_text_repel(aes(label = my_genelabels)) +
329331
ggtitle("Mov10 overexpression") +
330332
xlab("log2 fold change") +
331333
ylab("-log10 adjusted p-value") +
332-
theme(legend.position = "none",
333-
plot.title = element_text(size = rel(1.5), hjust = 0.5),
334+
theme(plot.title = element_text(size = rel(1.5), hjust = 0.5),
334335
axis.title = element_text(size = rel(1.25)))
335336
```
336337

@@ -339,15 +340,14 @@ But where is our gene?? Let's actually increase that overlaps parameter...
339340
``` r
340341
options(ggrepel.max.overlaps = Inf)
341342

342-
ggplot(res_tableOE_mygene, aes(x = log2FoldChange, y = -log10(padj))) +
343-
geom_point(aes(colour = threshold_OE)) +
344-
geom_text_repel(aes(label = genelabels)) +
343+
ggplot(res_OE_df_plotting, aes(x = log2FoldChange, y = -log10(padj))) +
344+
geom_point(aes(colour = significant_OE)) +
345+
geom_text_repel(aes(label = my_genelabels)) +
345346
ggtitle("Mov10 overexpression") +
346347
xlab("log2 fold change") +
347348
ylab("-log10 adjusted p-value") +
348-
theme(legend.position = "none",
349-
plot.title = element_text(size = rel(1.5), hjust = 0.5),
350-
axis.title = element_text(size = rel(1.25)))
349+
theme(plot.title = element_text(size = rel(1.5), hjust = 0.5),
350+
axis.title = element_text(size = rel(1.25)))
351351
```
352352

353353
<img src="../img/hox3a_labeled_volcano.png" width="500"/>
@@ -360,19 +360,35 @@ Switching from a volcano plot to an MA plot is pretty easy! Let's fill in this s
360360

361361
``` r
362362
ggplot(res_tableOE_plotting, aes(x = , y = )) +
363-
geom_point(aes(colour = threshold_OE)) +
363+
geom_point(aes(colour = significant_OE)) +
364364
geom_text_repel(aes(label = genelabels)) +
365-
ggtitle("Mov10 overexpression") +
365+
ggtitle("MA Plot:Mov10 overexpression") +
366366
xlab("") +
367367
ylab("") +
368-
theme(legend.position = "none",
369-
plot.title = element_text(size = rel(1.5), hjust = 0.5),
368+
theme(plot.title = element_text(size = rel(1.5), hjust = 0.5),
370369
axis.title = element_text(size = rel(1.25)))
371370
```
372371

373372
If we do this correctly, we should have a something like the following - depending on what you have saved as your `genelabels` variable and how big your plot window is:
374373

375-
<img src="../img/mov10_ggplot_maplot.png" width="450"/>
374+
<img src="../img/maplot_solution.png" alt="MA plot with top 10 DE genes labeled for overexpression comparison" width="600"/>
375+
376+
<details>
377+
378+
<summary>Code Solution:</summary>
379+
380+
``` r
381+
ggplot(res_OE_df_plotting, aes(x = log(baseMean) , y = log2FoldChange)) +
382+
geom_point(aes(colour = significant_OE)) +
383+
geom_text_repel(aes(label = genelabels)) +
384+
ggtitle("MA Plot for Mov10 overexpression contrast") +
385+
xlab("log of meant counts") +
386+
ylab("log2FoldChange") +
387+
theme(plot.title = element_text(size = rel(1.5), hjust = 0.5),
388+
axis.title = element_text(size = rel(1.25)))
389+
```
390+
391+
</details>
376392

377393
## **ASSIGNMENT OPTION 2**:
378394

@@ -382,12 +398,12 @@ Take the above steps (starting with finding the Ensembl Gene ID) for a gene symb
382398

383399
------------------------------------------------------------------------
384400

385-
> ### An R package for visualization of DGE results
386-
>
387-
> The Bioconductor package [`DEGreport`](https://bioconductor.org/packages/release/bioc/html/DEGreport.html) can use the DESeq2 results output to make the top20 genes and the volcano plots generated above by writing much fewer lines of code. The caveat of these functions is you lose the ability to customize plots as we have demonstrated above.
388-
>
389-
> If you are interested, the example code below shows how you can use DEGreport to create similar plots. **Note that this is example code, do not run.**
390-
>
401+
### An R package for visualization of DGE results
402+
403+
The Bioconductor package [`DEGreport`](https://bioconductor.org/packages/release/bioc/html/DEGreport.html) can use the DESeq2 results output to make the top20 genes and the volcano plots generated above by writing much fewer lines of code. The caveat of these functions is you lose the ability to customize plots as we have demonstrated above.
404+
405+
If you are interested, the example code below shows how you can use DEGreport to create similar plots. **Note that this is example code, do not run.**
406+
391407
> ``` r
392408
> ## load degreport
393409
> library(degreport)
@@ -403,6 +419,35 @@ Take the above steps (starting with finding the Ensembl Gene ID) for a gene symb
403419
> DEGreport::degPlotWide(dds = dds, genes = row.names(res)[1:5], group = "condition")
404420
> ```
405421
422+
## Summary script from this lesson
423+
424+
Today, we didn't create any new data objects required for the next lesson. However, if you want to save the commands we used to generate the custom data frames for plotting, you can save relevant lines from below:
425+
426+
``` r
427+
## Obtain logical vector where TRUE values denote padj values < 0.05 and fold change > 1.5 in either direction (meaning log2FC >= 0.58)
428+
significant_OE <- res_OE_df$padj < 0.1 & abs(res_OE_df$log2FoldChange) >= 0.58
429+
430+
## Add this vector as a new column to create version of res_OE_df for custom plots
431+
res_OE_df_plotting <- cbind(res_OE_df, significant_OE)
432+
433+
# Create an empty column
434+
res_OE_df_plotting$genelabels <- ""
435+
436+
# Sort by padj values
437+
res_OE_df_plotting <- res_OE_df_plotting[order(res_OE_df_plotting$padj), ]
438+
439+
## Populate the genelabels column with contents of the gene symbols column for the first 10 rows, i.e. the top 10 most significantly expressed genes
440+
res_OE_df_plotting$genelabels[1:10] <- as.character(res_OE_df_plotting$symbol[1:10])
441+
442+
# Create a new empty column
443+
res_OE_df_plotting$my_genelabels <- ""
444+
445+
# Assign value just to that one empty genelabel slot where symbol == HOXA3
446+
res_OE_df_plotting[res_OE_df_plotting$symbol == "HOXA3", "genelabels"] = "HOXA3"
447+
```
448+
449+
You may also want to save the commands for creating the plots themselves!
450+
406451
------------------------------------------------------------------------
407452
408453
*This lesson has been developed by members of the teaching team at the [Harvard Chan Bioinformatics Core (HBC)](http://bioinformatics.sph.harvard.edu/). These are open access materials distributed under the terms of the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.*

0 commit comments

Comments
 (0)