Skip to content

Commit 9d74d26

Browse files
committed
Final tweaks
1 parent 00e387d commit 9d74d26

File tree

7 files changed

+206
-95
lines changed

7 files changed

+206
-95
lines changed

13-webgestaltr.Rmd

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ This tool (both the web and R versions) has many features and advantages:
2626

2727
1. Explore the organisms, databases/gene sets and namespaces supported natively
2828
2. Run ORA over pathway databases and explore the interactive HTML output
29-
3. Run GSEA over the GO 'non-redundant' and full database and compare the results
29+
3. Run GSEA over the `WebGestalt` `GO noRedundant` and full database and compare the results
3030

3131
<p>&nbsp;</p> <!-- insert blank line -->
3232

@@ -51,5 +51,4 @@ You could also open the file by selecting `File` &rarr; `Open file`, or use the
5151
- We have reviewed the organisms and databases that are natively supported by this easy to use tool
5252
- We have run both ORA and GSEA and explored the interactive HTML results summary
5353
- We have touched on the redundancy filters available within this tool, for GO as well as two external algorithms applied automatically to any enrichment performed
54-
- We have learnt about an R package that can create compatability between `WebGestaltR` (and other FEA tools) with `enichplot` for visualisation options
5554
- In the next session, we will use `WebGestaltR` with novel species

14-novel-species-FEA.Rmd

Lines changed: 54 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,18 @@ Despite the lack of quality resources, there is much 'omics work conducted in ax
3030

3131
Today we will use [public RNAseq data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5419050/#SD7) from axolotl, comparing gene expression in the blastema after proximal (at the shoulder) and distal (at the hand) limb amputation. The blastema is a collection of undifferentiated progenitor cells that give rise to the regenerated limb. Maybe our functional enrichment analysis of differentially expressed genes can help us understand processes that cause the blastema to grow into a full limb or just a hand!
3232

33+
## Caveat!
34+
35+
This is not a real experiment! It was tricky to find a novel species that had
36+
37+
a) a reference genome
38+
b) a GTF
39+
c) was not natively supported by any FEA tool
40+
d) had publicly available RNAseq data
41+
e) FEA described in a peer-reviewed study
42+
43+
This axolotl data ticked A through D yet not E (RNA from various tissues were sequenced to aid a genome assembly project rather than for a valid biological experiment). Please keep this in mind when reviewing the results of the FEA we perform 😉 The goal of the session is not *really* to uncover how a blastema differentiates into a hand or an arm, but to demonstrate to you how you can apply this method to your own novel species.
44+
3345
<p>&nbsp;</p> <!-- insert blank line -->
3446

3547
### Raw data sources
@@ -130,7 +142,7 @@ The `WebGestaltR` tab delimited `description` file with `.des` suffix has these
130142

131143
**Example .des format:**
132144

133-
![webgr-gmt](images/webgr-des-file.png)
145+
<img src="images/webgr-des-file.png" style="border: none; box-shadow: none; background: none; ">
134146

135147
<p>&nbsp;</p> <!-- insert blank line -->
136148

@@ -142,6 +154,27 @@ The `.gmt` file is provided to the parameter `enrichDatabaseFile` and the `.des`
142154

143155
<p>&nbsp;</p> <!-- insert blank line -->
144156

157+
158+
### Mapping database terms to descriptions
159+
160+
Annotating a proteome with a tool such as `emapper` provides a connection between your novel species gene IDs and term IDs. To add the term description, we need a database file. In this analysis, we will use the [GO 'core' ontology file](https://purl.obolibrary.org/obo/go.obo) and the [KEGG Pathways file](https://www.pathway.jp/en/academic.html). These files were downloaded to the `workshop` folder of the VMs during our setup session.
161+
162+
The GO `go.obo` is a text file (>600K lines) with details for all terms in GO at the time of download. The second line of the file contains the GO database version, in this case: `data-version: releases/2024-06-17`.
163+
164+
**go.obo term information is structured like this:**
165+
166+
<img src="images/go-obo-example.png" style="border: none; box-shadow: none; background: none; width: 100%;">
167+
168+
169+
The KEGG Pathways file is identical in format to what is required for both the `clusterProfiler` `TERM2NAME` and `WebGestaltR` `.des` files:
170+
171+
**KEGG Patwhays file format:**
172+
173+
<img src="images/webgr-des-file.png" style="border: none; box-shadow: none; background: none; ">
174+
175+
<p>&nbsp;</p> <!-- insert blank line -->
176+
177+
145178
## RNotebook novel species FEA
146179

147180
&#x27A4; Return to RStudio and open the notebook `novel_species.Rmd`.
@@ -160,7 +193,7 @@ We will now take a quick look at novel species FEA online with `STRING`.
160193

161194
<p>&nbsp;</p> <!-- insert blank line -->
162195

163-
## STRING novel species FEA
196+
## STRING (web) novel species FEA
164197

165198
The axolotl putative proteome was previosuly uploaded to STRING and custom annotation performed. This completed using the STRING servers, with compute time less than one day.
166199

@@ -201,31 +234,33 @@ Note that the `Organism` field is pre-filled with `STRG0A90SNX (axolotl)`.
201234

202235
<p>&nbsp;</p> <!-- insert blank line -->
203236

204-
&#x27A4; In the RStudio `Files` pane, locate your saved ORA gene list from earlier - `workshop/Axolotl_DEGs.txt`. Click the file to view it in RStudio, then copy paste the list into the STRING `List of Names` search field, then select `SEARCH`.
237+
&#x27A4; In the RStudio `Files` pane, locate your saved ORA gene list from earlier - `workshop/Axolotl_DEGs.txt`. Click the file to view it in RStudio, then copy paste the list into the STRING `List of Names` search field, then select `SEARCH`
238+
239+
&#x27A4; Click `CONTINUE` at the gene ID review page
205240

206-
- Note that there is no option at this query page (even under `Advanced Settings`) to provide a custom background gene list. This can be done after the initial search has been run.
241+
Before we explore the results, note that we have performed ORA without a background gene list! 😮
207242

208-
- Click `CONTINUE` at the gene ID review page
243+
There is no option at the query page (even under `Advanced Settings`) to provide a custom background gene list. This must be done *after* the initial search has been run. Hopefully this will change in future versions.
209244

210-
- Then scroll ALL the way to the bottom of the page to the subheading `Statistical background`
211-
- If you did not have a relevant saved background gene list, you would select `Add background`
212-
- In this case, there is a saved background list for this annotation called `axolotl-blastema-background` (**FRED is this showing for you?**), so you can select the background from the dropdown menu then click `UPDATE`. This will re-cmpute the enrichment.
245+
**To add a custom background to STRING ORA:**
213246

247+
In order to perform this part, you need a `STRING` login. If you have one, feel free to follow these instructions, otherwise, view along.
214248

215-
<img src="images/string-add-bg.png" style="border: none; box-shadow: none; background: none; width: 100%;">
249+
1. Click on the `ANALYSIS` tab
250+
2. Scroll ALL the way to the bottom of the page to the subheading `Statistical background`
251+
- If you do not have a relevant saved background gene list in your `STRING` saved datasets, you would select `Add background`. See next sub-heading for details
252+
- If you do have a relevant background saved, change `Whole Genome` to select the relevant gene background list from your saved datasets
253+
3. Once you have selected the background, click `UPDATE` and the ORA will be re-computed using the custom statistical background
216254

217255
<p>&nbsp;</p> <!-- insert blank line -->
218256

219257
<img src="images/string-update-bg.png" style="border: none; box-shadow: none; background: none; width: 100%;">
220258

221-
<p>&nbsp;</p> <!-- insert blank line -->
222-
223-
**Note**: you may need to login! If you do not have a `STRING` login, you can watch along
224259

225260
<p>&nbsp;</p> <!-- insert blank line -->
226261

227-
**If you need to add a custom background:**
228-
- Select `ADD BACKGROUND`
262+
**Load axolotl background gene list to your STRING profile**
263+
- Under `ANALYSIS` tab of ORA results page, at `Statistical background`, select `ADD BACKGROUND`
229264
- You will be prompted to login if you are not already
230265
- Under `1) name your new set` provide a descriptive name for your background gene list. I chose `axolotl-blastema-background`
231266
- At `2) identify your organism` this will be pre-filled with our custom axolotl annotation
@@ -238,17 +273,18 @@ Note that the `Organism` field is pre-filled with `STRG0A90SNX (axolotl)`.
238273

239274
The gene list will be mapped to the STRING database. This may take several minutes.
240275

276+
<p>&nbsp;</p> <!-- insert blank line -->
277+
241278
`STRING` saves your custom datasets under `My Data`:
242279

243280
<img src="images/string-set-novel-bg.png" style="border: none; box-shadow: none; background: none; width: 100%;">
244281

245-
Despite this, there is not (yet) an option to run the initial analysis with a pre-saved background gene list! Hopefully this feature will be added in the future.
246-
247282

248283
<p>&nbsp;</p> <!-- insert blank line -->
249284

250285
**<span style="color: green;">Now let's explore the results!</span>**
251286

287+
252288
Some suggestions:
253289

254290
- Select a node on the network, and then `Show this node's terms in the analysis table` to highlight the terms the gene was present in
@@ -260,7 +296,7 @@ Some suggestions:
260296

261297
### How do the STRING results compare to those we generated in R?
262298

263-
We expect a large difference in the results because of the differing proteome annotation methods.
299+
**We expect a large difference in the results because of the differing proteome annotation methods - both the annotation tool and the databases that were annotated against.**
264300

265301
The `eggNOG emapper` annotations we used in `R` rely on orthology-based predictions, employing extensive similarity searches to map genes to their closest evolutionary counterparts across diverse species. This results in a more comprehensive catalog of functional terms, even if they are inferred rather than directly evidenced.
266302

@@ -281,6 +317,6 @@ And a clear lack of overlap in number of enriched terms and term IDs between `ST
281317

282318
<img src="images/string-novel-ora-compare.png" style="border: none; box-shadow: none; background: none; ">
283319

284-
These GO terms from `STRING` may be parent terms of more specific child terms prevalent in the R output. For a real world analysis, it would be optimal to compare, and deduce whether both methods could provide valuable and complimentary insights, or whether the results from one annotation approach or the other were more suited to your novel species.
320+
These GO terms from `STRING` may be parent terms of more specific child terms prevalent in the `R` output. For a real world analysis, it would be optimal to compare, and deduce whether both methods could provide valuable and complimentary insights, or whether the results from one annotation approach or the other were more suited to your novel species.
285321

286322
Whichever you choose, strength to you! This is not an easy space to work in 💪

day2_Rnotebooks/WebGestaltR.Rmd

Lines changed: 21 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,8 @@ library(enrichplot)
1515

1616
# 0. Working directory
1717

18+
Ensure the 'workshop' directory is your current working directory:
19+
1820
```{r check notebook workdir}
1921
getwd()
2022
```
@@ -216,7 +218,7 @@ Some things to note:
216218

217219
- You can increase the default view of 20 rows to 'All' for the enrichment table, but this does not necessarily show all significant enrichments! Check the output file `enrichment_results_ORA_pathways.txt` and you can see 85 significant terms, yet 'All' view with default of 20 rows shows 30 something. To increase the number of rows included in the HTML report, use the parameter `reportNum`
218220

219-
- You can run algorithms to reduce the number of terms through clustering, in order to make the results more manageable. This is discussed in the WebGestalt 2019 update publication [Liao et al 2019](https://academic.oup.com/nar/article/47/W1/W199/5494758). The authors maintain that "important biological themes are all covered with these selected gene sets"
221+
- You can run algorithms to reduce the number of terms through clustering, in order to make the results more manageable. This is discussed in the WebGestalt 2019 update publication [Liao et al 2019](https://academic.oup.com/nar/article/47/W1/W199/5494758). The authors maintain that "important biological themes are all covered with these selected gene sets". Built-in redundancy handling/term clustering is a feature of `WebGestaltR` (and the web version). To what extent this is appropriate for the database you are using is up to you to determine. For example, in the next analysis we will perform GSEA over the `noRedundant` GO MF database. Applying a double layer of redundancy filters over a database seems quite dubious to me..
220222

221223
- Selecting a term from the 'Enrichment Results' table updates the term under 'Select an enriched analyte set', where more detailed results are shown, including the genes from your gene list present within the gene set for that term
222224

@@ -250,6 +252,16 @@ Let's run enrichment over the full and the non-redundant version of the GO MF da
250252

251253
Let's use GSEA since we have already tried ORA with this package. GSEA is slower and GO is large, so even with 7 threads these commands will take a few minutes (longer for redundant than non-redundant, of course). Feel free to use the compute time to ask questions on slack or explore the ORA pathways output some more!
252254

255+
There is no `seed` parameter for `WebGestaltR` GSEA as there is for `clusterProfiler`. We can set it in R instead with `set.seed()`.
256+
257+
```{r set seed}
258+
set.seed(123)
259+
```
260+
261+
A note from testing: without setting the seed in R, a slightly different number of enriched terms were returned over 3 runs. With setting the seed, the same number and IDs of terms were significant among the replicate runs, BUT the NES and the FDR were slightly different! The unadjusted ES and P values were the same.
262+
263+
264+
253265
```{r GSEA GO MF with redundant}
254266
255267
outputDirectory <- "WebGestaltR_results"
@@ -279,7 +291,6 @@ suppressWarnings({ gomf <- WebGestaltR(
279291

280292

281293
```{r GSEA GO MF nonredundant}
282-
283294
outputDirectory <- "WebGestaltR_results"
284295
project <- "GSEA_GO-MF_non-redundant"
285296
database <- "geneontology_Molecular_Function_noRedundant"
@@ -367,36 +378,11 @@ By grouping so many similar terms with the non-redundant analyses, the overall n
367378
For your own research, you could explore the relationships between these terms by viewing the neighborhood of GO terms on AmiGO: https://amigo.geneontology.org/amigo, or using NaviGO https://kiharalab.org/navigo/views/goset.php
368379

369380

381+
# 5. Save versions and session details
370382

371-
# 5 Plot `WebGestaltR` results with `enichplot`
372-
373-
A key advantage of R is flexibility with data manipulation and visualisation. In the previous exercise, we have explored the many plot options of `enrichplot`. This package was designed to work with an `enrichResult` object from `clusterProfler` and a handful of other packages written by the same team. The desire to use `enrichplot` with the output of other tools is widespread.
374-
375-
The R package `multienrichjam` has a function `enrichDF2enrichResult` that converts dataframe type results from other FEA tools to the format required for `enrichplot`!
376-
377-
`multienrichjam` has a lot of dependencies and has not been installed on these VMs so we will not be performing this today. However, this functionality and flexibility is pretty cool, so if you wanted to install this on your own computer outside the workshop, below is the code for installing :-)
378-
379-
380-
```{r install multienrichjam}
381-
# github remotes package required:
382-
#library(remotes)
383+
## Database query dates
383384

384-
# install multienrichjam:
385-
remotes::install_github("jmw86069/multienrichjam", dependencies=TRUE)
386-
387-
# load library:
388-
#library(multienrichjam)
389-
390-
# check function help menu:
391-
#?enrichDF2enrichResult
392-
```
393-
394-
395-
396-
397-
# 6. Save versions and session details
398-
399-
Unlike `gprofiler`, `WebGestaltR` does not have a function to list the version of the queried database.
385+
Unlike `gprofiler`, `WebGestaltR` does not have a function to list the version of the queried databases.
400386

401387
For this reason, we will save the analysis date to our rendered notebook, so the external database version could be back-calculated from the date if required:
402388

@@ -406,16 +392,17 @@ print(Sys.Date())
406392
407393
```
408394

409-
410-
R version and package versions:
395+
## R version and R package versions
411396

412397
```{r info }
413398
sessionInfo()
414399
```
415400

416401

417402

418-
And RStudio version. Typically, we would simply run `RStudio.Version()` to print the version details. However, when we knit this document to HTML, the `RStudio.Version()` function is not available and will cause an error. So to make sure our version details are saved to our static record of the work, we will save to a file, then print the file contents back into the notebook.
403+
## RStudio version
404+
405+
Typically, we would simply run `RStudio.Version()` to print the version details. However, when we knit this document to HTML, the `RStudio.Version()` function is not available and will cause an error. So to make sure our version details are saved to our static record of the work, we will save to a file, then print the file contents back into the notebook.
419406

420407

421408
```{r rstudio version - not run during knit, eval=FALSE}
@@ -450,7 +437,7 @@ rstudio_version_text
450437

451438

452439

453-
# 7. Knit workbook to HTML
440+
# 6. Knit workbook to HTML
454441

455442
The last task is to knit the notebook. Our notebook is editable, and can be changed. Deleting code deletes the output, so we could lose valuable details. If we knit the notebook to HTML, we have a permanent static copy of the work.
456443

0 commit comments

Comments
 (0)