You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: 13-webgestaltr.Rmd
+1-2Lines changed: 1 addition & 2 deletions
Original file line number
Diff line number
Diff line change
@@ -26,7 +26,7 @@ This tool (both the web and R versions) has many features and advantages:
26
26
27
27
1. Explore the organisms, databases/gene sets and namespaces supported natively
28
28
2. Run ORA over pathway databases and explore the interactive HTML output
29
-
3. Run GSEA over the GO 'non-redundant' and full database and compare the results
29
+
3. Run GSEA over the `WebGestalt``GO noRedundant` and full database and compare the results
30
30
31
31
<p> </p> <!-- insert blank line -->
32
32
@@ -51,5 +51,4 @@ You could also open the file by selecting `File` → `Open file`, or use the
51
51
- We have reviewed the organisms and databases that are natively supported by this easy to use tool
52
52
- We have run both ORA and GSEA and explored the interactive HTML results summary
53
53
- We have touched on the redundancy filters available within this tool, for GO as well as two external algorithms applied automatically to any enrichment performed
54
-
- We have learnt about an R package that can create compatability between `WebGestaltR` (and other FEA tools) with `enichplot` for visualisation options
55
54
- In the next session, we will use `WebGestaltR` with novel species
Copy file name to clipboardExpand all lines: 14-novel-species-FEA.Rmd
+54-18Lines changed: 54 additions & 18 deletions
Original file line number
Diff line number
Diff line change
@@ -30,6 +30,18 @@ Despite the lack of quality resources, there is much 'omics work conducted in ax
30
30
31
31
Today we will use [public RNAseq data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5419050/#SD7) from axolotl, comparing gene expression in the blastema after proximal (at the shoulder) and distal (at the hand) limb amputation. The blastema is a collection of undifferentiated progenitor cells that give rise to the regenerated limb. Maybe our functional enrichment analysis of differentially expressed genes can help us understand processes that cause the blastema to grow into a full limb or just a hand!
32
32
33
+
## Caveat!
34
+
35
+
This is not a real experiment! It was tricky to find a novel species that had
36
+
37
+
a) a reference genome
38
+
b) a GTF
39
+
c) was not natively supported by any FEA tool
40
+
d) had publicly available RNAseq data
41
+
e) FEA described in a peer-reviewed study
42
+
43
+
This axolotl data ticked A through D yet not E (RNA from various tissues were sequenced to aid a genome assembly project rather than for a valid biological experiment). Please keep this in mind when reviewing the results of the FEA we perform 😉 The goal of the session is not *really* to uncover how a blastema differentiates into a hand or an arm, but to demonstrate to you how you can apply this method to your own novel species.
44
+
33
45
<p> </p> <!-- insert blank line -->
34
46
35
47
### Raw data sources
@@ -130,7 +142,7 @@ The `WebGestaltR` tab delimited `description` file with `.des` suffix has these
@@ -142,6 +154,27 @@ The `.gmt` file is provided to the parameter `enrichDatabaseFile` and the `.des`
142
154
143
155
<p> </p> <!-- insert blank line -->
144
156
157
+
158
+
### Mapping database terms to descriptions
159
+
160
+
Annotating a proteome with a tool such as `emapper` provides a connection between your novel species gene IDs and term IDs. To add the term description, we need a database file. In this analysis, we will use the [GO 'core' ontology file](https://purl.obolibrary.org/obo/go.obo) and the [KEGG Pathways file](https://www.pathway.jp/en/academic.html). These files were downloaded to the `workshop` folder of the VMs during our setup session.
161
+
162
+
The GO `go.obo` is a text file (>600K lines) with details for all terms in GO at the time of download. The second line of the file contains the GO database version, in this case: `data-version: releases/2024-06-17`.
163
+
164
+
**go.obo term information is structured like this:**
➤ Return to RStudio and open the notebook `novel_species.Rmd`.
@@ -160,7 +193,7 @@ We will now take a quick look at novel species FEA online with `STRING`.
160
193
161
194
<p> </p> <!-- insert blank line -->
162
195
163
-
## STRING novel species FEA
196
+
## STRING (web) novel species FEA
164
197
165
198
The axolotl putative proteome was previosuly uploaded to STRING and custom annotation performed. This completed using the STRING servers, with compute time less than one day.
166
199
@@ -201,31 +234,33 @@ Note that the `Organism` field is pre-filled with `STRG0A90SNX (axolotl)`.
201
234
202
235
<p> </p> <!-- insert blank line -->
203
236
204
-
➤ In the RStudio `Files` pane, locate your saved ORA gene list from earlier - `workshop/Axolotl_DEGs.txt`. Click the file to view it in RStudio, then copy paste the list into the STRING `List of Names` search field, then select `SEARCH`.
237
+
➤ In the RStudio `Files` pane, locate your saved ORA gene list from earlier - `workshop/Axolotl_DEGs.txt`. Click the file to view it in RStudio, then copy paste the list into the STRING `List of Names` search field, then select `SEARCH`
238
+
239
+
➤ Click `CONTINUE` at the gene ID review page
205
240
206
-
- Note that there is no option at this query page (even under `Advanced Settings`) to provide a custom background gene list. This can be done after the initial search has been run.
241
+
Before we explore the results, note that we have performed ORA without a background gene list! 😮
207
242
208
-
- Click `CONTINUE`at the gene ID review page
243
+
There is no option at the query page (even under `Advanced Settings`) to provide a custom background gene list. This must be done *after* the initial search has been run. Hopefully this will change in future versions.
209
244
210
-
- Then scroll ALL the way to the bottom of the page to the subheading `Statistical background`
211
-
- If you did not have a relevant saved background gene list, you would select `Add background`
212
-
- In this case, there is a saved background list for this annotation called `axolotl-blastema-background` (**FRED is this showing for you?**), so you can select the background from the dropdown menu then click `UPDATE`. This will re-cmpute the enrichment.
245
+
**To add a custom background to STRING ORA:**
213
246
247
+
In order to perform this part, you need a `STRING` login. If you have one, feel free to follow these instructions, otherwise, view along.
2. Scroll ALL the way to the bottom of the page to the subheading `Statistical background`
251
+
- If you do not have a relevant saved background gene list in your `STRING` saved datasets, you would select `Add background`. See next sub-heading for details
252
+
- If you do have a relevant background saved, change `Whole Genome` to select the relevant gene background list from your saved datasets
253
+
3. Once you have selected the background, click `UPDATE` and the ORA will be re-computed using the custom statistical background
Despite this, there is not (yet) an option to run the initial analysis with a pre-saved background gene list! Hopefully this feature will be added in the future.
246
-
247
282
248
283
<p> </p> <!-- insert blank line -->
249
284
250
285
**<spanstyle="color: green;">Now let's explore the results!</span>**
251
286
287
+
252
288
Some suggestions:
253
289
254
290
- Select a node on the network, and then `Show this node's terms in the analysis table` to highlight the terms the gene was present in
@@ -260,7 +296,7 @@ Some suggestions:
260
296
261
297
### How do the STRING results compare to those we generated in R?
262
298
263
-
We expect a large difference in the results because of the differing proteome annotation methods.
299
+
**We expect a large difference in the results because of the differing proteome annotation methods - both the annotation tool and the databases that were annotated against.**
264
300
265
301
The `eggNOG emapper` annotations we used in `R` rely on orthology-based predictions, employing extensive similarity searches to map genes to their closest evolutionary counterparts across diverse species. This results in a more comprehensive catalog of functional terms, even if they are inferred rather than directly evidenced.
266
302
@@ -281,6 +317,6 @@ And a clear lack of overlap in number of enriched terms and term IDs between `ST
These GO terms from `STRING` may be parent terms of more specific child terms prevalent in the R output. For a real world analysis, it would be optimal to compare, and deduce whether both methods could provide valuable and complimentary insights, or whether the results from one annotation approach or the other were more suited to your novel species.
320
+
These GO terms from `STRING` may be parent terms of more specific child terms prevalent in the `R` output. For a real world analysis, it would be optimal to compare, and deduce whether both methods could provide valuable and complimentary insights, or whether the results from one annotation approach or the other were more suited to your novel species.
285
321
286
322
Whichever you choose, strength to you! This is not an easy space to work in 💪
Copy file name to clipboardExpand all lines: day2_Rnotebooks/WebGestaltR.Rmd
+21-34Lines changed: 21 additions & 34 deletions
Original file line number
Diff line number
Diff line change
@@ -15,6 +15,8 @@ library(enrichplot)
15
15
16
16
# 0. Working directory
17
17
18
+
Ensure the 'workshop' directory is your current working directory:
19
+
18
20
```{r check notebook workdir}
19
21
getwd()
20
22
```
@@ -216,7 +218,7 @@ Some things to note:
216
218
217
219
- You can increase the default view of 20 rows to 'All' for the enrichment table, but this does not necessarily show all significant enrichments! Check the output file `enrichment_results_ORA_pathways.txt` and you can see 85 significant terms, yet 'All' view with default of 20 rows shows 30 something. To increase the number of rows included in the HTML report, use the parameter `reportNum`
218
220
219
-
- You can run algorithms to reduce the number of terms through clustering, in order to make the results more manageable. This is discussed in the WebGestalt 2019 update publication [Liao et al 2019](https://academic.oup.com/nar/article/47/W1/W199/5494758). The authors maintain that "important biological themes are all covered with these selected gene sets"
221
+
- You can run algorithms to reduce the number of terms through clustering, in order to make the results more manageable. This is discussed in the WebGestalt 2019 update publication [Liao et al 2019](https://academic.oup.com/nar/article/47/W1/W199/5494758). The authors maintain that "important biological themes are all covered with these selected gene sets". Built-in redundancy handling/term clustering is a feature of `WebGestaltR` (and the web version). To what extent this is appropriate for the database you are using is up to you to determine. For example, in the next analysis we will perform GSEA over the `noRedundant` GO MF database. Applying a double layer of redundancy filters over a database seems quite dubious to me..
220
222
221
223
- Selecting a term from the 'Enrichment Results' table updates the term under 'Select an enriched analyte set', where more detailed results are shown, including the genes from your gene list present within the gene set for that term
222
224
@@ -250,6 +252,16 @@ Let's run enrichment over the full and the non-redundant version of the GO MF da
250
252
251
253
Let's use GSEA since we have already tried ORA with this package. GSEA is slower and GO is large, so even with 7 threads these commands will take a few minutes (longer for redundant than non-redundant, of course). Feel free to use the compute time to ask questions on slack or explore the ORA pathways output some more!
252
254
255
+
There is no `seed` parameter for `WebGestaltR` GSEA as there is for `clusterProfiler`. We can set it in R instead with `set.seed()`.
256
+
257
+
```{r set seed}
258
+
set.seed(123)
259
+
```
260
+
261
+
A note from testing: without setting the seed in R, a slightly different number of enriched terms were returned over 3 runs. With setting the seed, the same number and IDs of terms were significant among the replicate runs, BUT the NES and the FDR were slightly different! The unadjusted ES and P values were the same.
@@ -367,36 +378,11 @@ By grouping so many similar terms with the non-redundant analyses, the overall n
367
378
For your own research, you could explore the relationships between these terms by viewing the neighborhood of GO terms on AmiGO: https://amigo.geneontology.org/amigo, or using NaviGO https://kiharalab.org/navigo/views/goset.php
368
379
369
380
381
+
# 5. Save versions and session details
370
382
371
-
# 5 Plot `WebGestaltR` results with `enichplot`
372
-
373
-
A key advantage of R is flexibility with data manipulation and visualisation. In the previous exercise, we have explored the many plot options of `enrichplot`. This package was designed to work with an `enrichResult` object from `clusterProfler` and a handful of other packages written by the same team. The desire to use `enrichplot` with the output of other tools is widespread.
374
-
375
-
The R package `multienrichjam` has a function `enrichDF2enrichResult` that converts dataframe type results from other FEA tools to the format required for `enrichplot`!
376
-
377
-
`multienrichjam` has a lot of dependencies and has not been installed on these VMs so we will not be performing this today. However, this functionality and flexibility is pretty cool, so if you wanted to install this on your own computer outside the workshop, below is the code for installing :-)
Unlike `gprofiler`, `WebGestaltR` does not have a function to list the version of the queried database.
385
+
Unlike `gprofiler`, `WebGestaltR` does not have a function to list the version of the queried databases.
400
386
401
387
For this reason, we will save the analysis date to our rendered notebook, so the external database version could be back-calculated from the date if required:
402
388
@@ -406,16 +392,17 @@ print(Sys.Date())
406
392
407
393
```
408
394
409
-
410
-
R version and package versions:
395
+
## R version and R package versions
411
396
412
397
```{r info }
413
398
sessionInfo()
414
399
```
415
400
416
401
417
402
418
-
And RStudio version. Typically, we would simply run `RStudio.Version()` to print the version details. However, when we knit this document to HTML, the `RStudio.Version()` function is not available and will cause an error. So to make sure our version details are saved to our static record of the work, we will save to a file, then print the file contents back into the notebook.
403
+
## RStudio version
404
+
405
+
Typically, we would simply run `RStudio.Version()` to print the version details. However, when we knit this document to HTML, the `RStudio.Version()` function is not available and will cause an error. So to make sure our version details are saved to our static record of the work, we will save to a file, then print the file contents back into the notebook.
419
406
420
407
421
408
```{r rstudio version - not run during knit, eval=FALSE}
@@ -450,7 +437,7 @@ rstudio_version_text
450
437
451
438
452
439
453
-
# 7. Knit workbook to HTML
440
+
# 6. Knit workbook to HTML
454
441
455
442
The last task is to knit the notebook. Our notebook is editable, and can be changed. Deleting code deletes the output, so we could lose valuable details. If we knit the notebook to HTML, we have a permanent static copy of the work.
0 commit comments