WebGestaltR supports 12 species directly; however, you can import your own database files to perform ORA and GSEA for novel species :-) We will do that in the next session.

-The next two commands have a default setting of `organism = "hsapiens"` (hint: check with `?listOrganism` and `?listIdType`).
-
-Let's view the list of supported organisms. We don't have to specify any arguments to the function as the defaults do what we need. We do however need the empty brackets. Without them, R will print the function source code.
+Let's view the list of supported organisms. We don't have to specify any arguments, but we do need the empty brackets. Without them, R will print the function source code.
```{r list organisms }
listOrganism()
```

+The next two commands have a default setting of `organism = "hsapiens"`, so running them without any arguments shows the gene sets (databases) and ID types (namespaces) that are supported for human.
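
As a hedged illustration of that default, the calls below spell out the `organism` argument explicitly. Here `"mmusculus"` is only an example of switching species, and the object names are made up for this sketch. Both functions query the WebGestalt server, so an internet connection is assumed.

```r
library(WebGestaltR)

# Same as calling listGeneSet() with no arguments, since organism defaults to "hsapiens"
human_sets <- listGeneSet(organism = "hsapiens")
head(human_sets)

# Same idea for ID types, switched to mouse purely as an example
mouse_ids <- listIdType(organism = "mmusculus")
head(mouse_ids)
```

Any organism name returned by `listOrganism()` can be passed in the same way.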
+
View databases for human. Use the black arrow on the right of the table to view the other 2 columns, and use the numbers below the table (or 'next') to view the next 10 rows.
We will use the same Pezzini RNAseq dataset as earlier. Since we previously saved our ranked list, DEGs and background genes to the `workshop` folder, we could import those. However, re-creating them here keeps a clear record within the notebook of how the gene list inputs were made, which enhances reproducibility. Gene lists are quick and simple to extract from the input data. If the process were slow and compute-intensive, we would instead document the source and methods behind the gene lists in the notebook comments rather than re-creating them.
@@ -100,30 +95,32 @@ Bring up the help menu for the `WebGestaltR` function and spend a few minutes re
There are quite a few! For many of them (eg gene set size filters, multiple testing correction method, P value cutoff) the default settings are suitable.

-In particular, look for the arguments that control:
+In particular, look for the parameters that control:
- whether ORA, GSEA or NTA is performed
- which database/s to run enrichment on
-- what is the namespace for the gene list query
+- what is the namespace/gene ID type for the gene list query
- how to specify the input gene list/s

-Hopefully you've discovered that the `WebGestaltR` function can intake either gene lists from files (as long as the right column format and file suffix is provided) or R objects.
+Hopefully you've discovered that the `WebGestaltR` function can take gene lists EITHER from files (as long as the right column format and file suffix are provided) OR from R objects.
+
+Since we have decided to extract the gene lists from the DE matrix to R objects, we need to provide the gene list object to the `interestGene` parameter (and `referenceGene` for the ORA background).

-Since we have decided to extract the gene lists from the DE matrix to R objects, we need to provide the gene list object to `interestGene` parameter (and `referenceGene` for ORA background). For ORA, the gene lists need to be vectors, and for GSEA, a 2-column dataframe (unlike `clusterProfiler`, which requires a GSEA vector).
+For ORA, the gene lists need to be vectors, and for GSEA, a 2-column dataframe (unlike `clusterProfiler`, which takes a ranked vector for GSEA).
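
The two input shapes can be sketched with a toy table. Everything here is made up for illustration: the gene IDs are placeholders, `adj.P` is an assumed name for the adjusted p-value column, and base R is used so the block runs on its own (`Gene.ID` and `Log2FC` follow the column names used in this notebook).

```r
# Toy DE results; IDs are placeholders and 'adj.P' is a hypothetical column name
toy <- data.frame(
  Gene.ID = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E"),
  Log2FC  = c(3.1, -2.5, 0.4, 1.2, -0.1),
  adj.P   = c(0.001, 0.004, 0.5, 0.03, 0.9),
  stringsAsFactors = FALSE
)

# ORA inputs: plain character vectors
toy_degs <- toy$Gene.ID[toy$adj.P < 0.01 & abs(toy$Log2FC) > 2]
toy_background <- toy$Gene.ID

# GSEA input: 2-column data frame (gene ID, ranking score), sorted by score
toy_ranked <- toy[order(toy$Log2FC, decreasing = TRUE), c("Gene.ID", "Log2FC")]

toy_degs
head(toy_ranked)
```

The vector form carries only gene membership, while the data frame keeps the score that GSEA ranks on.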

Our input matrix contains ENSEMBL IDs as well as official gene symbols, so we could use "ensembl_gene_id" or "genesymbol" for the parameter `interestGeneType`. Let's extract the ENSEMBL IDs since they are more specific than symbols.
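
To make the mapping from those questions to parameters concrete, here is a minimal sketch that collects the relevant arguments in a list. The gene IDs are placeholders, `pathway_KEGG` is just one example database name, and the object names are invented for this sketch. The actual call is commented out because it contacts the WebGestalt server and needs real gene IDs.

```r
# Placeholder gene vectors (not real IDs), standing in for the DEGs and background
toy_degs <- c("ENSG_A", "ENSG_B")
toy_background <- c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E")

ora_args <- list(
  enrichMethod      = "ORA",              # "ORA", "GSEA" or "NTA"
  organism          = "hsapiens",
  enrichDatabase    = "pathway_KEGG",     # one or more names from listGeneSet()
  interestGene      = toy_degs,           # R object rather than a file
  interestGeneType  = "ensembl_gene_id",  # namespace of the query genes
  referenceGene     = toy_background,     # ORA background
  referenceGeneType = "ensembl_gene_id"   # namespace of the background genes
)

# Run only with real gene IDs and an internet connection:
# result <- do.call(WebGestaltR::WebGestaltR, ora_args)
str(ora_args)
```

Keeping the arguments in a list is only a convenience for inspecting them before running; passing them directly to `WebGestaltR()` works the same way.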
```{r extract ora gene list vectors}
-# Filter genes with adjusted p-value < 0.01 and absolute log2 fold change > 2
+# Filter genes with adjusted p-value < 0.01 and absolute log2 fold change > 2, saved as 'DEGs' vector
# extract ranked dataframe, saved as 'ranked' object
ranked <- data %>%
  arrange(desc(Log2FC)) %>%
  dplyr::select(Gene.ID, Log2FC)
@@ -154,13 +151,13 @@ tail(ranked)
For this task, let's focus on the pathway gene sets. From skimming the output of `listGeneSet()` there were a few. We could manually locate these and copy them into our list, or take advantage of the fact that the `WebGestaltR` developers have been systematic in the gene set naming, ensuring all database names are prefixed with their type, i.e. `geneontology_`, `pathway_`, `network_`, plus a few others.
```{r select pathway databases}
-# Save the list of databases for human
+# Save the databases for human
databases <- listGeneSet()

# Extract the pathways from the 'name' column that start with 'pathway'
minNum = 10, # Minimum number of genes in a gene set to include
maxNum = 500, # Maximum number of genes in a gene set to include
-outputDirectory = outdir,
+outputDirectory = outdir,
projectName = project,
-nThreads = 6
+nThreads = 6 # use 6 threads for a faster run
)
```

The results are saved in a new `Project_ORA_pathways` folder inside our output folder: `WebGestaltR_results/Project_ORA_pathways`. There are a number of results files; the one we will focus on is the interactive HTML summary file.

+
+
+STOP: to save time on the GSEA compute, skip ahead and run the code chunk labelled `GSEA GO MF with redundant` (it takes several minutes), then return here, where we will explore the ORA HTML while the GSEA runs!!!
+
+
+
In the `Files` pane, open the folder `WebGestaltR_results/Project_ORA_pathways` then click on the `Report_ORA_pathways.html` file. Select `View in Web Browser`.
Some things to note:
@@ -323,7 +326,7 @@ Notice in the GSEA code chunks above, the R function `suppressWarnings` has been
Now that we have both results saved in R objects, we can compare the enriched terms.
-Scanning the list of terms only within the full GO MF (including redundant terms) we see many terms to do with DNA molecular functions.
+Scanning the list of terms found only within the full GO MF (including redundant terms), we see many terms to do with DNA activity and binding.

Among the terms significant in the 'non-redundant' analysis, we can see just 2 DNA activity functions: "DNA secondary structure binding" (significant in both) and "single-stranded DNA binding" (unique to GO MF NR).

Because the non-redundant analysis groups so many similar terms, the overall number of enrichments is lower and more targeted, providing a more concise overview of the biology from your results.

-For your own research, you could explore the relationships between these terms by viewing the neighborhood of GO terms on AmiGO: https://amigo.geneontology.org/amigo, or using NaviGO https://kiharalab.org/navigo/views/goset.php
+For your own research, you could explore the relationships between these terms by viewing the neighborhood of GO terms on AmiGO: https://amigo.geneontology.org/amigo, or using NaviGO https://kiharalab.org/navigo/views/goset.php (enter multiple GO IDs to see their relationships).
# 5. Save versions and session details
@@ -438,6 +448,7 @@ rstudio_version_text
```

+STOP: while this is knitting, we will commence the novel species activity in the online workshop materials.