
Commit 9833e69

last tweaks to novel

1 parent 61b7e77 commit 9833e69

2 files changed: +9 -11 lines changed

14-novel-species-FEA.Rmd

Lines changed: 7 additions & 7 deletions
@@ -2,9 +2,7 @@
 
 FEA can be easily performed for many non-model species with user-friendly web tools or R packages. The [g:Profiler](https://biit.cs.ut.ee/gprofiler/gost) web tool currently supports 984 species, and [STRING](https://string-db.org/) currently supports over 12 thousand species.
 
-Since many non-model species are supported by some FEA tools (e.g. g:Profiler > 900, STRING > 12K), today I am using the term **novel species** to describe a species that is not currently supported by any FEA tool.
-
-This activity creates custom database files required for novel species FEA, but it's worth noting that the same methods are also applicable to custom gene set analysis. The only difference is that instead of mapping your species' genes against a known database (e.g. GO, KEGG), you would map your species' genes to the genes in the custom gene set, and use those ID mappings.
+Since many non-model species are supported by some FEA tools, today I am using the term **novel species** to describe a species that is not currently supported by any FEA tool.
 
 Novel species FEA is possible with `clusterProfiler` or `WebGestaltR` in R, or using the web tools `WebGestalt` or `STRING`. The requirements for each tool are slightly different; however, at minimum a predicted proteome FASTA is necessary. If you do not have a predicted proteome for your species, you would need to perform gene prediction, for which a number of *in silico* tools are available. Keep in mind that *in silico* predicted proteomes can vary greatly in quality. Predictions that draw on multiple data sources, such as polished genome assemblies generated with both short- and long-read shotgun sequencing plus gene prediction informed by RNA-seq data, are likely to be better than those based only on, for example, short-read sequencing.

@@ -245,7 +243,7 @@ Note that the `Organism` field is pre-filled with `STRG0A90SNX (axolotl)`.
 
 Before we explore the results, note that we have performed ORA without a background gene list! 😮
 
-There is no option on the query page (even under `Advanced Settings`) to provide a custom background gene list initially. This must be done *after* the initial search has been run. Hopefully this will change in future versions.
+There is no option on the query page (even under `Advanced Settings`) to provide a custom background gene list initially. This must be done *after* the initial search has been run. Hopefully this will change in future versions 🫠
 
 ❗In order to add or apply a previously saved custom background gene list, you need to be logged in to `STRING`. The upload can take a bit of time, so you do not need to do this now; however, the dropdowns below provide instructions for applying a saved background or adding a new one.
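Running ORA "without a background" still means testing against *some* universe: when no custom list is supplied, `STRING` typically falls back to the whole genome of the selected organism. To see why that matters, here is a minimal base-R sketch of the one-sided hypergeometric test that ORA (including `clusterProfiler::enricher()`) is built on. All counts are hypothetical, and for simplicity the sketch assumes every gene annotated to the term is also present in the smaller expressed-gene background.

```r
# One-sided hypergeometric ORA p-value, P(X >= hits), where X is the number
# of DEGs annotated to the term under random sampling from the universe.
#   hits:          DEGs annotated to the term
#   n_degs:        total DEGs tested
#   term_size:     universe genes annotated to the term
#   universe_size: number of genes in the universe (background)
ora_pvalue <- function(hits, n_degs, term_size, universe_size) {
  phyper(hits - 1, m = term_size, n = universe_size - term_size,
         k = n_degs, lower.tail = FALSE)
}

# Hypothetical counts: 100 DEGs, 10 of them annotated to a term with 200 genes.
# Universe 1: a whole predicted proteome (20,000 genes).
# Universe 2: an expressed-gene background (5,000 genes).
p_whole_proteome <- ora_pvalue(hits = 10, n_degs = 100, term_size = 200, universe_size = 20000)
p_expressed_bg   <- ora_pvalue(hits = 10, n_degs = 100, term_size = 200, universe_size = 5000)

# Expected hits under the null: n_degs * term_size / universe_size
expected_whole <- 100 * 200 / 20000
expected_bg    <- 100 * 200 / 5000

cat(sprintf("whole proteome: expected %.0f hits, p = %.2e\n", expected_whole, p_whole_proteome))
cat(sprintf("expressed background: expected %.0f hits, p = %.2e\n", expected_bg, p_expressed_bg))
```

The same 10 hits look far more surprising against a 20,000-gene universe than against a 5,000-gene expressed background, which is why ORA without a matched background tends to overstate significance.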

@@ -288,7 +286,7 @@ There is no option at the query page (even under `Advanced Settings`) to provide
 
 `STRING` saves your custom datasets under `My Data`:
 
-<img src="images/string-set-novel-bg.png" style="border: none; box-shadow: none; background: none; width: 100%;">
+<img src="images/string-my-data.png" style="border: none; box-shadow: none; background: none; width: 100%;">
 
 </details>

@@ -338,10 +336,12 @@ First of all we see a difference in the number of genes annotated to terms:
 
 <p>&nbsp;</p> <!-- insert blank line -->
 
-And a clear lack of overlap in the number of enriched terms and term IDs between `STRING` and the `R` tools:
+And a clear lack of overlap in the number of enriched GO terms and term IDs between `STRING` and the `R` tools:
 
 <img src="images/string-novel-ora-compare.png" style="border: none; box-shadow: none; background: none; ">
 
 These GO terms from `STRING` may be parent terms of more specific child terms prevalent in the `R` output. For a real-world analysis, it would be worth comparing the two outputs to decide whether both methods provide valuable and complementary insights, or whether one annotation approach is better suited to your novel species.
 
-Whichever you choose, strength to you! This is not an easy space to work in 💪
+Whichever you choose, strength to you! This is not an easy space to work in 💪
+
+Remember the importance of validating your results through other means! 🧪
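One quick way to quantify that lack of overlap is to compare the sets of enriched term IDs directly. Below is a minimal base-R sketch, assuming the significant GO IDs from each tool have been exported into character vectors; the IDs shown are placeholders, not results from this analysis.

```r
# Placeholder enriched GO IDs; replace with the IDs exported from each tool
string_terms <- c("GO:0008150", "GO:0009987", "GO:0065007")
r_terms      <- c("GO:0006412", "GO:0009987", "GO:0042254")

shared      <- intersect(string_terms, r_terms)
only_string <- setdiff(string_terms, r_terms)
only_r      <- setdiff(r_terms, string_terms)

# Jaccard index: shared terms divided by all distinct terms across both tools
jaccard <- length(shared) / length(union(string_terms, r_terms))

cat("Shared:", shared, "\n")
cat("STRING only:", only_string, "\n")
cat("R only:", only_r, "\n")
cat("Jaccard index:", round(jaccard, 2), "\n")
```

Exact ID matching ignores the parent/child relationships described above, so a low Jaccard index does not by itself mean the tools disagree biologically.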

day2_Rnotebooks/novel_species.Rmd

Lines changed: 2 additions & 4 deletions
@@ -18,7 +18,6 @@ library(tidyverse)
 library(clusterProfiler)
 library(WebGestaltR)
 library(enrichplot)
-#library(ggupset)
 
 ```

@@ -169,6 +168,7 @@ head(degs)
 head(background)
 
 ```
+
 ## 2.4 Save gene lists
 
 Saving any outputs generated from R code is vital to reproducibility! You should include all analysed gene lists within the supplementary materials of your manuscript.
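A minimal base-R sketch of saving gene lists as plain text, one ID per line, a simple format that web tools such as `STRING` and `WebGestalt` can take as pasted or uploaded input. It assumes `degs` and `background` are character vectors of gene IDs; the toy vectors and the output folder below are placeholders (in the notebook you would write the objects created above to your project's output directory). Saving `sessionInfo()` alongside is an optional extra for reproducibility.

```r
# Placeholder gene ID vectors (assumed to be character vectors of IDs)
degs       <- c("gene1", "gene5", "gene9")
background <- paste0("gene", 1:10)

# Hypothetical output folder; tempdir() keeps this example self-contained
out_dir <- file.path(tempdir(), "gene_lists")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One ID per line, no header
writeLines(degs, file.path(out_dir, "degs.txt"))
writeLines(background, file.path(out_dir, "background.txt"))

# Record package versions used to generate the lists
writeLines(capture.output(sessionInfo()), file.path(out_dir, "sessionInfo.txt"))

# Read back to confirm the files round-trip unchanged
stopifnot(identical(readLines(file.path(out_dir, "degs.txt")), degs))
stopifnot(identical(readLines(file.path(out_dir, "background.txt")), background))
```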
@@ -195,6 +195,7 @@ Check the column names of the `emapper` annotation file so we know which are the
 ```{r colnames anno}
 colnames(eggnog_anno)
 ```
+
 We need `GOs` and `KEGG_Pathway` columns.
 
 ### 3.1.1 GO TERM2GENE
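Both `GOs` and `KEGG_Pathway` store several IDs per gene in one comma-separated string, whereas `clusterProfiler` expects `TERM2GENE` as one term-gene pair per row (term first, gene second). Below is a minimal base-R sketch of that reshaping; the toy table and the helper `make_term2gene()` are hypothetical. It assumes the gene ID column is named `query` (check against `colnames(eggnog_anno)`) and that `-` marks an empty field. Because emapper's `KEGG_Pathway` column usually lists each pathway under both a `ko` and a `map` prefix (e.g. `ko00010,map00010`), the sketch keeps only the `ko` IDs so each pathway is counted once. In the notebook, which loads `tidyverse`, `tidyr::separate_rows()` does the same splitting job.

```r
# Hypothetical helper: expand a comma-separated annotation column into a
# two-column TERM2GENE data frame (term first, gene second).
make_term2gene <- function(anno, term_col, gene_col = "query") {
  # Drop genes with no annotation ("-" assumed to mark an empty field)
  keep  <- anno[[term_col]] != "-" & !is.na(anno[[term_col]])
  terms <- strsplit(anno[[term_col]][keep], ",", fixed = TRUE)
  data.frame(
    term = unlist(terms),
    gene = rep(anno[[gene_col]][keep], lengths(terms)),
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for eggnog_anno (hypothetical gene and term IDs)
eggnog_anno <- data.frame(
  query        = c("gene1", "gene2", "gene3"),
  GOs          = c("GO:0008150,GO:0003674", "-", "GO:0008150"),
  KEGG_Pathway = c("ko00010,map00010", "-", "ko00010,ko00020,map00010,map00020"),
  stringsAsFactors = FALSE
)

go_term2gene <- make_term2gene(eggnog_anno, "GOs")

# Keep only "ko" pathway IDs so each pathway is counted once
kegg_term2gene <- make_term2gene(eggnog_anno, "KEGG_Pathway")
kegg_term2gene <- kegg_term2gene[startsWith(kegg_term2gene$term, "ko"), ]
rownames(kegg_term2gene) <- NULL

print(go_term2gene)
print(kegg_term2gene)
```

Either data frame can then be passed as `TERM2GENE` to `clusterProfiler::enricher()`, together with the DEG vector as `gene` and the background vector as `universe`.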
@@ -227,9 +228,6 @@ head(go_term2gene)
 
 ```
 
-```{r}
-
-```
 
 ### 3.1.2 KEGG TERM2GENE
