Commit c4b9a70

committed
tidying up
1 parent f4bfee1 commit c4b9a70

6 files changed

+69
-71
lines changed

09-day2-introduction.Rmd

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,14 @@
1-
# Making sense of gene and protein lists with functional enrichment analysis: Day 2
1+
# Introduction
22

33
## Day 1 recap
44

5-
- We explored a handful of web-based analysis tools and databases for FEA
6-
- We explored the impact of gene set size and background list on the ORA statistical test
7-
- We explored the impact of the gene list filtering and the background gene list ('statistical domain scope') on ORA results
8-
- We explored the impact of ranking metric when performing GSEA
9-
- We discussed the impact of database choice on FEA results
5+
- We explored several web-based tools and databases for performing Functional Enrichment Analysis (FEA)
6+
- Two types of FEA, Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA), were introduced and their differences discussed
7+
- We examined how filtering the input gene list and selecting the background gene list (the "statistical domain scope") influence ORA results
8+
- We explored how the choice of ranking metric affects GSEA outcomes
9+
- The impact of database selection on FEA results was evaluated
10+
- Differences in outputs between various tools were compared and discussed
11+
- Uncertainties, implications, and recommendations were highlighted to address challenges in gene set analysis
1012

1113
<p>&nbsp;</p> <!-- insert blank line -->
1214

11-gprofiler2.Rmd

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# ORA with gprofiler2
1+
# gprofiler2
22

33
[gprofiler2](https://cran.r-project.org/web/packages/gprofiler2/index.html) is the R interface to the `g:Profiler` web-based toolset that you used on day 1 of the workshop.
44

12-clusterprofiler.Rmd

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# GSEA with clusterProfiler and visualisation with enrichplot
1+
# clusterProfiler and enrichplot
22

33

44
[clusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) is a comprehensive suite of enrichment tools. It has functions to run ORA or GSEA over commonly used databases (GO, KEGG, KEGG Modules, DAVID, Pathway Commons, WikiPathways) as well as universal enrichment functions to perform ORA or GSEA with custom gene sets. We will use these universal tools in the final activity of this workshop, focusing on the supported organisms and databases for the present activity.

14-novel-species-FEA.Rmd

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# Novel species functional enrichment analysis
1+
# Novel species FEA
22

33
FEA can be easily performed for many non-model species with user-friendly web tools or R packages. The [g:Profiler](https://biit.cs.ut.ee/gprofiler/gost) web tool currently supports 984 species, and [STRING](https://string-db.org/) currently supports over 12,000 species.
44

15-presenting-reproducible-FEA-methods.Rmd

Lines changed: 0 additions & 62 deletions
This file was deleted.

15-workshop-wrapup.Rmd

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
# Workshop wrap-up
2+
3+
Over the last two days, as well as in the October webinar, we've explored the statistical background, key considerations, and practical implementation of functional enrichment analysis, and gained hands-on experience with multiple web-based and R tools.
4+
5+
## Summary of key messages
6+
7+
**<span style="color: green;">ORA and GSEA are different statistical analyses, and their inputs differ</span>**
8+
GSEA: weighted Kolmogorov-Smirnov-like test, requires a ranked, unfiltered gene list
9+
ORA: Hypergeometric or Fisher’s Exact test, requires a filtered, unranked gene list and an experimental background gene list
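
As a rough illustration of the ORA test named above, the upper-tail hypergeometric probability can be computed directly. This is a minimal sketch in plain Python rather than one of the workshop's R tools; the function name `ora_p_value` and all of the numbers are hypothetical and chosen only to keep the arithmetic small.

```python
from math import comb


def ora_p_value(overlap, gene_list_size, gene_set_size, background_size):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of gene-set members seen when `gene_list_size` genes
    are drawn without replacement from a background of `background_size`
    genes, `gene_set_size` of which belong to the gene set.
    """
    max_overlap = min(gene_list_size, gene_set_size)
    total = comb(background_size, gene_list_size)
    tail = sum(
        comb(gene_set_size, k)
        * comb(background_size - gene_set_size, gene_list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 5 filtered genes, 3 of them in a 4-gene set,
# tested against a 20-gene experimental background.
p_small_bg = ora_p_value(overlap=3, gene_list_size=5,
                         gene_set_size=4, background_size=20)

# Same overlap, but tested against a much larger (hypothetical) background.
p_large_bg = ora_p_value(overlap=3, gene_list_size=5,
                         gene_set_size=4, background_size=200)

print(p_small_bg)  # about 0.032
print(p_large_bg)  # far smaller: the background alone changes the result
```

The second call shows why the background gene list matters: the same overlap looks far more significant when the background is inflated.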
10+
11+
**<span style="color: green;">Always correct for multiple testing</span>**
12+
Never use unadjusted P values, as this will introduce many false positives. Different tools offer different multiple testing corrections, such as the Benjamini-Hochberg (BH) false discovery rate (FDR) or the more stringent Bonferroni correction. Always report your chosen method and the significance threshold applied to terms.
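
To make the effect of correction concrete, the sketch below adjusts a small set of P values with the Benjamini-Hochberg procedure (which controls the FDR) and with the more conservative Bonferroni correction. It is a plain-Python illustration, not the implementation used by any particular tool; the function names and the input P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original input order."""
    m = len(p_values)
    # Indices of the P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni-adjusted P values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted P values for four terms.
raw = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(raw)
bonf = bonferroni(raw)

for p, a, b in zip(raw, bh, bonf):
    print(f"raw={p:.3f}  BH={a:.3f}  Bonferroni={b:.3f}")
```

All four raw values fall below 0.05, but only two survive Bonferroni at that threshold, which is why the method and threshold both need to be reported.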
13+
14+
**<span style="color: green;">Different analysis methods will return different results</span>**
15+
This is expected, due to underlying differences in databases, algorithms, P value calculation methods and so on. As long as your methods are robust, sensible and reproducible, you can be confident that they will stand up to scrutiny under peer review.
16+
17+
**<span style="color: green;">Ensure reproducibility</span>**
18+
Lack of reproducibility through under-reporting of methods is a common issue in this field (see Wijesooriya et al., linked below). Be sure to record all methodological details while you are working, including all parameters and arguments applied, how the gene lists were generated, and the versions of databases and tools. If using R, specify a seed so that random number generation in GSEA is reproducible.
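
The reason a seed matters is that permutation-based P values depend on random draws. The sketch below is a deliberately simplified permutation test, not the GSEA algorithm itself, written in plain Python to show the principle; in R the equivalent step is calling `set.seed()` before the analysis. The function name, gene names and scores are all hypothetical.

```python
import random


def permutation_p_value(scores, gene_set, n_perm=1000, seed=42):
    """Permutation P value for the mean score of a gene set.

    Simplified stand-in for a permutation-based enrichment test: random
    gene sets of the same size are compared with the observed mean score.
    """
    rng = random.Random(seed)  # local generator, fully determined by the seed
    genes = sorted(scores)     # fixed order so sampling is deterministic
    k = len(gene_set)
    observed = sum(scores[g] for g in gene_set) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)
        if sum(scores[g] for g in sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical ranking metric for 50 genes and a 3-gene set near the top.
scores = {f"gene{i:02d}": float(i) for i in range(50)}
gene_set = ["gene45", "gene47", "gene49"]

p_first = permutation_p_value(scores, gene_set, seed=123)
p_again = permutation_p_value(scores, gene_set, seed=123)

print(p_first == p_again)  # True: same seed, same permutations, same result
```

Recording the seed alongside the tool and database versions lets someone else regenerate exactly the same P values.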
19+
20+
**<span style="color: green;">Interpret your results in their biological context</span>**
21+
Functional categories are often broad and redundant. Use the FEA results as a guide, not the end point. Use visualisations and explore term redundancy methods to help focus results. Validate through additional means according to the nature of your experiment, with the gold standard being wet-lab rather than *in silico* validation methods.
22+
23+
**<span style="color: green;">There are many databases and tool choices available</span>**
24+
Suitability to your experiment depends on many factors, including:
25+
- Your species, and what tools support it
26+
- What databases and gene sets are relevant to your experiment, from the general (e.g. GO) to the specific (e.g. cancer pathways)
27+
- Any privacy restrictions imposed on your data
28+
- Your skill level in R and your desire to implement R code
29+
- How much flexibility you want or require with visualisation
30+
31+
32+
## Interesting papers for further reading
33+
34+
*Informative and instructional*
35+
36+
37+
- [Interpreting omics data with pathway enrichment analysis](https://www.sciencedirect.com/science/article/abs/pii/S0168952523000185)
38+
39+
*Things to watch out for*
40+
41+
- [Urgent need for consistent standards in functional enrichment analysis](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009935)
42+
43+
- [Systematic assessment of pathway databases, based on a diverse collection of user-submitted experiments](https://academic.oup.com/bib/article/23/5/bbac355/6695266)
44+
45+
- [Multiple sources of bias confound functional enrichment analysis of global -omics data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0761-7)
46+
47+
- [The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling](https://pmc.ncbi.nlm.nih.gov/articles/PMC6883970/)
48+
49+
50+
- [Two subtle problems with overrepresentation analysis](https://academic.oup.com/bioinformaticsadvances/article/4/1/vbae159/7829164?login=false)
51+
52+
## Audience poll
53+
54+
There is of course no correct answer here; we are just interested to hear your thoughts!
55+
56+
57+
58+
**<span style="color: green;">"If you were to run FEA on your own data tomorrow, would you choose web or R tools?"</span>** :thinking:
