06-web-tools.Rmd: 77 additions & 3 deletions
@@ -49,7 +49,7 @@ gProfiler is known for its integration of numerous species and databases. It sup
49
49
The Gene Ontology (GO) context is divided into three main categories: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). The analysis identifies which GO terms are significantly enriched, offering insights into the broader biological implications of the gene set. This helps in pinpointing processes such as cellular responses, metabolic pathways, and molecular interactions.
50
50
51
51
-**Query Info**:
52
-
This section includes specifics about the input data, including the total number of queried genes and any identifiers not recognized or mapped. It also details the statistical background used, the chosen organism, and other analysis settings, ensuring transparency and reproducibility of the results.
52
+
This section includes specifics about the input data, including the total number of queried genes and any identifiers not recognised or mapped. It also details the statistical background used, the chosen organism, and other analysis settings, ensuring transparency and reproducibility of the results.
53
53
54
54
55
55
#### {-}
@@ -154,7 +154,7 @@ The first number indicates how many proteins in your network are annotated with
154
154
Log10(observed / expected). This measure describes how large the enrichment effect is. It’s the ratio between i) the number of proteins in your network that are annotated with a term and ii) the number of proteins that we expect to be annotated with this term in a random network of the same size.
155
155
156
156
<span style="color:orange;">- Signal:</span>
157
-
The signal is defined as a weighted harmonic mean between the observed/expected ratio and -log(FDR). FDR tends to emphasize larger terms due to their potential for achieving lower p-values, while the observed/expected ratio highlights smaller terms, which have a high foreground to background ratio but cannot achieve low FDR values due to their size. The signal measure seeks to balance both metrics for a more intuittive ordering of enriched terms.
157
+
The signal is defined as a weighted harmonic mean between the observed/expected ratio and -log(FDR). FDR tends to emphasise larger terms due to their potential for achieving lower p-values, while the observed/expected ratio highlights smaller terms, which have a high foreground-to-background ratio but cannot achieve low FDR values due to their size. The signal measure seeks to balance both metrics for a more intuitive ordering of enriched terms.
This measure describes how significant the enrichment is. Shown are p-values corrected for multiple testing within each category using the Benjamini–Hochberg procedure.
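The strength and FDR measures described above can be illustrated with a small sketch. This is only an illustration of the definitions, not the code STRING runs: the function names `strength` and `benjamini_hochberg` and all numbers are made up for the example, and the signal is left out because the text does not give its weights.

```python
import math


def strength(observed, expected):
    """Log10(observed / expected): how large the enrichment effect is."""
    return math.log10(observed / expected)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical term: 20 annotated proteins observed, 2 expected by chance.
print(strength(20, 2))

# Hypothetical raw p-values for four terms in one category.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A strength of 1 corresponds to ten times more annotated proteins than expected in a random network of the same size.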
@@ -491,5 +491,79 @@ When running FEA in Reactome, how do you prefer the analysis methods?
491
491
#### {-}
492
492
493
493
494
-
<!--## Evaluation Metrics for FEA Methods -->
494
+
## Uncertainties of a functional enrichment analysis
495
495
496
+
This section summarises the [Wünsch et al., 2023](https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wics.1643) paper, which addresses the uncertainties in a typical functional enrichment analysis.
497
+
498
+
```{r, echo=FALSE, out.width="100%", fig.align = "center", fig.cap="From RNA sequencing measurements to the final results: A practical guide to navigating the choices and uncertainties of gene set analysis"}
```
Functional enrichment analysis (FEA) typically uses one of three approaches: over-representation analysis (ORA); functional class scoring (FCS), which includes gene set enrichment analysis (GSEA); and pathway topology (PT).
506
+
507
+
1. ORA
508
+
509
+
\- ORA methods are the least complex of the three FEA approaches.
510
+
511
+
\- ORA methods require a list of differentially expressed genes that have already been identified in a differential expression analysis.
512
+
513
+
\- The background population (the universe) can be a general set of genes, such as all genes in the human genome, or a more specific set, such as the genes observed in the experiment.
514
+
515
+
\- A contingency table is created and the null distribution is modelled using the hypergeometric distribution.
516
+
517
+
2. FCS
518
+
519
+
\- FCS methods aim to aggregate gene-level statistics (ranks) into a gene set-level statistic (enrichment score, ES).
520
+
521
+
\- FCS methods can be classified as FCS I, which take the expression data as input, or FCS II, which take a pre-ranked list of genes as input. With the latter, the information about the conditions (phenotypes) of the samples is lost, so phenotype permutation cannot be performed and gene set permutation is the only remaining choice of null hypothesis.
522
+
523
+
3. PT
524
+
525
+
\- PT methods additionally model interactions between the genes. This approach is generally considerably less popular than ORA and FCS in the reference database.
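
The ORA step described in the list above (a contingency table plus a hypergeometric null distribution) can be sketched as follows. This is a minimal illustration rather than the implementation used by any particular tool; the function names `contingency_table` and `ora_p_value` and the gene counts are hypothetical, and the one-sided test for over-representation is an assumption.

```python
from math import comb


def contingency_table(universe_size, set_size, n_de, overlap):
    """2x2 table: rows = DE / not DE, columns = in gene set / not in gene set."""
    return [
        [overlap, n_de - overlap],
        [set_size - overlap, universe_size - set_size - n_de + overlap],
    ]


def ora_p_value(universe_size, set_size, n_de, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the number of gene set members among n_de genes drawn without
    replacement from a universe of universe_size genes, of which set_size
    belong to the gene set.
    """
    upper = min(set_size, n_de)
    total = comb(universe_size, n_de)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_de - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes in the universe, a gene set of 150 genes,
# 500 differentially expressed (DE) genes, 12 of which are in the gene set.
print(contingency_table(20000, 150, 500, 12))
print(ora_p_value(20000, 150, 500, 12))
```

Changing `universe_size` from a genome-wide count to the number of genes observed in the experiment changes the p-value, which is one reason the choice of background matters.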