Commit fe6d48f

author
lper0012
committed
Merge branch 'main' of github.com:MonashBioinformaticsPlatform/Functional_Enrichment_BioCommons_2024 into main
2 parents 1f2883d + 7d8c2d3 commit fe6d48f

7 files changed

+46
-14
lines changed

.DS_Store

0 Bytes
Binary file not shown.

05-genelists.Rmd

Lines changed: 4 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -1,12 +1,8 @@
11
# Defining the genelist
22

3-
4-
53
Starting from those differential expression results [here](http://degust.erc.monash.edu/degust/compare.html?code=5b2c7805ab8f8c5f2dc8c72e61b049b0#?plot=mds), how do we go about getting a genelist to calculate enrichment on?
64

75

8-
9-
106
## Activities
117

128
Today's exercise follows the process of getting the differentially expressed gene list using Excel. You could use another spreadsheet program, or some may prefer a programming language like R.
@@ -16,18 +12,17 @@ Todays exercise follows the process of getting the differentially expressed gene
1612

1713
2. How many genes are differentially expressed? In these results the FDR column contains the corrected p-value, and the 'differentiated' column shows the log2 fold change of differentiated cells vs untreated cells (log2(diff) - log2(undiff)); 0 is unchanged, 1 is doubled, -1 is halved.
1814

19-
- Significant at 0.01?
15+
- Significant at 0.01?
16+
2017
- That's a particularly large number of genes, perhaps not unexpected given how much the cells are changed in this experiment. How many significant genes also have a 2-fold change in expression?
2118

22-
- For this workshop, get the genes with a FDR <1x10^-4 and 2x fold change. Note that this is a ridiculous threshold - most experiments yeild far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p<0.01 (and occasionally 2-fold change) - you might see 10s to 100s of results. However, this arbitrary threshold gives a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.
19+
- For this workshop, get the genes with an FDR < 0.01 and an absolute log2 fold change of at least 2 (a 4-fold change, since `log2(4)` = 2). Note that most experiments yield far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p<0.01 (and occasionally 2-fold change), and you might see 10s to 100s of results. However, this arbitrary threshold gives a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.
2320

2421
<details>
2522
<summary>Show</summary>
26-
There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 198 genes left.
23+
There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 792 genes left.
2724
</details>
2825

29-
30-
3126
3. How many genes are _tested_? This is your background.
3227

3328
<details>
@@ -37,7 +32,6 @@ There are 4923 differentially expressed genes, 2149 of which have a 2-fold chang
3732

3833
<!--But with ~20k human genes - why are there genes missing? **14420** -->
3934

40-
4135
---
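
The filtering steps in the Activities above can also be sketched in R instead of a spreadsheet. This is a minimal illustration on an invented table, not the workshop data. The column names `gene`, `FDR` and `log2FC` and the helper `filter_genes()` are assumptions made for this example; a real export will name the fold-change column after the condition (for example `differentiated`).

```r
# Toy differential expression table (all values invented for illustration).
de <- data.frame(
  gene   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  FDR    = c(0.0001, 0.02, 0.005, 0.009, 0.5),
  log2FC = c(2.5, 3.0, -1.2, 0.4, -2.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep genes passing an FDR cutoff and an absolute
# log2 fold-change cutoff. A log2FC of 1 is a 2-fold change, 2 is 4-fold;
# the sign only gives the direction, so abs() keeps both up and down genes.
filter_genes <- function(tbl, fdr_cutoff = 0.01, lfc_cutoff = 1) {
  keep <- tbl$FDR < fdr_cutoff & abs(tbl$log2FC) >= lfc_cutoff
  tbl$gene[keep]
}

significant <- de$gene[de$FDR < 0.01]  # significant at FDR < 0.01
genelist    <- filter_genes(de)        # significant and at least 2-fold
background  <- de$gene                 # every gene that was tested

print(significant)
print(genelist)
print(length(background))
```

Passing the cutoffs as arguments makes it easy to compare thresholds, for example `filter_genes(de, lfc_cutoff = 2)` for a 4-fold change. The background is taken from the full table before any filtering, because enrichment is calculated against everything that was tested.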
4236

4337
## Common gotcha
@@ -48,7 +42,6 @@ You can't revert the gene names automatically (try converting it to text!). You
4842

4943
<!--NB: You can ignore these for this workshop, but you want this to be right for publication!-->
5044

51-
5245
---
5346

5447
## Example

06-web-tools.Rmd

Lines changed: 42 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -114,7 +114,7 @@ NOTE: In cases where long list of features is provided, STRING may chnage some o
114114
- previews of protein structures are not shown
115115
- the network edges show interaction confidence only
116116

117-
### Browse the STRING Results
117+
### Browse the STRING ORA Results
118118

119119
STRING generates multiple tabs as output, shown here:
120120

@@ -198,8 +198,47 @@ Clusters can be downloaded in `.tsv` format.
198198
#### **Question** {- .rationale}
199199
What was the overlap in enrichment terms between gProfiler and STRING at FDR ≤ 0.05?
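
One way to approach this question, assuming both result tables have been exported and read into data frames, is to compare the sets of significant term IDs. The sketch below uses invented terms and values. The column names `term_id` and `fdr` and the helper `sig_terms()` are hypothetical, so rename them to match the headers in your own gProfiler and STRING exports.

```r
# Hypothetical enrichment results; term IDs and FDR values are invented.
gprofiler <- data.frame(
  term_id = c("GO:0000001", "GO:0000002", "GO:0000003"),
  fdr     = c(0.001, 0.04, 0.20),
  stringsAsFactors = FALSE
)
string_db <- data.frame(
  term_id = c("GO:0000002", "GO:0000003", "GO:0000004"),
  fdr     = c(0.03, 0.01, 0.002),
  stringsAsFactors = FALSE
)

# Hypothetical helper: term IDs passing the FDR cutoff in one result table.
sig_terms <- function(tbl, cutoff = 0.05) tbl$term_id[tbl$fdr <= cutoff]

gp_sig <- sig_terms(gprofiler)
st_sig <- sig_terms(string_db)

shared      <- intersect(gp_sig, st_sig)  # significant in both tools
only_gp     <- setdiff(gp_sig, st_sig)    # significant in gProfiler only
only_string <- setdiff(st_sig, gp_sig)    # significant in STRING only

print(shared)
print(only_gp)
print(only_string)
```

Matching on term IDs rather than term names avoids false mismatches when the two tools word the same term slightly differently.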
200200

201-
<!-- ### Steps to Perform GSEA in STRING: -->
202-
<!-- https://version-12-0.string-db.org/cgi/globalenrichment?networkId=bKhJ4fXp6sna -->
201+
### Steps to Perform GSEA in STRING:
202+
203+
<span style="color:orange;">- Select Proteins with Values/Ranks.</span>
204+
205+
<span style="color:orange;">- Input Gene List:</span> Paste your gene list with a meaningful value for ranking (fold-change, log-pvalue, abundance, ...) directly into the input box on the STRING web page or upload a file containing your list of features and their corresponding values.
206+
207+
<span style="color:orange;">- Select Organism:</span> Same as above.
208+
209+
<span style="color:orange;">- Advanced Setting:</span> FDR stringency and the initial sort order can be set in advance; then hit Search.
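
The ranked input described in the steps above could be prepared in R along these lines. This is a sketch under assumptions: the data are invented, the column names `gene` and `log2FC` and the output file name are hypothetical, and a header-less, tab-separated two-column layout is assumed here, so check the STRING input help for the format it currently accepts.

```r
# Toy table of genes with a ranking value (values invented for illustration).
de <- data.frame(
  gene   = c("GENE_A", "GENE_B", "GENE_C"),
  log2FC = c(-1.2, 2.5, 0.4),
  stringsAsFactors = FALSE
)

# Order by the chosen ranking value, largest first. Sorting is optional here
# because the initial sort order can also be set on the STRING page.
ranked <- de[order(de$log2FC, decreasing = TRUE), c("gene", "log2FC")]

# Write one identifier and one value per line to a temporary file.
out_file <- file.path(tempdir(), "string_ranked_input.tsv")
write.table(ranked, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

print(readLines(out_file))
```

Unlike the ORA gene list, this input keeps every tested gene with its value, since GSEA works on the whole ranked list rather than a thresholded subset.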
210+
211+
### Browse the STRING GSEA Results
212+
213+
The output differs from ORA. For each gene set, the results include the enrichment score, its direction within the ranked list, the number of overlapping features with the gene set, and the associated FDR.
214+
215+
When a user selects a gene set from the enriched table,
216+
217+
```{r, echo=FALSE, out.width="100%", fig.align = "center", fig.cap="An example table of WikiPathway gene sets"}
218+
knitr::include_graphics("images/string-gsea-wikiPathways.png")
219+
```
220+
the associated genes are displayed within the ranking list. A table showing these genes along with their original ranking values is also provided.
221+
222+
```{r, echo=FALSE, fig.align = "center", fig.cap="List of genes in the term (WP197) and their positions on the ranked list"}
223+
knitr::include_graphics("images/string-gsea-ranking-n-table.png")
224+
```
225+
226+
Additionally, the locations of the corresponding proteins are highlighted in the proteome network:
227+
228+
```{r, echo=FALSE, fig.align = "center", fig.cap="Proteome network"}
229+
knitr::include_graphics("images/string-gsea-proteome-network.png")
230+
```
231+
232+
A functional enrichment visualisation (similar to that of ORA) is provided below the enriched tables.
233+
234+
Adjust the `Enrichment display settings` tab before downloading the enriched tables. It is recommended to merge terms with a certain level of similarity to reduce redundancy, especially if there are many overlapping terms.
235+
236+
```{r, echo=FALSE, fig.align = "center", fig.cap="Enrichment display settings"}
237+
knitr::include_graphics("images/string-enrichement-display-settings.png")
238+
```
239+
240+
Here is an example output of [GSEA on STRING](https://version-12-0.string-db.org/cgi/globalenrichment?networkId=bKhJ4fXp6sna) from a previous run (the link will expire in the future).
241+
203242

204243
<!-- ## FEA in [GenePattern](https://www.genepattern.org/#gsc.tab=0) -->
205244
## FEA in GenePattern <a href="https://www.genepattern.org/#gsc.tab=0" target="_blank"><img src="images/GenePattern-logo.png" alt="GenePattern Logo" style="height:35px; vertical-align:middle;"></a>