File: 05-genelists.Rmd
Lines changed: 4 additions & 11 deletions
@@ -1,12 +1,8 @@
1
1
# Defining the genelist
2
2
3
-
4
-
5
3
Starting from those differential expression results [here](http://degust.erc.monash.edu/degust/compare.html?code=5b2c7805ab8f8c5f2dc8c72e61b049b0#?plot=mds), how do we go about getting a genelist to calculate enrichment on?
6
4
7
5
8
-
9
-
10
6
## Activities
11
7
12
8
Today's exercise follows the process of getting the differentially expressed gene list using Excel. You could use another spreadsheet program, or you may prefer a programming language like R.
@@ -16,18 +12,17 @@ Todays exercise follows the process of getting the differentially expressed gene
16
12
17
13
2. How many genes are differentially expressed? In these results, the FDR column contains the corrected p-value, and the 'differentiated' column shows the log2 fold-change of differentiated cells vs untreated cells (log2(diff) - log2(undiff)): 0 is unchanged, 1 is doubled, -1 is halved.
18
14
19
-
- Significant at 0.01?
15
+
- Significant at 0.01?
16
+
20
17
- That's a particularly large number of genes - perhaps not unexpected given how much the cells change in this experiment. How many significant genes also have a 2-fold change in expression?
21
18
22
-
- For this workshop, get the genes with an FDR < 1x10^-4 and 2x fold change. Note that this is a ridiculous threshold - most experiments yield far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p<0.01 (and occasionally 2-fold change) - you might see 10s to 100s of results. However, this arbitrary threshold gives a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.
19
+
- For this workshop, get the genes with an FDR < 0.01 and 2x fold change (`log2(4)`). Note - most experiments yield far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p<0.01 (and occasionally 2-fold change) - you might see 10s to 100s of results. However, this arbitrary threshold gives a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.
23
20
24
21
<details>
25
22
<summary>Show</summary>
26
-
There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 198 genes left.
23
+
There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 792 genes left.
27
24
</details>
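
For those who prefer R to a spreadsheet, the filtering in step 2 could be sketched roughly as below. This is only an illustration: the `de` table and its values are made up, the `gene` column name and the `filter_de` helper are hypothetical, and the `FDR` and `differentiated` column names follow the description above. A 2-fold change corresponds to an absolute log2 fold-change of 1; adjust `lfc_cutoff` to whichever threshold you are using.

```r
# Toy differential expression table (made-up values, for illustration only).
de <- data.frame(
  gene           = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  FDR            = c(0.0001, 0.02, 0.005, 0.5, 0.009),
  differentiated = c(2.3, 1.5, -0.4, 0.1, -1.2),  # log2 fold-change
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows passing an FDR cutoff and an
# absolute log2 fold-change cutoff (0 means no fold-change filter).
filter_de <- function(tbl, fdr_cutoff = 0.01, lfc_cutoff = 0) {
  keep <- tbl$FDR < fdr_cutoff & abs(tbl$differentiated) >= lfc_cutoff
  tbl[keep, , drop = FALSE]
}

n_sig    <- nrow(filter_de(de))                  # significant at FDR < 0.01
n_sig_2x <- nrow(filter_de(de, lfc_cutoff = 1))  # ... and at least 2-fold changed
genelist <- filter_de(de, lfc_cutoff = 1)$gene   # the gene list for enrichment
```

With a real export you would replace the toy `de` with something like `read.csv()` on your downloaded table, checking that the column names match.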
28
25
29
-
30
-
31
26
3. How many genes are _tested_? This is your background.
32
27
33
28
<details>
@@ -37,7 +32,6 @@ There are 4923 differentially expressed genes, 2149 of which have a 2-fold chang
37
32
38
33
<!--But with ~20k human genes - why are there genes missing? **14420** -->
39
34
40
-
41
35
---
42
36
43
37
## Common gotcha
@@ -48,7 +42,6 @@ You can't revert the gene names automatically (try converting it to text!). You
48
42
49
43
<!--NB: You can ignore these for this workshop, but you want this to be right for publication!-->
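
Assuming the gotcha here is the spreadsheet silently turning gene symbols into dates (for example `MARCH1` becoming `1-Mar`), a quick check in R might look like the sketch below. The `symbols` vector is made up, and the pattern only covers the common day-month and month-day forms, so treat it as a starting point rather than a complete detector.

```r
# Hypothetical gene symbols as they might look after a round trip through a spreadsheet.
symbols <- c("TP53", "1-Mar", "BRCA1", "2-Sep", "10-Sep", "DEC1", "Mar-01")

# Match day-month ("1-Mar") or month-day ("Mar-01") patterns.
months  <- "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
pattern <- paste0("^([0-9]{1,2}-", months, "|", months, "-[0-9]{1,2})$")

is_date_like <- grepl(pattern, symbols, ignore.case = TRUE)
mangled <- symbols[is_date_like]  # entries that need fixing by hand
```

Any entries flagged this way have to be mapped back to their original symbols manually, since the conversion cannot be reverted automatically.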
<span style="color:orange;">- Select Proteins with Values/Ranks.</span>
204
+
205
+
<span style="color:orange;">- Input Gene List:</span> Paste your gene list with a meaningful value for ranking (fold-change, log-pvalue, abundance, ...) directly into the input box on the STRING web page, or upload a file containing your list of features and their corresponding values.
206
+
207
+
<span style="color:orange;">- Select Organism:</span> Same as above.
208
+
209
+
<span style="color:orange;">- Advanced Setting:</span> Set the FDR stringency and the initial sort order in advance, then click Search.
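
As a rough sketch, a ranked input file for this step could be prepared in R as below. The `de` table and its values are made up, the `gene` column name is hypothetical, and the two-column tab-separated layout (identifier, then value) is an assumption; check the input hint on the STRING page for the exact format it expects.

```r
# Toy results table (made-up values); 'differentiated' holds the log2 fold-change.
de <- data.frame(
  gene           = c("geneA", "geneB", "geneC"),
  differentiated = c(2.3, -1.2, 0.4),
  stringsAsFactors = FALSE
)

# Rank by the chosen value, highest first, keeping identifier and value only.
ranked <- de[order(de$differentiated, decreasing = TRUE),
             c("gene", "differentiated")]

# Write a headerless, tab-separated file that can be uploaded or pasted.
out_file <- tempfile(fileext = ".tsv")
write.table(ranked, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

written <- readLines(out_file)
```

Here the fold-change is used as the ranking value; a log-transformed p-value or an abundance could be substituted in the same way.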
210
+
211
+
### Browse the STRING GSEA Results
212
+
213
+
The output differs from ORA. For each gene set, the results include the enrichment score, its direction within the ranked list, the number of overlapping features with the gene set, and the associated FDR.
214
+
215
+
When a user selects a gene set from the enriched table,
216
+
217
+
```{r, echo=FALSE, out.width="100%", fig.align = "center", fig.cap="An example table of WikiPathway gene sets"}
A functional enrichment visualisation (similar to that of ORA) is provided below the enriched tables.
233
+
234
+
Adjust the settings in the `Enrichment display settings` tab before downloading the enriched tables. It is recommended to merge terms with a certain level of similarity to reduce redundancy, especially if there are many overlapping terms.
Here is an example output of [GSEA on STRING](https://version-12-0.string-db.org/cgi/globalenrichment?networkId=bKhJ4fXp6sna) from a previous run (the link will expire in the future).
241
+
203
242
204
243
<!-- ## FEA in [GenePattern](https://www.genepattern.org/#gsc.tab=0) -->
205
244
## FEA in GenePattern <a href="https://www.genepattern.org/#gsc.tab=0" target="_blank"><img src="images/GenePattern-logo.png" alt="GenePattern Logo" style="height:35px; vertical-align:middle;"></a>