MonashBioinformaticsPlatform
diff --git a/.DS_Store b/.DS_Store
0 Bytes
diff --git a/05-genelists.Rmd b/05-genelists.Rmd
Lines changed: 4 additions & 11 deletions
diff --git a/06-web-tools.Rmd b/06-web-tools.Rmd
Lines changed: 42 additions & 3 deletions
diff --git a/images/string-enrichement-display-settings.png b/images/string-enrichement-display-settings.png
29 KB
diff --git a/images/string-gsea-proteome-network.png b/images/string-gsea-proteome-network.png
356 KB
diff --git a/images/string-gsea-ranking-n-table.png b/images/string-gsea-ranking-n-table.png
82.2 KB
diff --git a/images/string-gsea-wikiPathways.png b/images/string-gsea-wikiPathways.png
466 KB
@@ -1,12 +1,8 @@
 # Defining the genelist
 
-
-
 Starting from those differential expression results [here](http://degust.erc.monash.edu/degust/compare.html?code=5b2c7805ab8f8c5f2dc8c72e61b049b0#?plot=mds), how do we go about getting a genelist to calculate enrichment on? 
 
 
-
-
 ## Activities
 
 Today's exercise follows the process of getting the differentially expressed gene list using Excel. You could use another spreadsheet program, or some may prefer a programming language like R.
@@ -16,18 +12,17 @@ Todays exercise follows the process of getting the differentially expressed gene
 
 2. How many genes are differentially expressed? In these results the FDR Column contains the corrected p-value, and the 'differentiated' column shows the log2 fold-change of differentiated cells vs untreated cells (log2(diff)-log2(undiff)); 0 is unchanged, 1 is doubled, -1 is halved.
 
-    - Significant at 0.01? 
+    - Significant at 0.01?
+    
     - That's a particularly large number of genes - perhaps not unexpected given how much the cells are changed in this experiment. How many significant genes also have a 2-fold change in expression?
 
-    - For this workshop, get the genes with a FDR <1x10^-4 and 2x fold change. Note that this is a ridiculous threshold - most experiments yeild far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p<0.01 (and occasionally 2-fold change) - you might see 10s to 100s of results. However, this arbitrary threshold gives a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.
+    - For this workshop, get the genes with FDR < 0.01 and a log2 fold change of 2, i.e. a 4-fold change (`log2(4)` = 2). Note that most experiments yield far less differential expression, but the difference between these two cell conditions is pretty extreme! Typically you would only filter at p<0.01 (and occasionally 2-fold change) - you might see 10s to 100s of results. However, this arbitrary threshold gives a more typical number of differentially expressed genes for downstream analysis. An alternative approach could be to take the top 500 genes.
 
 <details>
 <summary>Show</summary>
-There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 198 genes left.
+There are 4923 differentially expressed genes, 2149 of which have a 2-fold change in expression. With the aggressive filtering, there are 792 genes left.
 </details>
 
-
-
 3. How many genes are _tested_? This is your background.
 
 <details>
@@ -37,7 +32,6 @@ There are 4923 differentially expressed genes, 2149 of which have a 2-fold chang
 
 <!--But with ~20k human genes - why are there genes missing? **14420** --> 
 
-
 ---
 
 ## Common gotcha
@@ -48,7 +42,6 @@ You can't revert the gene names automatically (try converting it to text!). You
 
 <!--NB: You can ignore these for this workshop, but you want this to be right for publication!-->
 
-
 ---
 
 ## Example
 
@@ -114,7 +114,7 @@ NOTE: In cases where long list of features is provided, STRING may chnage some o
  - previews of protein structures are not shown
  - the network edges show interaction confidence only
 
-### Browse the STRING Results
+### Browse the STRING ORA Results
 
 STRING generates multiple tabs as output, shown here:
 
@@ -198,8 +198,47 @@ Clusters can be downloaded in `.tsv` format.
 #### **Question** {- .rationale}
 What was the overlap in enrichment terms between gProfiler and STRING at FDR ≤ 0.05?
 
-<!-- ### Steps to Perform GSEA in STRING: -->
-<!-- https://version-12-0.string-db.org/cgi/globalenrichment?networkId=bKhJ4fXp6sna -->
+### Steps to Perform GSEA in STRING:
+
+<span style="color:orange;">- Select Proteins with Values/Ranks.</span>
+
+<span style="color:orange;">- Input Gene List:</span> Paste your gene list with a meaningful value for ranking (fold-change, log-pvalue, abundance, ...) directly into the input box on the STRING web page or upload a file containing your list of features and their corresponding values.
+
+<span style="color:orange;">- Select Organism:</span> Same as above.
+
+<span style="color:orange;">- Advanced Setting:</span> Set the FDR stringency and the initial sort order in advance, then click Search.
+
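
The ranked input described in the steps above is one identifier and one numeric value per feature. A minimal R sketch of preparing such a list is shown here. It assumes log2 fold change as the ranking value (one of the options named above), a hypothetical `gene` identifier column, a placeholder output file name, and a tab-separated layout without a header; check the STRING input page for the formats it actually accepts.

```r
# Toy differential expression table (values are made up). 'gene' is a
# hypothetical identifier column; 'differentiated' holds log2 fold change.
results <- data.frame(
  gene           = c("geneA", "geneB", "geneC", "geneD"),
  differentiated = c(2.5, -1.2, 0.3, -3.1),
  stringsAsFactors = FALSE
)

# Keep every tested gene (no significance filter) and sort by the chosen
# ranking value, here from most up-regulated to most down-regulated.
ranked <- results[order(results$differentiated, decreasing = TRUE),
                  c("gene", "differentiated")]

# Write 'identifier<TAB>value' lines with no header and no quoting
# (assumed layout; the file name is a placeholder).
out_file <- file.path(tempdir(), "ranked_genes.tsv")
write.table(ranked, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Inspect the lines that would be pasted or uploaded.
readLines(out_file)
```

Unlike the thresholded gene list used for ORA, GSEA-style methods normally take the whole ranked list, which is why no FDR filter is applied in this sketch.
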
+### Browse the STRING GSEA Results
+
+The output differs from ORA. For each gene set, the results include the enrichment score, its direction within the ranked list, the number of overlapping features with the gene set, and the associated FDR.
+
+The enriched gene sets are listed in a table, for example for WikiPathways:
+
+```{r, echo=FALSE, out.width="100%", fig.align = "center", fig.cap="An example table of WikiPathway gene sets"} 
+knitr::include_graphics("images/string-gsea-wikiPathways.png")
+```
+When a user selects a gene set from the enriched table, the associated genes are displayed within the ranked list. A table showing these genes along with their original ranking values is also provided.
+
+```{r, echo=FALSE, fig.align = "center", fig.cap="List of genes in the term (WP197) and their positions on the ranked list"} 
+knitr::include_graphics("images/string-gsea-ranking-n-table.png")
+```
+
+Additionally, the locations of the corresponding proteins are highlighted in the proteome network:
+
+```{r, echo=FALSE, fig.align = "center", fig.cap="Proteome network"} 
+knitr::include_graphics("images/string-gsea-proteome-network.png")
+```
+
+A functional enrichment visualisation (similar to that of ORA) is provided below the enriched tables.
+
+Adjust the options in the `Enrichment display settings` tab before downloading the enriched tables. It is recommended to merge terms with a certain level of similarity to reduce redundancy, especially if there are many overlapping terms.
+
+```{r, echo=FALSE, fig.align = "center", fig.cap="Enrichment display settings"} 
+knitr::include_graphics("images/string-enrichement-display-settings.png")
+```
+
+Here is an example output of [GSEA on STRING](https://version-12-0.string-db.org/cgi/globalenrichment?networkId=bKhJ4fXp6sna) from a previous run (the link will expire in the future).
+
 
 <!-- ## FEA in [GenePattern](https://www.genepattern.org/#gsc.tab=0) -->
 ## FEA in GenePattern <a href="https://www.genepattern.org/#gsc.tab=0" target="_blank"><img src="images/GenePattern-logo.png" alt="GenePattern Logo" style="height:35px; vertical-align:middle;"></a>