Commit 22f2b7e

GSEA added
1 parent 0cb5b2d commit 22f2b7e

File tree

10 files changed

+1365
-68
lines changed

.DS_Store

0 Bytes
Binary file not shown.

06-web-tools.Rmd

Lines changed: 139 additions & 8 deletions
@@ -37,7 +37,7 @@ gProfiler is known for its integration of numerous species and databases. It sup
3737

3838
<span style="color:orange;">- Run Query:</span> Run the analysis and review the enriched terms, pathways, and visual outputs. Download the results as needed for further exploration.
3939

40-
#### Browse the Results
40+
#### Browse the gProfiler Results
4141

4242
- **Overview**:
4343
The analysis provided a comprehensive list of enriched terms across selected databases, highlighting significant GO. The results give a high-level summary of pathways or terms most relevant to the input data.
@@ -64,7 +64,7 @@ Use 'All known genes' in one analysis and 'Custom' background in another. Downlo
6464

6565
Which background would you use in your analysis?
6666

67-
How is multi-query provided to gProfiler?
67+
How is multi-query support implemented in gProfiler?
6868

6969
How can one perform Under Representation Analysis in gProfiler?
7070

@@ -111,9 +111,9 @@ NOTE: In cases where long list of features is provided, STRING may chnage some o
111111
- previews of protein structures are not shown
112112
- the network edges show interaction confidence only
113113

114-
### Browse the Results
114+
### Browse the STRING Results
115115

116-
STRING come up with a number of tabs as the outputs.
116+
STRING generates multiple tabs as output, shown here:
117117

118118
![](images/string-results-tabs.png){ width=100% }
119119

@@ -177,11 +177,142 @@ The `Clusters` tab essentially provides three different types of clustering algo
177177

178178
Clusters can be downloaded in `.tsv` format.
179179

180+
#### **Question** {- .rationale}
181+
What was the overlap in enrichment terms between gProfiler and STRING at FDR ≤ 0.05?
180182

181-
## Reactome
182-
Reactome is an open-source database of curated biological pathways across species, offering pathway maps and enrichment tools to analyze gene lists in a pathway-focused context. It’s ideal for visualising data within established biochemical and cellular processes.
183+
<!-- ## FEA in [GenePattern](https://www.genepattern.org/#gsc.tab=0) -->
184+
## FEA in GenePattern <a href="https://www.genepattern.org/#gsc.tab=0" target="_blank"><img src="images/GenePattern-logo.png" alt="GenePattern Logo" style="height:35px; vertical-align:middle;"></a>
185+
GenePattern, an online platform developed by the Broad Institute, offers a suite of tools for analyzing and visualizing genomic data, making bioinformatics accessible to researchers through a user-friendly, no-programming interface. Among its supported tools is Gene Set Enrichment Analysis (GSEA), which implements [MSigDB GSEA](https://www.gsea-msigdb.org/gsea/index.jsp) analysis for identifying enriched gene sets in genomic data.
186+
187+
MSigDB (Molecular Signatures Database) is a collection of gene sets for Gene Set Enrichment Analysis, representing pathways and gene signatures linked to biological states or diseases. It helps identify enriched gene sets, aiding the analysis of gene expression changes and key pathways in experimental data.
188+
189+
### Steps to Locate GSEA Module in GenePattern:
190+
191+
- Click on the Run button and then the Public Server
192+
193+
![](images/GenePattern-Run.png)
194+
195+
- Sign in to GenePattern or Enter as Guest
196+
197+
- Under the `Modules` tab, click `Browse Modules`
198+
199+
- Find gsea in the Browse Modules by Category page and hit GSEA
200+
201+
![](images/Browse_Modules_gsea.png){ width=100% }
202+
203+
### Steps to Perform GSEA:
204+
<!-- https://cloud.genepattern.org/gp/pages/index.jsf?jobid=613752&openVisualizers=true&openNewWindow=false -->
205+
206+
1. Basic Parameters
207+
208+
\- Create both `.gct` and `.cls` files following [this script in R](degust.html)
209+
210+
\- Load the `.gct` input file in the `expression dataset` tab and `.cls` file in the `phenotype labels` tab
211+
212+
\- Select a `.gmt` file (Gene Matrix Transposed) from the `gene sets database` tab
213+
214+
\- Set the number of permutations under the `number of permutations` tab
215+
216+
\- Set the type of permutation under the `permutation type` tab
217+
218+
\- Select an appropriate DNA Chip annotation file from `chip platform file` tab
219+
220+
\- Name the output file in `output file name` tab
221+
222+
2. Advanced Parameters
223+
224+
\- Scoring Scheme:
225+
226+
- K-S: The score increment is the same for all genes in *S* regardless of their ranking or correlation strength.
227+
228+
- Weighted: The score increment for each gene in *S* is weighted by its correlation with the phenotype, typically the absolute value of the correlation or ranking metric.
229+
230+
\- Metric for ranking genes: The ranking metric of interest can be chosen from the drop-down menu. A detailed description of the metrics is given in the [GSEA-MSigDB Documentation](https://docs.gsea-msigdb.org/#GSEA/GSEA_User_Guide/#metrics-for-ranking-genes).
231+
232+
- Categorical Phenotypes: Signal-to-Noise Ratio, t-Test, Ratio of Classes, Log2 Ratio of Classes
233+
234+
- Continuous Phenotypes: Pearson Correlation, Spearman Correlation
183235

184-
## MSigDB GSEA
185-
MSigDB (Molecular Signatures Database) is a collection of gene sets for Gene Set Enrichment Analysis (GSEA), representing pathways and gene signatures linked to biological states or diseases. It helps identify enriched gene sets, aiding the analysis of gene expression changes and key pathways in experimental data.
236+
\- The minimum and maximum gene set sizes can be set using the `min gene set size` and `max gene set size` tabs
237+
<!-- <GeneSetName> <Description> <Gene1> <Gene2> <Gene3> ... -->
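
The difference between the two scoring schemes can be illustrated with a small running-sum calculation. The sketch below is not GenePattern's implementation; the function name `enrichment_score`, the exponent argument `p`, and the toy gene scores are all assumptions made for illustration. Setting `p = 0` gives every gene in *S* the same increment (the K-S case), while `p = 1` weights each increment by the absolute value of the ranking metric.

```r
# Running-sum enrichment score for one gene set S over a ranked gene list.
# Illustrative sketch only: names and toy values are hypothetical.
enrichment_score <- function(ranked_scores, in_set, p = 1) {
  stopifnot(length(ranked_scores) == length(in_set), any(in_set), !all(in_set))
  hit_weights <- abs(ranked_scores)^p * in_set   # zero for genes outside S
  step_hit    <- hit_weights / sum(hit_weights)  # increments sum to 1
  step_miss   <- (!in_set) / sum(!in_set)        # decrements sum to 1
  running     <- cumsum(step_hit - step_miss)
  unname(running[which.max(abs(running))])       # maximum deviation from zero
}

# Toy ranked list, sorted by decreasing ranking metric (hypothetical values)
scores <- c(g1 = 2.5, g2 = 1.8, g3 = 1.1, g4 = 0.4,
            g5 = -0.3, g6 = -0.9, g7 = -1.6, g8 = -2.2)
in_set <- names(scores) %in% c("g1", "g3", "g6")

es_classic  <- enrichment_score(scores, in_set, p = 0)  # equal increments
es_weighted <- enrichment_score(scores, in_set, p = 1)  # metric-weighted increments
```

With the weighted scheme, the strongly ranked gene `g1` contributes more to the running sum than `g6`, so the score is driven by genes at the extremes of the list rather than by set membership alone.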
186238

239+
#### Browse the GSEA results
240+
241+
Once the job has been queued and has run successfully, the output will be listed in the left panel under the `Jobs` tab:
242+
243+
```{r, echo=FALSE, fig.align = "center", fig.cap="Job status in GenePattern"}
244+
# out.width="50%",
245+
knitr::include_graphics("images/GenePattern-Jobs.png")
246+
```
247+
248+
One of the most important files is the `.zip` file, named earlier under the `output file name` tab in the Basic Parameters section, which contains all the results. The results can also be browsed through the individual files listed under the job ID.
249+
250+
For the Pezzini experiment, two `html` reports are generated, one each for the up- and down-regulated gene sets, for example:
251+
252+
- gsea_report_for_Diff_1731388275794.html
253+
254+
- gsea_report_for_Nodiff_1731388275794.html
255+
256+
The tabulated versions of the results are given in `.tsv` format:
257+
258+
- gsea_report_for_Diff_1731388275794.tsv
259+
260+
- gsea_report_for_Nodiff_1731388275794.tsv
261+
262+
263+
<!-- ```{css, echo=FALSE} -->
264+
<!-- table { -->
265+
<!-- width: 75%; -->
266+
<!-- margin: auto; -->
267+
<!-- border-collapse: collapse; -->
268+
<!-- } -->
269+
270+
<!-- th, td { -->
271+
<!-- padding: 5px; -->
272+
<!-- text-align: left; -->
273+
<!-- border: 1px solid #ddd; -->
274+
<!-- } -->
275+
<!-- ``` -->
276+
277+
The GSEA result tables have the following columns; the values shown are for one gene set:
278+
279+
```{r, echo=FALSE, message=FALSE}
280+
data <- data.frame(
281+
Parameter = c("GS (follow link to MSigDB)", "GS DETAILS", "SIZE", "ES", "NES", "NOM p-val", "FDR q-val", "FWER p-val", "RANK AT MAX", "LEADING EDGE"),
282+
Value = c(
283+
"[REACTOME_FRS_MEDIATED_FGFR2_SIGNALING](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/REACTOME_FRS_MEDIATED_FGFR2_SIGNALING)",
284+
"Details ...", # Link formatted in Markdown
285+
"16",
286+
"0.83905387",
287+
"1.7128055",
288+
"0",
289+
"0.03902518",
290+
"0.648",
291+
"995",
292+
"tags=38%, list=7%, signal=40%"
293+
)
294+
)
295+
296+
# Display the data as a table
297+
knitr::kable(data, caption = "Summary of GSEA Results for REACTOME_FRS_MEDIATED_FGFR2_SIGNALING Gene Set")
298+
```
299+
300+
The leading edge column has three values:
301+
302+
- tags: 38% of the genes in the gene set are key to the enrichment result.
303+
- list: The peak of the running enrichment score occurs 7% of the way down the ranked gene list (rank 995 in this example).
304+
- signal: They contribute 40% of the enrichment signal, highlighting their importance in driving the association between this gene set and the biological phenotype being studied.
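
When working with the `.tsv` reports, the three leading-edge values can be pulled out of the `LEADING EDGE` string for filtering or plotting. The helper below is a minimal sketch; `parse_leading_edge` is a hypothetical name, and the input string and gene set size are copied from the table above.

```r
# Split a GSEA leading-edge string into named fractions.
# parse_leading_edge() is a hypothetical helper, not part of any GSEA package.
parse_leading_edge <- function(x) {
  # Pull out each "name=NN%" pair, then convert the percentages to fractions
  pairs  <- regmatches(x, gregexpr("[a-z]+=[0-9.]+%", x))[[1]]
  values <- as.numeric(sub("^[a-z]+=([0-9.]+)%$", "\\1", pairs)) / 100
  names(values) <- sub("=.*$", "", pairs)
  values
}

le <- parse_leading_edge("tags=38%, list=7%, signal=40%")

# Approximate number of leading-edge genes implied by the rounded tags value
set_size  <- 16
n_leading <- round(le[["tags"]] * set_size)
```

Because the reported percentages are rounded, `n_leading` is only an estimate; the exact leading-edge genes are listed in the per-gene-set detail report.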
305+
306+
307+
#### **Challenge:** How do different ranking metrics impact the output? {- .challenge}
308+
309+
Run GSEA analysis using Hallmark gene sets with two metrics (SignaltoNoise and tTest). How do these differ in reporting enriched terms?
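
As a starting point for the comparison, the two metrics can be computed by hand for a single gene. This is a simplified sketch using the textbook forms of the signal-to-noise ratio and the unequal-variance t statistic; the function names and expression values are hypothetical, and GSEA's own implementation may apply additional adjustments (for example, to very small standard deviations).

```r
# Textbook forms of two ranking metrics for one gene across two phenotype classes.
# Function names and expression values are hypothetical.
signal_to_noise <- function(a, b) {
  (mean(a) - mean(b)) / (sd(a) + sd(b))
}

t_statistic <- function(a, b) {
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

class_a <- c(10, 12, 14)  # expression of one gene in the first class
class_b <- c(5, 6, 7)     # expression of the same gene in the second class

s2n <- signal_to_noise(class_a, class_b)
tt  <- t_statistic(class_a, class_b)
```

Both metrics share the same numerator, but the t statistic also depends on the number of samples per class, so genes can be ordered differently in the two ranked lists.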
310+
311+
#### **Question** {- .rationale}
312+
313+
Which gene set category (or categories) offers the most valuable insights for a cell differentiation experiment?
314+
315+
316+
## Reactome
317+
Reactome is an open-source database of curated biological pathways across species, offering pathway maps and enrichment tools to analyse gene lists in a pathway-focused context. It’s ideal for visualising data within established biochemical and cellular processes.
187318

degust.html

Lines changed: 1215 additions & 60 deletions
Large diffs are not rendered by default.

images/Browse_Modules_gsea.png

152 KB

images/GSEA-logo.gif

8.49 KB

images/GenePattern-Jobs.png

14.8 KB

images/GenePattern-Run.png

79.6 KB

images/GenePattern-logo.png

9.08 KB

images/MSigDB_GSEA_GUI.png

1.17 MB

style.css

Lines changed: 11 additions & 0 deletions
@@ -1,3 +1,14 @@
1+
/*
2+
table {
3+
max-width: 500px;
4+
border-spacing: 0px;
5+
width: auto;
6+
}
7+
8+
th, td {
9+
padding: 2px 5px;
10+
}
11+
*/
112

213
/* Target links inside paragraphs and lists (i.e., typical body text links) */
314
p a, li a {
