06-web-tools.Rmd
139 additions & 8 deletions
@@ -37,7 +37,7 @@ gProfiler is known for its integration of numerous species and databases. It sup
37
37
38
38
<span style="color:orange;">- Run Query:</span> Run the analysis and review the enriched terms, pathways, and visual outputs. Download the results as needed for further exploration.
39
39
40
-
#### Browse the Results
40
+
#### Browse the gProfiler Results
41
41
42
42
-**Overview**:
43
43
The analysis provided a comprehensive list of enriched terms across selected databases, highlighting significant GO. The results give a high-level summary of pathways or terms most relevant to the input data.
@@ -64,7 +64,7 @@ Use 'All known genes' in one analysis and 'Custom' background in another. Downlo
64
64
65
65
Which background would you use in your analysis?
66
66
67
-
How is multi-query provided to gProfiler?
67
+
How is multi-query support implemented in gProfiler?
68
68
69
69
How can one perform Under Representation Analysis in gProfiler?
70
70
@@ -111,9 +111,9 @@ NOTE: In cases where long list of features is provided, STRING may chnage some o
111
111
- previews of protein structures are not shown
112
112
- the network edges show interaction confidence only
113
113
114
-
### Browse the Results
114
+
### Browse the STRING Results
115
115
116
-
STRING come up with a number of tabs as the outputs.
116
+
STRING generates multiple tabs as output, shown here:
117
117
118
118
{ width=100% }
119
119
@@ -177,11 +177,142 @@ The `Clusters` tab essentially provides three different types of clustering algo
177
177
178
178
Clusters can be downloaded in `.tsv` format.
179
179
180
+
#### **Question** {- .rationale}
181
+
What was the overlap in enrichment terms between gProfiler and STRING at FDR ≤ 0.05?
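
One way to approach this question is to compare the significant term IDs from the two downloaded result tables. The sketch below is only an illustration: the data frames, the column names `term_id` and `fdr`, and the GO identifiers are hypothetical placeholders, and the real exports from gProfiler and STRING use their own column names.

```r
# Toy stand-ins for the two downloaded result tables (all values are made up)
gprofiler_res <- data.frame(
  term_id = c("GO:0000001", "GO:0000002", "GO:0000003"),
  fdr     = c(0.001, 0.03, 0.20)
)
string_res <- data.frame(
  term_id = c("GO:0000002", "GO:0000003", "GO:0000004"),
  fdr     = c(0.01, 0.04, 0.002)
)

# Keep the terms that pass the FDR cut-off in each tool
fdr_cutoff <- 0.05
sig_gprofiler <- gprofiler_res$term_id[gprofiler_res$fdr <= fdr_cutoff]
sig_string    <- string_res$term_id[string_res$fdr <= fdr_cutoff]

# Terms reported as significant by both tools
shared_terms <- intersect(sig_gprofiler, sig_string)
shared_terms
```

`setdiff(sig_gprofiler, sig_string)` would list the terms found only by gProfiler in the same way.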
180
182
181
-
## Reactome
182
-
Reactome is an open-source database of curated biological pathways across species, offering pathway maps and enrichment tools to analyze gene lists in a pathway-focused context. It’s ideal for visualising data within established biochemical and cellular processes.
183
+
<!-- ## FEA in [GenePattern](https://www.genepattern.org/#gsc.tab=0) -->
184
+
## FEA in GenePattern <a href="https://www.genepattern.org/#gsc.tab=0" target="_blank"><img src="images/GenePattern-logo.png" alt="GenePattern Logo" style="height:35px; vertical-align:middle;"></a>
185
+
GenePattern, an online platform developed by the Broad Institute, offers a suite of tools for analyzing and visualizing genomic data, making bioinformatics accessible to researchers through a user-friendly, no-programming interface. Among its supported tools is Gene Set Enrichment Analysis (GSEA), which implements [MSigDB GSEA](https://www.gsea-msigdb.org/gsea/index.jsp) analysis for identifying enriched gene sets in genomic data.
186
+
187
+
MSigDB (Molecular Signatures Database) is a collection of gene sets for Gene Set Enrichment Analysis, representing pathways and gene signatures linked to biological states or diseases. It helps identify enriched gene sets, aiding the analysis of gene expression changes and key pathways in experimental data.
188
+
189
+
### Steps to Locate GSEA Module in GenePattern:
190
+
191
+
- Click on the Run button and then the Public Server
192
+
193
+

194
+
195
+
- Sign in to GenePattern or Enter as Guest
196
+
197
+
- Under the `Modules` tab, click `Browse Modules`
198
+
199
+
- Find `gsea` on the Browse Modules by Category page and click GSEA
\- Create both `.gct` and `.cls` files following [this script in R](degust.html)
209
+
210
+
\- Load the `.gct` input file in the `expression dataset` tab and `.cls` file in the `phenotype labels` tab
211
+
212
+
\- Select a `.gmt` file (Gene Matrix Transposed) from the `gene sets database` tab
213
+
214
+
\- Set the number of permutations under the `number of permutations` tab
215
+
216
+
\- Set the type of permutation under the `permutation type` tab
217
+
218
+
\- Select an appropriate DNA Chip annotation file from `chip platform file` tab
219
+
220
+
\- Name the output file in `output file name` tab
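
For orientation, the two input files are plain text and can be written from R in a few lines. The sketch below is not the linked script; it assumes the GCT v1.2 and categorical CLS layouts described in the GenePattern file format documentation, and the helper names `write_gct` and `write_cls`, the gene names, and the expression values are all made up for illustration.

```r
# Toy expression matrix: 3 genes x 4 samples (values are made up)
expr <- matrix(
  c(5.1, 4.8, 9.2, 9.5,
    7.2, 7.3, 2.1, 1.9,
    3.3, 3.1, 3.4, 3.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE1", "GENE2", "GENE3"),
                  c("Diff_1", "Diff_2", "Nodiff_1", "Nodiff_2"))
)
groups <- c("Diff", "Diff", "Nodiff", "Nodiff")

# GCT v1.2: version line, dimensions line, header line, then one row per gene
write_gct <- function(mat, path) {
  rows <- paste(rownames(mat), "na",
                apply(mat, 1, paste, collapse = "\t"), sep = "\t")
  writeLines(c("#1.2",
               paste(nrow(mat), ncol(mat), sep = "\t"),
               paste(c("NAME", "Description", colnames(mat)), collapse = "\t"),
               rows),
             path)
}

# CLS: "<samples> <classes> 1", then "# <class names>", then one label per sample
write_cls <- function(labels, path) {
  classes <- unique(labels)
  writeLines(c(paste(length(labels), length(classes), 1),
               paste("#", paste(classes, collapse = " ")),
               paste(labels, collapse = " ")),
             path)
}

gct_path <- file.path(tempdir(), "toy.gct")
cls_path <- file.path(tempdir(), "toy.cls")
write_gct(expr, gct_path)
write_cls(groups, cls_path)
```

The sample order in the `.cls` label line must match the column order of the `.gct` file.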
221
+
222
+
2. Advanced Parameters
223
+
224
+
\- Scoring Scheme:
225
+
226
+
- K-S: The score increment is the same for all genes in *S* regardless of their ranking or correlation strength.
227
+
228
+
- Weighted: The score increment for each gene in *S* is weighted by its correlation with the phenotype, typically the absolute value of the correlation or ranking metric.
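
To make the difference between the two scoring schemes concrete, here is a minimal sketch of the running enrichment score. It assumes the usual GSEA formulation in which each gene in *S* adds an increment proportional to |r|^p, with p = 0 for the K-S scheme and p = 1 for the weighted scheme. The function name `enrichment_score` and the toy ranking are illustrative and are not taken from GenePattern.

```r
# ranked_metric: ranking metric for every gene, sorted in decreasing order
# in_set:        logical vector, TRUE where the gene belongs to gene set S
# p:             0 = K-S (equal increments), 1 = weighted by |metric|
enrichment_score <- function(ranked_metric, in_set, p = 1) {
  hit_weight <- ifelse(in_set, abs(ranked_metric)^p, 0)
  p_hit  <- cumsum(hit_weight) / sum(hit_weight)   # rises at genes in S
  p_miss <- cumsum(!in_set) / sum(!in_set)         # rises at genes not in S
  running <- p_hit - p_miss
  running[which.max(abs(running))]                 # largest deviation from zero
}

# Toy ranked list of 10 genes, three of which are in S (values are made up)
metric <- c(3, 2.5, 2, 1, 0.5, -0.5, -1, -2, -2.5, -3)
in_set <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)

es_ks       <- enrichment_score(metric, in_set, p = 0)
es_weighted <- enrichment_score(metric, in_set, p = 1)
c(KS = es_ks, weighted = es_weighted)
```

In this toy case the weighted score is higher because the two set members near the top of the list also carry large metric values.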
229
+
230
+
\- Metric for ranking genes: The ranking metric of interest can be chosen from the drop-down menu. A detailed description of the metrics is given in the [GSEA-MSigDB Documentation](https://docs.gsea-msigdb.org/#GSEA/GSEA_User_Guide/#metrics-for-ranking-genes).
231
+
232
+
- Categorical Phenotypes: Signal-to-Noise Ratio, t-Test, Ratio of Classes, Log2 Ratio of Classes
MSigDB (Molecular Signatures Database) is a collection of gene sets for Gene Set Enrichment Analysis (GSEA), representing pathways and gene signatures linked to biological states or diseases. It helps identify enriched gene sets, aiding the analysis of gene expression changes and key pathways in experimental data.
236
+
\- Minimum and Maximum size of gene sets can be set using `max gene set size` and `min gene set size` tabs
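
As a rough illustration of two of the categorical ranking metrics named above, the sketch below computes a signal-to-noise ratio and a Welch t statistic for a single gene. The formulas are the textbook versions; GSEA additionally applies a minimum to the standard deviations, which is omitted here. The function names and expression values are made up.

```r
# Signal-to-noise: difference of class means over the sum of class SDs
signal_to_noise <- function(a, b) {
  (mean(a) - mean(b)) / (sd(a) + sd(b))
}

# t-test metric: difference of class means over the pooled standard error
t_metric <- function(a, b) {
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

# Toy expression values for one gene in two phenotype classes
class_a <- c(9, 10, 11)
class_b <- c(4, 5, 6)

s2n   <- signal_to_noise(class_a, class_b)
tstat <- t_metric(class_a, class_b)
c(signal_to_noise = s2n, t_test = tstat)
```

Both metrics rank this gene as higher in the first class, but the t statistic also grows with sample size, so the two can order genes differently across a full dataset.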
One of the most important files is the `.zip` file, named earlier under the `output file name` tab in the Basic Parameters section, which includes all the results. The results can also be browsed through the individual files listed under the job ID.
249
+
250
+
For the Pezzini experiment, two `html` files are generated, one each for the up- and down-regulated gene sets, for example:
251
+
252
+
- gsea_report_for_Diff_1731388275794.html
253
+
254
+
- gsea_report_for_Nodiff_1731388275794.html
255
+
256
+
The tabulated versions of the results are given in `.tsv` format:
257
+
258
+
- gsea_report_for_Diff_1731388275794.tsv
259
+
260
+
- gsea_report_for_Nodiff_1731388275794.tsv
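
The `.tsv` reports can be read into R for filtering. The sketch below writes a small stand-in file first so that it runs on its own. The column names `NAME`, `SIZE`, `NES` and `FDR q-val` are assumed from the report header, while the gene set names, the values, and the 0.05 cut-off are purely illustrative.

```r
# Write a tiny stand-in for a GSEA report (all values are made up)
report_path <- file.path(tempdir(), "gsea_report_toy.tsv")
writeLines(c("NAME\tSIZE\tNES\tFDR q-val",
             "SET_A\t25\t1.60\t0.010",
             "SET_B\t40\t1.10\t0.300",
             "SET_C\t15\t2.10\t0.001"),
           report_path)

# check.names = FALSE keeps headers such as "FDR q-val" unchanged
report <- read.delim(report_path, check.names = FALSE, stringsAsFactors = FALSE)

# Keep gene sets below the chosen FDR cut-off, strongest NES first
significant <- report[report[["FDR q-val"]] <= 0.05, ]
significant <- significant[order(-significant$NES), ]
significant$NAME
```

For a real run, `report_path` would point to one of the downloaded `gsea_report_for_*.tsv` files.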
261
+
262
+
263
+
<!-- ```{css, echo=FALSE} -->
264
+
<!-- table { -->
265
+
<!-- width: 75%; -->
266
+
<!-- margin: auto; -->
267
+
<!-- border-collapse: collapse; -->
268
+
<!-- } -->
269
+
270
+
<!-- th, td { -->
271
+
<!-- padding: 5px; -->
272
+
<!-- text-align: left; -->
273
+
<!-- border: 1px solid #ddd; -->
274
+
<!-- } -->
275
+
<!-- ``` -->
276
+
277
+
The GSEA result tables have the following header; the details of one gene set are shown below:
278
+
279
+
```{r, echo=FALSE, message=FALSE}
280
+
data <- data.frame(
281
+
Parameter = c("GS (follow link to MSigDB)", "GS DETAILS", "SIZE", "ES", "NES", "NOM p-val", "FDR q-val", "FWER p-val", "RANK AT MAX", "LEADING EDGE"),
kable(data, caption = "Summary of GSEA Results for REACTOME_FRS_MEDIATED_FGFR2_SIGNALING Gene Set")
298
+
```
299
+
300
+
The leading edge column has three values:
301
+
302
+
- tags: 38% of the genes in the gene set are key to the enrichment result.
303
+
- list: These genes make up 7% of the total gene list being analyzed.
304
+
- signal: They contribute 40% of the enrichment signal, highlighting their importance in driving the association between this gene set and the biological phenotype being studied.
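
To work with these three values numerically, the leading edge string can be split into a named vector. This is a small sketch that assumes the column is formatted as `tags=38%, list=7%, signal=40%`; the helper name `parse_leading_edge` is hypothetical.

```r
# Turn "tags=38%, list=7%, signal=40%" into a named numeric vector of fractions
parse_leading_edge <- function(x) {
  parts  <- strsplit(strsplit(x, ",\\s*")[[1]], "=")
  values <- vapply(parts,
                   function(p) as.numeric(sub("%", "", p[2])) / 100,
                   numeric(1))
  names(values) <- vapply(parts, function(p) p[1], character(1))
  values
}

leading_edge <- parse_leading_edge("tags=38%, list=7%, signal=40%")
leading_edge
```

Applied to the whole `LEADING EDGE` column with `lapply()`, this makes it possible to sort or filter gene sets by any of the three statistics.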
305
+
306
+
307
+
#### **Challenge:** How do different ranking metrics impact the output? {- .challenge}
308
+
309
+
Run GSEA analysis using Hallmark gene sets with two metrics (SignaltoNoise and tTest). How do these differ in reporting enriched terms?
310
+
311
+
#### **Question** {- .rationale}
312
+
313
+
Which gene set category (or categories) offers the most valuable insights for a cell differentiation experiment?
314
+
315
+
316
+
## Reactome
317
+
Reactome is an open-source database of curated biological pathways across species, offering pathway maps and enrichment tools to analyse gene lists in a pathway-focused context. It’s ideal for visualising data within established biochemical and cellular processes.