Added more details

calizilla · calizilla · commit 753bd5a66368 · 2024-11-14T16:33:35.000+11:00
diff --git a/09-r-environment-setup.Rmd b/09-r-environment-setup.Rmd
@@ -168,7 +168,7 @@ Saving the workspace image saves all objects from the session such as your varia
 
 Now that we have a clear workspace, we will prepare for the first analysis activity by opening the notebook and checking our working directory. 
 
-You have previosuly downloaded an unzipped `Functional_enrichment_workshop_2024`. This contains a folder `day_2`. 
+You have previously downloaded `Functional_enrichment_workshop_2024`. This contains a folder `day_2`. 
 
 &#x27A4; On the `Files` pane, open the `day_2` folder and confirm that it contains the input data file `Pezzini_DE.txt`
 
@@ -194,12 +194,14 @@ Scroll down to the code chunk labelled `Load input data`. Note that the filepath
 
 Immediately above the `Load input data` is a code chunk labelled `Load R packages`. This contains all of the R packages required to run the analysis contained within the workbook. Loading all required packages within the notebook, rather than directly via the console, ensures that anyone running your notebook does not encounter errors if they forget to load a required package. 
 
-Note that the packages that are loaded to the session with the R `library` command must first be installed; this has already been been done for you on these VMs. Attempts to load a package that is not installed will meet a fatal error, and installation can then be peformed (not difficult in R) before resuming.
+Note that packages loaded to the session with the R `library` command must first be installed; this has already been done for you on these VMs. Attempting to load a package that is not installed will produce an error, and installation can then be performed (not difficult in R) before resuming.
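+
+As a minimal sketch (not part of the workshop notebook), one way to check for a missing package before loading it is shown below. `gprofiler2` is used only as an example, and a CRAN package is assumed; Bioconductor packages such as `clusterProfiler` are installed with `BiocManager::install()` instead.
+
+```r
+# Example only: check whether a package is installed before loading it
+pkg <- "gprofiler2"
+
+if (!requireNamespace(pkg, quietly = TRUE)) {
+  # Assumes a CRAN package; use BiocManager::install() for Bioconductor packages
+  install.packages(pkg)
+}
+
+# character.only = TRUE is needed because the package name is held in a variable
+library(pkg, character.only = TRUE)
+```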
 
-Note that the code chunk label also contains the text `include=FALSE`. This prevents the loading of libraries, which can at times have verbose output, from cluttering up your rendered notebook when it is previewed or knit. 
+Note that the code chunk label also contains the text `include=FALSE`. This prevents the loading of libraries (which can at times have verbose output) from cluttering up your rendered notebook when it is previewed or knit. 
 
 <p>&nbsp;</p>  <!-- insert blank line -->
 
 &#x27A4; Run the `Load R packages` code chunk. 
 
-Please let us know if you have any errors loading the packages. Don't be alarmed that the output is red! :relaxed:
+Please let us know if you have any errors loading the packages :raised_hand:
+
+Don't be alarmed that the output is <span style="color: red;">red</span>! :slightly_smiling_face:
diff --git a/10-gprofiler2.Rmd b/10-gprofiler2.Rmd
@@ -1,16 +1,23 @@
 # ORA with gprofiler2
 
-[gprofiler2](https://cran.r-project.org/web/packages/gprofiler2/index.html) is the R interface to the 'g:Profiler' toolset that you used in day 1 of the workshop.
+[gprofiler2](https://cran.r-project.org/web/packages/gprofiler2/index.html) is the R interface to the `g:Profiler` web-based toolset that you used in day 1 of the workshop.
 
-Like the web interface, gprofiler2 performs ORA with `g:GOSt` against multiple databases simultaneously. 
+Like the web interface, `gprofiler2` performs ORA with `g:GOSt` against multiple databases simultaneously. 
 
 It supports all the same organisms, namespaces and data sources as the web tool. The list of organisms and corresponding data sources is available [here](https://biit.cs.ut.ee/gprofiler/page/organism-list) (n = 984). 
-The full list of namespaces that g:Profiler recognizes is available [here](https://biit.cs.ut.ee/gprofiler/page/namespaces-list). 
+
+The full list of supported namespaces is available [here](https://biit.cs.ut.ee/gprofiler/page/namespaces-list). 
+
+The `gprofiler2` user guide can be found [here](https://cran.r-project.org/web/packages/gprofiler2/gprofiler2.pdf). 
+
+<p>&nbsp;</p>  <!-- insert blank line -->
 
 ## Input data 
 
 Since we are doing ORA, we will need a filtered gene list, and a background gene list. We will continue with the RNAseq dataset from [Pezzini et al 2016](https://link.springer.com/article/10.1007/s10571-016-0403-y) introduced yesterday. 
 
+<p>&nbsp;</p>  <!-- insert blank line -->
+
 ## Activity overview
 
 1. Load input dataset (a gene matrix with adjusted P values and log2 fold change values) 
@@ -19,13 +26,16 @@ Since we are doing ORA, we will need a filtered gene list, and a background gene
 4. Run ORA with `gost` function
 5. Save the tabular results to a file
 6. Visualise the results 
-7. Run a gost multi-query for up-regulated and down-regulated genes
-8. Compare gprofiler2 R results to the g:Profiler web results
+7. Run a `gost` multi-query for up-regulated and down-regulated genes
+8. Compare `gprofiler2` R results to the `g:Profiler` web results
+
+<p>&nbsp;</p>  <!-- insert blank line -->
 
 &#x27A4; Go back to your RStudio interface, where we have opened the `gprofiler2.Rmd` notebook and loaded the required R packages. 
 
-**Instructions and information for the rest of this activity will continue from the notebook.**
+**Instructions for the analysis will continue from the notebook.**
 
+<p>&nbsp;</p>  <!-- insert blank line -->
 
 ## End of activity summary
 
@@ -37,8 +47,10 @@ Since we are doing ORA, we will need a filtered gene list, and a background gene
 
 The last task is to `knit` the notebook. Our notebook is editable, and can be changed. Deleting code deletes the output, so we could lose valuable details. If we knit the notebook to HTML, we have a permanent static copy of the work. 
 
+<p>&nbsp;</p>  <!-- insert blank line -->
+
 &#x27A4; Knit the notebook to HTML
 
-Note that the notebook will only knit if there are no errors in the code. If your knit fails, please ask for assistance resolving the errors. 
+Note that the notebook will only knit if there are no errors in the code. If your knit fails, please ask for assistance resolving the errors :raised_hand: 
 
 
diff --git a/11-clusterprofiler.Rmd b/11-clusterprofiler.Rmd
@@ -1,62 +1,53 @@
 # GSEA with clusterProfiler
 
 
-[clusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) is a comprehensive suite of enrichment tools. It has inbuilt functions to run ORA (`enrich<DB>`) or GSEA (`gse<DB>`) over commonly used databases (GO, KEGG, KEGG Modules, DAVID, Pathway Commons, WikiPathways) as well as generic functions to perform ORA (`enricher`) or GSEA (`gsea`) with custom gene sets.
+[clusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) is a comprehensive suite of enrichment tools. It has functions to run ORA or GSEA over commonly used databases (GO, KEGG, KEGG Modules, DAVID, Pathway Commons, WikiPathways) as well as generic functions to perform ORA or GSEA with custom gene sets.
 
-It has a companion plotting package `enrichPlot` dedicated to plotting `clusterProfiler` results. 
+It has a companion plotting package [enrichplot](https://www.bioconductor.org/packages/release/bioc/html/enrichplot.html) dedicated to plotting enrichment results. 
 
 The `clusterProfiler` user guide can be found [here](https://bioconductor.org/packages/devel/bioc/manuals/clusterProfiler/man/clusterProfiler.pdf). 
 
 The `enrichplot` user guide can be found [here](https://www.bioconductor.org/packages/devel/bioc/manuals/enrichplot/man/enrichplot.pdf). 
 
+One of the challenges when working with `clusterProfiler` for FEA is that each enrichment function supports different organisms and has different namespace requirements, so you cannot necessarily use all of the functions over the same gene list. In this activity, we will review the FEA functions and investigate their requirements, before performing a gene ID conversion with the `bitr` function to enable compatibility with our dataset ([Pezzini et al 2016](https://link.springer.com/article/10.1007/s10571-016-0403-y)). 
 
-## Input data 
-
-We will use the same RNAseq dataset from the previous activity ([Pezzini et al 2016](https://link.springer.com/article/10.1007/s10571-016-0403-y)).
-
+<p>&nbsp;</p>  <!-- insert blank line -->
 
 ## Activity overview
 
-** THINKING TO DROP GSEGO ENTIERLY, AND JUST DO GESKEGG 
-
-PAT A:  A REVIEW OF SUPPORTED DBS, DIFFERENT SUPPORTED ORGANISMS AND NAMESPACES DEPENDING ON DB, AND HOW TO FIND OUT WHAT IS REQUIRED/APPLICABLE PER EACH FUNCTION
-PART B -  GSEKEGG WHICH REQUIRES A GENE ID CONVERSION USIN GBITR, THEN RUN GSEA, THEN SOME PPLOTS
-PART C - INCLUDE HERE OR IN NOTEBOOK 4 - TERM2GENE AND TERM2NAME
-
-REASON: GO IS DONE TO DEATH, AND THE TRICK OF THIS PACKAGE IS THAT EACH FUNCTION HAS ITS OWN LIST OF SUPPORTED SPECIES AND NAMESPACES DEPENDING ON THE DATABASE, THE PDF IS NOT SUPER CLEAR ON WHICH IS WHICH. 
+1. Explore the functions of `clusterProfiler` including which FEA functions support which organisms and which namespaces
+2. Load input dataset (a gene matrix with adjusted P values and log2 fold change values) 
+3. Extract the gene IDs and sort by log2 fold change to create the GSEA gene list R object
+4. Use `bitr` to convert gene IDs from ENSEMBL to ENTREZ for compatibility with `gseKEGG`
+5. Perform GSEA with `gseKEGG`
+6. Visualise results with `enrichplot`
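+
+The ID conversion in step 4 can be sketched as follows. This is an illustration only, not the notebook code: it assumes human data annotated with `org.Hs.eg.db`, and the object names (`lfc`, `ids`, `gene_list`) and the three fold change values are made up for the example.
+
+```r
+library(clusterProfiler)
+library(org.Hs.eg.db)  # assumption: human annotation database
+
+# Hypothetical input: log2 fold changes named by ENSEMBL gene ID (values are made up)
+lfc <- c(ENSG00000141510 = 2.1, ENSG00000012048 = -1.4, ENSG00000139618 = 0.3)
+
+# Map ENSEMBL IDs to ENTREZ IDs; unmapped IDs are dropped by default
+ids <- bitr(names(lfc), fromType = "ENSEMBL", toType = "ENTREZID", OrgDb = org.Hs.eg.db)
+
+# Keep only one-to-one mappings so that every gene ID is unique
+ids <- ids[!duplicated(ids$ENSEMBL) & !duplicated(ids$ENTREZID), ]
+
+# Rebuild the ranked list with ENTREZ IDs as names, sorted in decreasing order
+gene_list <- lfc[ids$ENSEMBL]
+names(gene_list) <- ids$ENTREZID
+gene_list <- sort(gene_list, decreasing = TRUE)
+```
+
+A vector of this shape, built from the full dataset rather than three genes, is what would be passed to the `geneList` argument of `gseKEGG`.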
 
-WOULD LIKE MORE TIME TO DO THE TERM2GENE ADN TERM2NAME AS MANY APPLICANTS MENTIONED THIS EITHER INDIRECTLY VIA REQUEST FOR NON MODEL OR DIRECTLY. 
+<p>&nbsp;</p>  <!-- insert blank line -->
 
-1. Load input dataset (a gene matrix with adjusted P values and log2 fold change values) 
-2. Extract the gene IDs and sort by log2 fold change to create the GSEA gene list
-3. Use the `gseGO` function to run GSEA over GO MF
-4. Visualise GO results with `enrichplot`
-5. Use `bitr` to convert gene IDs then use the `gseKEGG` function to run GSEA over KEGG
-6. Visualise KEGG results with `enrichplot`
-7. Review  `enricher` and `gsea` generic functions to perform ORA and GSEA
+&#x27A4; Go back to your RStudio interface and clear your environment by selecting `Session` &rarr; `Quit session` &rarr; `Don't save` &rarr; `Start new session`
 
 
-&#x27A4; Refresh your Rstudio workspace with option 1 or option 2
+<p>&nbsp;</p>  <!-- insert blank line -->
 
-***Option 1: close and re-open RStudio***
+&#x27A4; Open the `clusterProfiler.Rmd` notebook using `File` &rarr; `Open file`, or use the keyboard shortcut `ctrl + o`. 
 
-Close RStudio, and if asked `Save workspace image to ~/R.Data?` select `Don't Save`. Then, re-open RStudio. 
 
-***Option 2: manualy clear environment and history***
+**Instructions for the analysis will continue from the notebook.**
 
-Close the Rmd file, clear the command history by selecting the broom icon in the history pane, then clear all objects from the environment by entering the following R command in the console:
+<p>&nbsp;</p>  <!-- insert blank line -->
 
-```{r}
-rm(list = ls())
-```
-
-&#x27A4; Open the `clusterProfiler.Rmd` notebook in RStudio
-
-**Instructions and information for the rest of this activity will continue from the notebook.**
+## End of activity summary
 
+- We have explored the supported organisms and namespaces of the `clusterProfiler` enrichment functions 
+- We have extracted a ranked gene list for GSEA and converted the gene IDs for compatibility with `gseKEGG`
+- We have performed GSEA on the KEGG database with `gseKEGG` and visualised the results with multiple plot types 
+- We have captured all version details relevant to the session within the R notebook
 
-## End of activity summary
+<p>&nbsp;</p>  <!-- insert blank line -->
 
+## Poll
 
+:question: What was your favourite plot? :thinking:
 
+This may be the one you found most informative, easiest to interpret, most eye-catching... 
 
diff --git a/12-webgestaltr.Rmd b/12-webgestaltr.Rmd
@@ -0,0 +1,11 @@
+# WebGestaltR
+
+LIST OF AMAZING THINGS ABOUT WEBGESTALTR:
+- makes great html reports with interactive plots and links to external dbs
+- saves the results to disk when running, no need to export stuff and save files manually 
+- many dbs and gene lists supported (n = 70) 
+- supports metabolomics, with 15 different ID types, see new paper https://academic.oup.com/nar/article/52/W1/W415/7684598#google_vignette 
+- can be used for novel species, but I haven't tried it yet... 
+- does ORA, GSEA, and NTA. I wonder if the NTA works at all for novel species???! 
+- super easy to run. many supported namespaces (n = 73), does not require conversions for different functions like clusterProfiler, can even have different namespace for ORA gene list and background list 
+- "Multiple databases in a vector are supported for ORA and GSEA" 
diff --git a/13-novel-species.Rmd b/13-novel-species.Rmd
@@ -74,6 +74,8 @@ There are then provided to the universal enrichment functions `GSEA` and `enrich
 
 In RStudio, we will extract these file formats from the eggNOG annotations file for axolotl and proceed with FEA. 
 
+Acknowledgement to [Armin Dadras](https://github.com/dadrasarmin) for sharing his [code](https://github.com/dadrasarmin/enrichment_analysis_for_non_model_organism) to extract `TERM2GENE` and `TERM2NAME` from `emapper` output. 
+
 ### WebGestaltR
 
 This tool can perform ORA or GSEA for any organism with the provision of custom `GMT` and `description` file.