Commit 5cca582

format adjustments
1 parent 8522d76 commit 5cca582

File tree

7 files changed

+91
-112
lines changed

09-day2-introduction.Rmd

Lines changed: 14 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -1,45 +1,23 @@
11
# Introduction
22

3-
## Day 1 recap
3+
For today's session, we will be using the [R](https://www.r-project.org/) statistical programming language to perform functional enrichment analysis using a range of dedicated R packages. To simplify our work, we will be using the [RStudio](https://posit.co/download/rstudio-desktop/) integrated development environment.
4+
5+
RStudio provides a central place to write code, comments, run code, and view output. We will be using the workshop VMs today which run RStudio and have all the required R packages pre-installed.
6+
7+
<img src="images/rstudio_logo.png" style="border: none; box-shadow: none; background: none;">
48

5-
- We explored several web-based tools and databases for performing Functional Enrichment Analysis (FEA)
6-
- Two types of FEA, Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA), were introduced and their differences discussed
7-
- We examined how filtering the input gene list and selecting the background gene list (the "statistical domain scope") influence ORA results
8-
- We explored how the choice of ranking metric affects GSEA outcomes
9-
- The impact of database selection on FEA results was evaluated
10-
- Differences in outputs between various tools were compared and discussed
11-
- Uncertainties, implications, and recommendations were highlighted to address challenges in gene set analysis
129

1310
<p>&nbsp;</p> <!-- insert blank line -->
1411

1512
## Day 2 overview
1613

17-
In day 2 of this workshop, we will focus on FEA with R-based tools.
18-
1914
1. R environment setup including VM login
2015
2. ORA with `gprofiler2` including multi-query of up and down regulated genes
21-
3. GSEA over KEGG database with `clusterProfiler` and multiple enrichmnet visualisations with `enrichplot`
16+
3. GSEA over KEGG database with `clusterProfiler` and multiple enrichment visualisations with `enrichplot`
2217
4. ORA and GSEA with `WebGestaltR`, interactive HTML reports, and exploration of term redundancy options
23-
5. Novel species FEA with `cluterProfiler`, `WebGestaltR` and `STRING` (web)
18+
5. Novel species FEA with `clusterProfiler`, `WebGestaltR` and `STRING` (web)
2419
6. End of workshop summary
2520

26-
<p>&nbsp;</p> <!-- insert blank line -->
27-
28-
## R FEA tool choice
29-
30-
At a high level, there are 3 fundamental questions to consider when selecting an anaylsis tool:
31-
32-
**1.** Does the governance of your data prohibit you from analysing it on an external server? (ie "highly protected" data)
33-
**2.** What web or R tools support your species and relevant annotation database?
34-
**3.** Are you comfortable using R?
35-
36-
<p>&nbsp;</p> <!-- insert blank line -->
37-
38-
<img src="images/r-v-web-decision-tree.png" style="border: none; box-shadow: none; background: none; width: 100%;">
39-
40-
<p>&nbsp;</p> <!-- insert blank line -->
41-
42-
4321

4422
<p>&nbsp;</p> <!-- insert blank line -->
4523

@@ -53,14 +31,16 @@ There are numerous R packages for FEA, each with their strengths and limitations
5331

5432
<p>&nbsp;</p> <!-- insert blank line -->
5533

56-
The following 3 R tools have been selected for today's workshop:
34+
The following three R tools have been selected for today's workshop:
5735

58-
**1.** `gprofiler1` due to its ease of use, high number of supported species, and multiple database enrichments produced within a single run. Caveat: only ORA analysis
59-
**2.** `clusterProfiler2` due to its integrated database support, runs both GSEA and ORA, companion plotting tool `enrichplot` with diverse plot options, and novel species support
60-
**3.** `WebGestaltR` due to its ease of use, high number of suported databases and namespaces, runs both GSEA and ORA, interactive HTML reports and plots, and novel species support
36+
1. `gprofiler2` due to its ease of use, high number of supported species, and multiple database enrichments produced within a single run. Caveat: only ORA analysis
37+
2. `clusterProfiler` due to its integrated database support, runs both GSEA and ORA, companion plotting tool `enrichplot` with diverse plot options, and novel species support
38+
3. `WebGestaltR` due to its ease of use, high number of supported databases and namespaces, runs both GSEA and ORA, interactive HTML reports and plots, and novel species support
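
A quick way to confirm that these packages are available in an R session is sketched below. It uses only base R; the vector name `workshop_pkgs` is illustrative, and `enrichplot` is included because it is the companion plotting package mentioned above.

```r
# Packages named in this section (the vector name is illustrative)
workshop_pkgs <- c("gprofiler2", "clusterProfiler", "enrichplot", "WebGestaltR")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise, without attaching it to the search path
installed <- vapply(workshop_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)
```

On the workshop VMs every entry should be `TRUE`; a `FALSE` would point to a package that still needs installing.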
6139

6240
## Day 2 approach
6341

64-
For this part of the workshop, we will be doing a 'semi code-along' approach, where you are not required to type any code (apart from a few basics in the RStudio familiarisation part next up) but we will be running code chunks together, talking about what each part of the code is doing, and viewing the output in real time. All of the code required has been prepared into separate code files for each activity, and we will download these to the VMs in the next section.
42+
For this part of the workshop, we will be doing a **semi code-along** approach, where you are not required to type any code (apart from a few basics in the RStudio familiarisation part next up) but we will be running code chunks together, talking about what each part of the code is doing, and viewing the output in real time.
43+
44+
All of the code required has been prepared into separate code files for each activity, and we will download these to the VMs in the next section.
6545

6646

10-r-environment-setup.Rmd

Lines changed: 15 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,15 @@
11
# R environment set up
22

3-
For today's session, we will be using the [R](https://www.r-project.org/) statistical programming language to perform functional enrichmemt analysis using a range of dedicated R packages. To simplify our work, we will be using the [RStudio](https://posit.co/download/rstudio-desktop/) integrated development environment.
43

5-
This will create a central place to write code, comments, run code, view and save graphics.
64

7-
<img src="images/rstudio_logo.png" style="border: none; box-shadow: none; background: none;">
8-
9-
10-
<p>&nbsp;</p> <!-- insert blank line -->
11-
12-
## RStudio basics
5+
## VM login
136

147
You have previously been provided with an IP address, user name and password for a VM for this workshop. The VM runs RStudio and has all of the required R libraries for today's workshop pre-installed.
158

169

1710
<p>&nbsp;</p> <!-- insert blank line -->
1811

19-
&#x27A4; Open the RStudio VM by copying the IP address into a web browser.
12+
&#x27A4; Open the VM by copying the IP address into a web browser.
2013

2114
When prompted, select `RStudio` from the options:
2215

@@ -74,19 +67,6 @@ cat(name)
7467

7568
Note that the object `name` is now listed in the environment.
7669

77-
<p>&nbsp;</p> <!-- insert blank line -->
78-
79-
Now let's see the `plots` pane in action by creating a simple dummy barplot:
80-
81-
&#x27A4; Copy paste the below code into your console then press enter:
82-
83-
```{r plot, echo=TRUE, eval=FALSE}
84-
values <- c(5, 10, 15)
85-
labels <- c("A", "B", "C")
86-
barplot(values, names.arg = labels)
87-
```
88-
89-
You will now see a barplot in the `plots` pane, as well as 2 new objects in the `environment`. Note that the environment list also tells you what type of objects it is: for our `name` object, the double quotes indicate it's a single string, where `chr [1:3]` for `labels` shows that it is a `character vector` with 3 text elements. `num [1:3]` indicates that `values` is a numeric vector with 3 values. The takeaway message here is that the environment shows your active R objects, and that these are of different types. This is important to be aware of when working in R, because different R functions require different input types. During the workshop, we may need to convert our input to a `dataframe` or `vector` for example.
9070

9171
<p>&nbsp;</p> <!-- insert blank line -->
9272

@@ -95,7 +75,7 @@ You will now see a barplot in the `plots` pane, as well as 2 new objects in the
9575

9676
Today we will not be entering R commands directly into the console like this. We will instead be using an R notebook.
9777

98-
Using notebooks in RStudio is a great way to save your code and comments, as well as have the code output display inside the notebook. Notebooks can be easily shared with others so they can run your analysis, and also rendered to HTML which is a neat way of saving and presenting results to others.
78+
Using notebooks in RStudio is a great way to save your code and comments, as well as have the code output display inside the notebook. Notebooks can be easily shared with others so they can run your analysis, and also rendered to HTML which is a neat way of saving a static copy of your work and presenting results to others.
9979

10080
<p>&nbsp;</p> <!-- insert blank line -->
10181

@@ -122,20 +102,9 @@ New code chunks can be added with the shortcut `ctrl + alt + i` or via the toolb
122102

123103
&#x27A4; Run the demo code chunk that was included in the new notebook to plot `cars`
124104

125-
Note that the plot dispalys *inside* the notebook, rather than within the plot pane as we saw earlier.
126-
127-
<p>&nbsp;</p> <!-- insert blank line -->
128-
129-
130-
&#x27A4; Add a new code chunk by entering `ctrl + alt + i` and label it `barplot`. Then copy the dummy barplot code from earlier into the code chunk and run it
105+
Note that the plot displays *inside* the notebook, rather than within the plot pane. If this code were executed directly from the `console` rather than the notebook, the plot would appear in the `Plots` pane.
131106

132107

133-
```{r barplot, echo=TRUE, eval=FALSE}
134-
values <- c(5, 10, 15)
135-
labels <- c("A", "B", "C")
136-
barplot(values, names.arg = labels)
137-
```
138-
139108
<p>&nbsp;</p> <!-- insert blank line -->
140109

141110
### Rendered HTML notebooks
@@ -160,8 +129,6 @@ Next we will look at what a HTML version of the notebook looks like. In order to
160129

161130
Note that the HTML is saved to your current working directory, which we previously verified was `/home/userN`. Check that the file appears where you expect it to via the `Files` pane.
162131

163-
The workshop data you have downloadaed contains 4 R notebooks required to run the analyses. After each activity, you may choose to knit the notebook to HTML to have a static record of your work 📖
164-
165132
<p>&nbsp;</p> <!-- insert blank line -->
166133

167134
### A fresh workspace
@@ -171,10 +138,20 @@ Next we will open the R notebook for the first analysis activity. It's ideal to
171138

172139
&#x27A4; Clear your environment by selecting `Session` &rarr; `Quit session` &rarr; `Don't save` &rarr; `Start new session`
173140

174-
**Note:** when asked `Save workspace image to ~/R.Data?` it is typically advisable to select `Don't Save`. Not saving the workspace image can help avoid workspace clashes that can be hard to resolve or have unintended consequences. You don't need to worry about losing data - after all, your input data is saved elsewhere, and your R code that produces all required outputs is safely saved within the notebook.
141+
**Note:** when asked `Save workspace image to ~/.RData?` please select **`Don't Save`** during this workshop.
142+
143+
<p>&nbsp;</p> <!-- insert blank line -->
144+
<details>
145+
<summary>Workspace data: to save or not to save?</summary>
146+
147+
Not saving the workspace image can help avoid workspace clashes that can be hard to resolve or have unintended consequences. You don't need to worry about losing data - after all, your input data is saved elsewhere, and your R code that produces all required outputs is safely saved within the notebook.
175148

176149
Saving the workspace image saves all objects from the session such as your variables and dataframes. This can save time if you need to close an analysis part way through and continue later. However, this can have drawbacks such as library and function name clashes, unexpected objects present in the environment, large objects and relic objects cluttering the workspace, old objects conflicting with new ones, etc. Since we will be performing discrete analysis tasks today, and not continuing on a growing body of work, selecting `Don't Save` will be the most appropriate.
177150

151+
</details>
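
As a rough sketch of what a saved workspace image contains, the base R functions `save()` and `load()` can write and restore session objects. `save.image()` is the base R shortcut for saving the whole global environment. The object name `gene_ids` and the temporary file below are illustrative assumptions, not part of the workshop materials.

```r
# Hypothetical object standing in for something created during an analysis
gene_ids <- c("TP53", "BRCA1", "EGFR")

# Write all current objects to a file; save.image() does the same for the
# whole global environment
image_file <- tempfile(fileext = ".RData")
save(list = ls(), file = image_file)

# Remove the object, then restore it from the saved file
rm(gene_ids)
restored_names <- load(image_file)

# The old object is back, whether or not the new session still needs it
print("gene_ids" %in% restored_names)
```

This is how relic objects from an earlier session can reappear in a new one when the workspace image is saved and reloaded.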
152+
153+
<p>&nbsp;</p> <!-- insert blank line -->
154+
178155
👀 You may notice there is also a `Clear Workspace` option under `Session`. This will remove all R objects from your environment, but it won't remove loaded libraries. You can of course unload these with R code, but refreshing the session is easier 😊 Some libraries share function names. If you are being very correct, you can prefix the function with its package name (`package::function()`) to ensure the exact function you want is being called and avoid any potential function clashes. I admit to being guilty of not doing that enough! 🤭
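
A minimal sketch of both points, using only base R. The choice of `stats::filter()` is an illustrative assumption: it is a base function whose name is also exported by other packages such as `dplyr`, so a bare `filter()` call can resolve differently depending on which libraries are loaded.

```r
x <- c(1, 2, 3, 4, 5)

# Explicit namespace: this is always the stats version of filter(),
# regardless of which other packages have been loaded
moving_avg <- stats::filter(x, rep(1 / 3, 3))
print(moving_avg)

# Removing objects (comparable to clearing the workspace) leaves the
# attached packages untouched
attached_before <- search()
rm(x)
attached_after <- search()
print(identical(attached_before, attached_after))
```

The `package::function()` form costs a few extra keystrokes but makes each call unambiguous, which helps when several enrichment packages are loaded in the same session.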
179156

180157
<p>&nbsp;</p> <!-- insert blank line -->

12-clusterprofiler.Rmd

Lines changed: 3 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55

66
One of the key advantages of using R over web tools is flexibility with visualisations.
77

8-
The same authors have released a companion plotting package [enrichplot](https://www.bioconductor.org/packages/release/bioc/html/enrichplot.html) dedicated to plotting enrichment results. In this activity, we will perform GSEA with `clusterProfiler` then explore many different visualisation options. At the end of the activity, we will have a poll to see which of the many plot types are the favourites! 😁
8+
The same authors have released a plotting package [enrichplot](https://www.bioconductor.org/packages/release/bioc/html/enrichplot.html) dedicated to plotting enrichment results. In this activity, we will perform GSEA with `clusterProfiler` then explore many different visualisation options. At the end of the activity, we will have a poll to see which of the many plot types are the favourites! 😁
99

1010

1111

@@ -60,10 +60,7 @@ Let's head over to RStudio now and try out some functions! 🏃
6060

6161

6262
&#x27A4; Open the `clusterProfiler.Rmd` notebook
63-
64-
&#x27A4; Load the notebook `gprofiler2.Rmd` notebook by clicking on it in the `Files` pane
65-
66-
You could also open the file by selecting `File` &rarr; `Open file`, or use the keyboard shortcut `ctrl + o`.
63+
6764

6865
**<span style="color: #006400;">Instructions for the analysis will continue from the R notebook.</span>**
6966

@@ -87,7 +84,7 @@ This may be the one you found most informative, easiest to interpret, most eye-c
8784

8885
<p>&nbsp;</p> <!-- insert blank line -->
8986

90-
<img src="images/poll-favourite-plot-type.jpg " style="border: none; box-shadow: none; background: none; width: 100%;">
87+
<img src="images/poll-favourite-plot-type.png" style="border: none; box-shadow: none; background: none; width: 100%;">
9188

9289

9390

13-webgestaltr.Rmd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -51,4 +51,4 @@ You could also open the file by selecting `File` &rarr; `Open file`, or use the
5151
- We have reviewed the organisms and databases that are natively supported by this easy to use tool
5252
- We have run both ORA and GSEA and explored the interactive HTML results summary
5353
- We have touched on the redundancy filters available within this tool, for GO as well as two external algorithms applied automatically to any enrichment performed
54-
- In the next session, we will use `WebGestaltR` with novel species
54+
- In the next session, we will use `WebGestaltR` for novel species FEA
