Skip to content

Commit 0b90d6b

Browse files
committed
small tweaks
1 parent 2001c64 commit 0b90d6b

File tree

3 files changed

+64
-44
lines changed

3 files changed

+64
-44
lines changed

10-r-environment-setup.Rmd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -225,7 +225,7 @@ The code chunk labelled `working directory` contains only "hashed out" code and
225225

226226
Note that the default directory for RStusio (set in the 'Global options') and the default directory for a code notebook differ! You can change the working directory of the console to match the notebook by issuing the below command into your console directly:
227227

228-
```{r plot2, echo=TRUE, eval=FALSE}
228+
```{r setwd, echo=TRUE, eval=FALSE}
229229
setwd("workshop")
230230
```
231231

11-gprofiler2.Rmd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ Since we are doing ORA, we will need a filtered gene list, and a background gene
2525
3. Extract background genes and create a background gene list R object
2626
4. Run ORA with `gost` function
2727
5. Save the tabular results to a file
28-
6. Visualise the results
28+
6. Visualise the results with `gostplot`
2929
7. Run a `gost` multi-query for up-regulated and down-regulated genes
3030
8. Compare `gprofiler2` R results to the `g:Profiler` web results
3131

day2_Rnotebooks/gprofiler2.Rmd

Lines changed: 62 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,10 @@ The input data file is within the current working directory so we do not need to
4646

4747

4848
```{r load input data}
49+
# save data file to an R object called 'data'
4950
data <- read_tsv("Pezzini_DE.txt", col_names = TRUE, show_col_types = FALSE)
51+
52+
# view the first few lines
5053
head(data)
5154
```
5255

@@ -58,9 +61,12 @@ Look on the environment pane of RStudio, and you can see a description '14420 ob
5861

5962
Now we need to filter for differentially expressed genes (DEGs), and we will apply the thresholds adjusted P values/FDR < 0.01, and log2fold change of 2.
6063

64+
We will use ENSEMBL gene IDs (column 1).
65+
6166
```{r get ORA gene list}
67+
# Filter DEGs and save to an object named 'degs'
6268
degs <- data %>%
63-
filter(FDR < 0.01 & abs(Log2FC) > 2) %>%
69+
filter(FDR <= 0.01 & abs(Log2FC) >= 2) %>%
6470
pull(Gene.ID)
6571
cat("Number of genes passing FDR and fold change filter:", length(degs), "\n")
6672
@@ -71,6 +77,7 @@ write.table(degs, "Pezzini_DEGs.txt", row.names = FALSE, col.names = FALSE, quot
7177
cat("Table saved to", title, "\n")
7278
7379
```
80+
7481
We have 792 genes passing our filters.
7582

7683
# 3. Get background gene list
@@ -80,7 +87,9 @@ Recall from the webinar and day 1 of the workshop that an experimental backgroun
8087
The analysis in Degust has already removed lowly expressed genes, so we can simply extract all genes from this data matrix as our background gene list and save it as our 'background' object, as well as save to disk so that we can include it within the supplementary materials of any resultant publications for reproducibility.
8188

8289
```{r get background }
90+
# select the column labelled 'Gene.ID' from the 'data' dataframe, save to object named 'background'
8391
background <- data$Gene.ID
92+
8493
cat("Number of background genes:", length(background), "\n")
8594
8695
# Save the background gene list to disk:
@@ -103,47 +112,42 @@ Before running the below code chunk, review the parameters for the `gost` ORA fu
103112
Observe the similarities to the parameters available on the g:Profiler web interface, for example organism, the correction method (g:Profiler's custom `g_scs` method), and domain scope (background genes).
104113

105114

106-
Run the below code which explicitly includes all available `gost` parameters. An error-free `gost` run should produce no console output. Our results are saved in the R object `ora`.
115+
Run the below code which explicitly includes all available `gost` parameters. Including all parameters, even if the defaults suit your needs, makes your
116+
parameter choices explicit. Sometimes, default settings can change between versions!
117+
118+
An error-free `gost` run should produce no console output. As the code is running, there wll be a green bar to the elft of the code chunk.
119+
120+
Our results are saved in the R object `ora`.
107121

108122

109123
```{r run gost}
110124
ora <- gost(
111-
degs,
112-
organism = "hsapiens",
125+
degs, # 'degs' gene list object
126+
organism = "hsapiens", # human data
113127
ordered_query = FALSE,
114128
multi_query = FALSE,
115-
significant = TRUE,
116-
exclude_iea = FALSE,
129+
significant = TRUE, # only print significant terms
130+
exclude_iea = FALSE, # exclude GO electronic annotations
117131
measure_underrepresentation = FALSE,
118-
evcodes = FALSE,
119-
user_threshold = 0.05,
120-
correction_method = "g_SCS",
121-
domain_scope = "custom_annotated",
122-
custom_bg = background,
123-
numeric_ns = "",
124-
sources = NULL,
125-
as_short_link = FALSE,
126-
highlight = FALSE
132+
evcodes = FALSE, # don't include evidence codes in the results - good to have, but will make it run slower
133+
user_threshold = 0.05, # adj P value cutoff for terms
134+
correction_method = "g_SCS", # gprofiler's custom multiple testing correctionmethod (recommended)
135+
domain_scope = "custom_annotated", # custom background, restrict to only annotated genes
136+
custom_bg = background, # 'background' gene list object
137+
numeric_ns = "", # we don't have numeric IDs
138+
sources = NULL, # use all databases
139+
as_short_link = FALSE, # save our results here not as a weblink to gprofiler
140+
highlight = TRUE # highlight driver terms (will add a 'highlighted' column with TRUE/FALSE)
127141
)
128142
```
129143

130144

131-
Since we are using many of the default parameters, we could shorten this code to what is shown below. The results would be identical, however not as transparent as far as easily identifying what parameters were applied to a run.
132-
133-
```{r abbreviated gost code }
134145

135-
#ora <- gost(
136-
# degs,
137-
# correction_method = "g_SCS",
138-
# domain_scope = "custom_annotated",
139-
# custom_bg = background,
140-
#)
141-
142-
```
143146

144147

145-
View the top-most significant enrichments with the R `head` command. Only significant enrichments passing your specified threshold (adjusted P value < 0.05) are included in the results object.
148+
View the top-most significant enrichments with the R `head` command. Only significant enrichments passing your specified threshold (adjusted P value < 0.05) are included in the results object because we have included `significant = TRUE`.
146149

150+
Use the black arrow on the right of the table to scroll to other columns.
147151

148152
```{r head ora}
149153
head(ora$result)
@@ -152,11 +156,11 @@ head(ora$result)
152156
Let's give our query a name:
153157

154158
```{r ora name the query list }
159+
# reassign query name to something more specific
155160
ora$result$query <- 'DEGs_Padj0.05_FC2'
156161
head(ora$result)
157162
```
158163

159-
Use the small black arrow near the column names to view columns wider than the page width. The `head` view only shows the top 6 enrichments, which are all GO Biological Process.
160164

161165
We can obtain a list of queried databases:
162166

@@ -199,15 +203,15 @@ cat("Table saved to", title, "\n")
199203

200204
Let's extract the results for Reactome and save to a `gosttable`.
201205

202-
```{r reactome gosttable, fig.width = 10 }
206+
```{r reactome gosttable }
203207
# Filter results for the 'Reactome' database
204208
reactome_results <- ora$result %>% filter(source == "REAC")
205209
206210
# Extract all term_ids for Reactome
207211
reac <- reactome_results$term_id
208212
209213
# Create the GOST table for Reactome terms
210-
filename <- "gprofiler_Reactome_gosttable.png"
214+
filename <- "gprofiler_Reactome_gosttable.pdf"
211215
212216
publish_gosttable(ora,
213217
highlight_terms = reac,
@@ -251,11 +255,12 @@ gostplot(ora,
251255
```
252256

253257

254-
There are a lot of significant enrichments for GO biological processes. Many of these are probably terms containing a large number of genes, so not particularly informative. Other R tools have default settings limiting the minimum and maximum number of genes in a geneset to be included in the analysis. Since there is no direct parameter to restrict term size to `gostplot`, we can filter the ORA results before plotting. Let's apply a maximum gene set size of 500, and a minimum gene set size of 10, which are the default setting used by clusterProfiler.
258+
There are a lot of significant enrichments for GO biological processes. Many of these are probably terms containing a large number of genes, so not particularly informative. Other R tools have default settings limiting the minimum and maximum number of genes in a geneset to be included in the analysis. Since there is no direct parameter to restrict term size to `gostplot`, we can filter the ORA results before plotting. Let's apply a maximum gene set size of 500, and a minimum gene set size of 10, which are the default setting used by clusterProfiler.
255259

256260

257261
```{r filter for term size}
258262
# Filter the results for GO:BP terms with term_size <= 500 and >= 10
263+
# save the filtered results in a new object called 'ora_filter_termsize'
259264
ora_filter_termsize <- ora
260265
ora_filter_termsize$result <- ora$result %>% filter(term_size <= 500) %>% filter(term_size >= 10)
261266
```
@@ -284,12 +289,12 @@ gostplot(ora_filter_termsize,
284289

285290
This has cleaned up 'Biological Process' a little bit, enabling signals of more specific terms to be highlighted.
286291

287-
gprofiler2 includes a function for creating a publication-ready image that can optionally highlight specific terms. We need to first produce a plot with `interactice = FALSE`, save it to an object, and then provide that plot object to the `publish_gostplot` function.
292+
`gprofiler2` includes a function for creating a publication-ready image that can optionally highlight specific terms. We need to first produce a plot with `interactice = FALSE`, save it to an object, and then provide that plot object to the `publish_gostplot` function.
288293

289294

290295
```{r save gostplot non-interactive to object}
291296
292-
# Plot with gostplot using the filtered results
297+
# Plot with gostplot using the filtered results, save to object called 'plot'
293298
plot <- gostplot(ora_filter_termsize,
294299
capped = TRUE,
295300
interactive = FALSE,
@@ -314,12 +319,12 @@ The `publish_gostplot` parameter `highlight_terms` enables you to highlight spec
314319
Let's highlight some selected terms manually. You need to provide the term ID not term name.
315320

316321
```{r save terms to highlight }
317-
#Term IDs for 'Collagen degradation' and 'Collagen formation'
322+
#specify term IDs for tmers of interest: 'Collagen degradation' and 'Collagen formation'
318323
highlight <- c("REAC:R-HSA-1442490", "REAC:R-HSA-1474290")
319324
```
320325

321326
```{r save gostplot with highlighted terms}
322-
filename <- "gostplot_filter_termsize_collagen.png"
327+
filename <- "gprofiler_collagen_gostplot.pdf"
323328
324329
publish_gostplot(plot,
325330
highlight_terms = highlight,
@@ -334,16 +339,15 @@ Like g:Profiler web, the coloured boxes on the table are by adjusted P value, wi
334339
You can use R `grepl` function to search for terms with names matching some keyword. Let's highlight all terms related to receptors. The code chunk applies an increased figure height, to ensure we can see the whole plot within the notebook.
335340

336341
```{r highlight receptor terms}
337-
338-
# Filter terms containing "receptor" keyword and create a list of those term IDs
342+
# extract from ora results all terms containing "receptor" keyword and create a list of those term IDs
339343
highlight <- ora$result %>%
340344
filter(grepl("receptor", term_name, ignore.case = TRUE)) %>%
341345
pull(term_id)
342346
```
343347

344348
```{r publish gostplot highlight receptors}
345349
346-
filename <- "gostplot_filter_termsize_receptors.png"
350+
filename <- "gprofiler_receptors_gostplot.pdf"
347351
348352
publish_gostplot(plot,
349353
highlight_terms = highlight,
@@ -437,6 +441,8 @@ By providing more than one gene list and setting `multi_query = TRUE`, results f
437441
First, we need to extracts separate gene lists for up-regulated and down-regulated genes.
438442

439443
```{r get ORA gene lists up and down }
444+
445+
# make an object for upregualted genes
440446
up_degs <- data %>%
441447
filter(FDR < 0.01 & Log2FC >= 2) %>%
442448
pull(Gene.ID)
@@ -447,6 +453,7 @@ title <- "up_DEGs.txt"
447453
write.table(up_degs, title, row.names = FALSE, col.names = FALSE, quote = FALSE)
448454
cat("Up-regulated DEGs saved to", title, "\n")
449455
456+
# make an object for downregualted genes
450457
down_degs <- data %>%
451458
filter(FDR < 0.01 & Log2FC <= -2) %>%
452459
pull(Gene.ID)
@@ -499,7 +506,7 @@ Unfortunately, notebook view squashes the top plot over the bottom one, and adju
499506

500507
```{r gostplot multi non-interactive}
501508
p <- gostplot(ora_multi, capped = TRUE, interactive = FALSE)
502-
filename <- "gprofiler_ORA_multiquery.png"
509+
filename <- "gprofiler_ORA_multiquery.pdf"
503510
504511
publish_gostplot(p,
505512
highlight_terms = NULL,
@@ -525,13 +532,18 @@ sapply(ora_multi$result, class)
525532
Convert the columns that are lists to characters
526533

527534
```{r convert lists}
535+
# convert lists into characters
528536
list_columns <- names(ora_multi$result)[sapply(ora_multi$result, class) == "list"]
529537
ora_multi$result[list_columns] <- lapply(ora_multi$result[list_columns], function(col) {
530538
sapply(col, function(x) paste(x, collapse = ","))
531539
})
532540
sapply(ora_multi$result, class)
533541
```
542+
View the new table format:
534543

544+
```{r check converted table}
545+
head(ora_multi$result)
546+
```
535547

536548

537549
```{r write multiquery table}
@@ -555,10 +567,12 @@ The input file here is one that we have created, but should match yours as long
555567

556568
```{r import g:profiler web results}
557569
web <- read.csv("gProfiler_hsapiens_07-11-2024_11-27-09__intersections.csv")
570+
head(web)
558571
```
559572

560573

561-
Check the numbers: are there any terms significant frm one tool but not the other?
574+
575+
Check the numbers: are there any terms significant from one tool but not the other?
562576

563577
```{r}
564578
# Extract significant term names
@@ -584,7 +598,9 @@ if (length(common_terms) == length(web_terms) && length(common_terms) == length(
584598
585599
```
586600

587-
That's a good start! Do the P values differ? Let's look closely at GO 'Molecular Function'.
601+
That's a good start! Do the P values differ? Let's look closely at the GO 'Molecular Function' P values via a barplot.
602+
603+
Format the P values for plotting:
588604

589605
```{r extract GO MF P values from web and R for comparison }
590606
@@ -606,6 +622,9 @@ comparison_data_long <- comparison_data_go_mf %>%
606622
values_to = "p_value")
607623
```
608624

625+
626+
Create barplot to compare P values web vs R:
627+
609628
```{r print barplot, fig.width=10, fig.height=8}
610629
# Create the bar plot with -log10 transformed p-values
611630
print(ggplot(comparison_data_long, aes(x = term_name, y = -log10(p_value), fill = source)) +
@@ -650,7 +669,6 @@ sessionInfo()
650669
Typically, we would simply run `RStudio.Version()` to print the version details. However, when we knit this document to HTML, the `RStudio.Version()` function is not available and will cause an error. So to make sure our version details are saved to our static record of the work, we will save to a file, then print the file contents back into the notebook.
651670

652671

653-
654672
```{r rstudio version - not run during knit, eval=FALSE}
655673
# Get RStudio version information
656674
rstudio_info <- RStudio.Version()
@@ -683,6 +701,8 @@ rstudio_version_text
683701

684702
# 10. Knit workbook to HTML
685703

704+
Make sure your document is saved if you have made any changes! (there will be an asterisk next to the filename on editor pane if unsaved changes are present).
705+
686706
The last task is to knit the notebook. Our notebook is editable, and can be changed. Deleting code deletes the output, so we could lose valuable details. If we knit the notebook to HTML, we have a permanent static copy of the work.
687707

688708
On the editor pane toolbar, under Preview, select Knit to HTML.

0 commit comments

Comments
 (0)