10-r-environment-setup.Rmd
+1 -1 (1 addition & 1 deletion)
@@ -225,7 +225,7 @@ The code chunk labelled `working directory` contains only "hashed out" code and

 Note that the default directory for RStudio (set in 'Global options') and the default directory for a code notebook differ! You can change the working directory of the console to match the notebook by issuing the command below directly in your console:
day2_Rnotebooks/gprofiler2.Rmd
+62 -42 (62 additions & 42 deletions)
@@ -46,7 +46,10 @@ The input data file is within the current working directory so we do not need to


 ```{r load input data}
+# save data file to an R object called 'data'
 data <- read_tsv("Pezzini_DE.txt", col_names = TRUE, show_col_types = FALSE)
+
+# view the first few lines
 head(data)
 ```

@@ -58,9 +61,12 @@ Look on the environment pane of RStudio, and you can see a description '14420 ob

 Now we need to filter for differentially expressed genes (DEGs). We will apply two thresholds: an adjusted P value (FDR) of 0.01 and an absolute log2 fold change of 2.

+We will use ENSEMBL gene IDs (column 1).
+
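The effect of the two thresholds can be seen on a small made-up table. This is only a sketch: the gene IDs and values are invented, and base R subsetting stands in for the `filter()` and `pull()` pipeline used in the notebook. The column names `Gene.ID`, `FDR` and `Log2FC` follow the notebook. Genes sitting exactly on a threshold are kept only when the comparison is inclusive, and `abs()` keeps both up- and down-regulated genes.

```r
# Toy differential expression table (made-up values, not the Pezzini data)
toy <- data.frame(
  Gene.ID = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E"),
  FDR     = c(0.001, 0.01, 0.2, 0.005, 0.0001),
  Log2FC  = c(3.1, -2.5, 4.0, 2, -1.2),
  stringsAsFactors = FALSE
)

# Strict thresholds: values exactly on a cutoff are dropped
strict <- toy$Gene.ID[toy$FDR < 0.01 & abs(toy$Log2FC) > 2]

# Inclusive thresholds: values exactly on a cutoff are kept
inclusive <- toy$Gene.ID[toy$FDR <= 0.01 & abs(toy$Log2FC) >= 2]

cat("Strict:   ", strict, "\n")
cat("Inclusive:", inclusive, "\n")
```

Here `ENSG_B` (FDR exactly 0.01) and `ENSG_D` (log2 fold change exactly 2) pass only the inclusive filter.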
 ```{r get ORA gene list}
+# Filter DEGs and save to an object named 'degs'
 degs <- data %>%
-  filter(FDR < 0.01 & abs(Log2FC) > 2) %>%
+  filter(FDR <= 0.01 & abs(Log2FC) >= 2) %>%
   pull(Gene.ID)
 cat("Number of genes passing FDR and fold change filter:", length(degs), "\n")
@@ -80,7 +87,9 @@ Recall from the webinar and day 1 of the workshop that an experimental backgroun
 The analysis in Degust has already removed lowly expressed genes, so we can simply extract all genes from this data matrix as our background gene list and save it as our 'background' object. We also save it to disk so that we can include it within the supplementary materials of any resultant publications for reproducibility.

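To see why the choice of background matters, the over-representation test can be sketched with base R. g:Profiler describes its enrichment test as a cumulative hypergeometric test; the term counts below are made up, the 20,000-gene "whole genome" background is a hypothetical comparison, and these raw P values are not corrected for multiple testing the way `gost` results are.

```r
# Made-up counts for a single term
n_query         <- 200   # genes in the DEG list
n_query_in_term <- 20    # DEGs annotated to the term
n_term          <- 300   # background genes annotated to the term

# P(X >= n_query_in_term) under the hypergeometric distribution,
# for a given background (universe) size
ora_p <- function(n_background) {
  phyper(n_query_in_term - 1,      # q: one less than the observed overlap
         n_term,                   # m: "successes" in the universe
         n_background - n_term,    # n: "failures" in the universe
         n_query,                  # k: number of genes drawn
         lower.tail = FALSE)
}

p_experimental <- ora_p(14420)   # genes retained in the experiment
p_genome       <- ora_p(20000)   # hypothetical whole-genome background

cat("P value, experimental background:", p_experimental, "\n")
cat("P value, whole-genome background:", p_genome, "\n")
```

With the same overlap, the larger background gives a smaller P value, so including genes that could never have been detected makes a term look more enriched than it is.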
 ```{r get background }
+# select the column labelled 'Gene.ID' from the 'data' dataframe, save to object named 'background'
 background <- data$Gene.ID
+
 cat("Number of background genes:", length(background), "\n")

 # Save the background gene list to disk:
@@ -103,47 +112,42 @@ Before running the below code chunk, review the parameters for the `gost` ORA fu
 Observe the similarities to the parameters available on the g:Profiler web interface, for example organism, the correction method (g:Profiler's custom `g_SCS` method), and domain scope (background genes).


-Run the below code which explicitly includes all available `gost` parameters. An error-free `gost` run should produce no console output. Our results are saved in the R object `ora`.
+Run the code below, which explicitly includes all available `gost` parameters. Including all parameters, even if the defaults suit your needs, makes your
+parameter choices explicit. Sometimes, default settings can change between versions!
+
+An error-free `gost` run should produce no console output. While the code is running, there will be a green bar to the left of the code chunk.
+
+Our results are saved in the R object `ora`.


 ```{r run gost}
 ora <- gost(
-  degs,
-  organism = "hsapiens",
+  degs, # 'degs' gene list object
+  organism = "hsapiens", # human data
   ordered_query = FALSE,
   multi_query = FALSE,
-  significant = TRUE,
-  exclude_iea = FALSE,
+  significant = TRUE, # only return significant terms
+  exclude_iea = FALSE, # keep GO electronic annotations (TRUE would exclude them)
   measure_underrepresentation = FALSE,
-  evcodes = FALSE,
-  user_threshold = 0.05,
-  correction_method = "g_SCS",
-  domain_scope = "custom_annotated",
-  custom_bg = background,
-  numeric_ns = "",
-  sources = NULL,
-  as_short_link = FALSE,
-  highlight = FALSE
+  evcodes = FALSE, # don't include evidence codes in the results; they are good to have, but make the run slower
+  user_threshold = 0.05, # adjusted P value cutoff for terms
+  domain_scope = "custom_annotated", # custom background, restrict to only annotated genes
+  custom_bg = background, # 'background' gene list object
+  numeric_ns = "", # we don't have numeric IDs
+  sources = NULL, # use all databases
+  as_short_link = FALSE, # return our results here, not as a web link to g:Profiler
+  highlight = TRUE # highlight driver terms (will add a 'highlighted' column with TRUE/FALSE)
 )
 ```
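One way to check which of your explicit settings differ from a function's defaults is to inspect its formal arguments. The sketch below uses a hypothetical stand-in function, `run_enrichment`, so that it runs without gprofiler2 or a network connection; with gprofiler2 loaded, `formals(gost)` should list the defaults of the installed version in the same way.

```r
# Hypothetical stand-in for an analysis function with default arguments
run_enrichment <- function(query, user_threshold = 0.05, significant = TRUE,
                           highlight = FALSE) {
  list(n_genes = length(query), user_threshold = user_threshold,
       significant = significant, highlight = highlight)
}

# Default values as defined by the function
defaults <- as.list(formals(run_enrichment))

# Settings we choose to state explicitly in our call
chosen <- list(user_threshold = 0.05, significant = TRUE, highlight = TRUE)

# Which explicit settings differ from the defaults?
differs <- !mapply(identical, chosen, defaults[names(chosen)])
changed <- names(chosen)[differs]
print(changed)
```

Only `highlight` is reported, because it is the one setting in `chosen` that departs from the stand-in function's defaults.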


-Since we are using many of the default parameters, we could shorten this code to what is shown below. The results would be identical, however not as transparent as far as easily identifying what parameters were applied to a run.
-
-```{r abbreviated gost code }

-#ora <- gost(
-# degs,
-# correction_method = "g_SCS",
-# domain_scope = "custom_annotated",
-# custom_bg = background,
-#)
-
-```


-View the top-most significant enrichments with the R `head` command. Only significant enrichments passing your specified threshold (adjusted P value < 0.05) are included in the results object.
+View the top-most significant enrichments with the R `head` command. Only significant enrichments passing your specified threshold (adjusted P value < 0.05) are included in the results object because we have included `significant = TRUE`.

+Use the black arrow on the right of the table to scroll to other columns.

 ```{r head ora}
 head(ora$result)
@@ -152,11 +156,11 @@ head(ora$result)
 Let's give our query a name:

 ```{r ora name the query list }
+# reassign query name to something more specific
 ora$result$query <- 'DEGs_Padj0.05_FC2'
 head(ora$result)
 ```

-Use the small black arrow near the column names to view columns wider than the page width. The `head` view only shows the top 6 enrichments, which are all GO Biological Process.
-There are a lot of significant enrichments for GO biological processes. Many of these are probably terms containing a large number of genes, so not particularly informative. Other R tools have default settings limiting the minimum and maximum number of genes in a geneset to be included in the analysis. Since there is no direct parameter to restrict term size to `gostplot`, we can filter the ORA results before plotting. Let's apply a maximum gene set size of 500, and a minimum gene set size of 10, which are the default setting used by clusterProfiler.
+There are a lot of significant enrichments for GO biological processes. Many of these are probably terms containing a large number of genes, so they are not particularly informative. Other R tools have default settings limiting the minimum and maximum number of genes in a gene set to be included in the analysis. Since `gostplot` has no direct parameter to restrict term size, we can filter the ORA results before plotting. Let's apply a maximum gene set size of 500 and a minimum gene set size of 10, which are the default settings used by clusterProfiler.


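A minimal sketch of this kind of term-size filter, using a small made-up results table in place of the real `ora` object. It assumes the `gost` output is a list whose `result` element is a data frame with `source` and `term_size` columns, and that the size limits apply to every source; the notebook's own chunk may filter differently. The term IDs are placeholders.

```r
# Made-up stand-in for the 'ora' object returned by gost
ora_toy <- list(
  result = data.frame(
    term_id   = c("GO:0000001", "GO:0000002", "GO:0000003", "REAC:0000001"),
    source    = c("GO:BP", "GO:BP", "GO:BP", "REAC"),
    term_size = c(5, 150, 2300, 60),
    stringsAsFactors = FALSE
  )
)

min_size <- 10    # minimum gene set size
max_size <- 500   # maximum gene set size

# Copy the object, then keep only rows whose term size is within the limits
ora_toy_filtered <- ora_toy
keep <- ora_toy$result$term_size >= min_size & ora_toy$result$term_size <= max_size
ora_toy_filtered$result <- ora_toy$result[keep, ]

print(ora_toy_filtered$result)
```

Filtering a copy leaves the original results untouched and keeps any other elements of the list in place, which matters if the filtered object is later passed to a plotting function.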
 ```{r filter for term size}
 # Filter the results for GO:BP terms with term_size <= 500 and >= 10
+# save the filtered results in a new object called 'ora_filter_termsize'
 This has cleaned up 'Biological Process' a little bit, enabling signals of more specific terms to be highlighted.

-gprofiler2 includes a function for creating a publication-ready image that can optionally highlight specific terms. We need to first produce a plot with `interactice = FALSE`, save it to an object, and then provide that plot object to the `publish_gostplot` function.
+`gprofiler2` includes a function for creating a publication-ready image that can optionally highlight specific terms. We need to first produce a plot with `interactive = FALSE`, save it to an object, and then provide that plot object to the `publish_gostplot` function.


 ```{r save gostplot non-interactive to object}

-# Plot with gostplot using the filtered results
+# Plot with gostplot using the filtered results, save to object called 'plot'
 plot <- gostplot(ora_filter_termsize,
   capped = TRUE,
   interactive = FALSE,
@@ -314,12 +319,12 @@ The `publish_gostplot` parameter `highlight_terms` enables you to highlight spec
 Let's highlight some selected terms manually. You need to provide the term ID, not the term name.

 ```{r save terms to highlight }
-#Term IDs for 'Collagen degradation' and 'Collagen formation'
+# specify term IDs for terms of interest: 'Collagen degradation' and 'Collagen formation'
@@ -334,16 +339,15 @@ Like g:Profiler web, the coloured boxes on the table are by adjusted P value, wi
 You can use the R `grepl` function to search for terms with names matching some keyword. Let's highlight all terms related to receptors. The code chunk applies an increased figure height, to ensure we can see the whole plot within the notebook.

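A small sketch of keyword matching with `grepl`, run on a made-up table rather than the real `ora$result`. The term IDs are placeholders, and the use of `ignore.case = TRUE` is an assumption made here so that capitalised matches are not missed; the notebook's chunk may match differently.

```r
# Made-up stand-in for the results table (ora$result in the notebook)
res <- data.frame(
  term_id   = c("GO:0000001", "GO:0000002", "REAC:0000001", "KEGG:00001"),
  term_name = c("Receptor signalling pathway",
                "collagen fibril organization",
                "Signaling by receptor tyrosine kinases",
                "ECM-receptor interaction"),
  stringsAsFactors = FALSE
)

# grepl returns TRUE/FALSE for each term name;
# ignore.case = TRUE also catches 'Receptor'
is_receptor <- grepl("receptor", res$term_name, ignore.case = TRUE)

# Term IDs (not names) to pass on for highlighting
receptor_ids <- res$term_id[is_receptor]
print(receptor_ids)
```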
 ```{r highlight receptor terms}
-
-# Filter terms containing "receptor" keyword and create a list of those term IDs
+# extract from ora results all terms containing "receptor" keyword and create a list of those term IDs
 # Create the bar plot with -log10 transformed p-values
 print(ggplot(comparison_data_long, aes(x = term_name, y = -log10(p_value), fill = source)) +
@@ -650,7 +669,6 @@ sessionInfo()
 Typically, we would simply run `RStudio.Version()` to print the version details. However, when we knit this document to HTML, the `RStudio.Version()` function is not available and will cause an error. So to make sure our version details are saved to our static record of the work, we will save to a file, then print the file contents back into the notebook.


-
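The save-then-read-back pattern can be sketched as follows. Because `RStudio.Version()` only exists inside RStudio, this sketch records base R's `R.version.string` instead and writes to a temporary file; both substitutions are assumptions made so that the example runs anywhere.

```r
# Text to record; R.version.string stands in for the RStudio version details
version_text <- paste("R version details:", R.version.string)

# Write the text to a file (a temporary file here, as a placeholder)
version_file <- tempfile(fileext = ".txt")
writeLines(version_text, version_file)

# Later, for example at knit time, read the file back and print its contents
saved_text <- readLines(version_file)
cat(saved_text, sep = "\n")
```

The step that needs RStudio runs once interactively, while the read-back step only needs the file, so it can run safely during knitting.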
 ```{r rstudio version - not run during knit, eval=FALSE}
 # Get RStudio version information
 rstudio_info <- RStudio.Version()
@@ -683,6 +701,8 @@ rstudio_version_text

 # 10. Knit workbook to HTML

+Make sure your document is saved if you have made any changes! (An asterisk next to the filename in the editor pane indicates unsaved changes.)
+
 The last task is to knit the notebook. Our notebook is editable, and can be changed. Deleting code deletes the output, so we could lose valuable details. If we knit the notebook to HTML, we have a permanent static copy of the work.

 On the editor pane toolbar, under Preview, select Knit to HTML.