Commit 5e82e4a

merge devel

richardjacton committed
1 parent 826d3e9 commit 5e82e4a

22 files changed, +594 -5 lines changed

DESCRIPTION

Lines changed: 6 additions & 3 deletions
@@ -1,6 +1,6 @@
 Package: MutantSets
 Title: What the Package Does (One Line, Title Case)
-Version: 0.0.0.9000
+Version: 0.0.0.9001
 Authors@R:
     person(given = "Richard J.",
            family = "Acton",
@@ -12,7 +12,7 @@ License: CC BY 4.0
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.1.1
+RoxygenNote: 7.1.2
 Depends:
     shiny,
     shinydashboard,
@@ -34,4 +34,7 @@ Imports:
     htmlwidgets
 Suggests:
     testthat,
-    covr
+    covr,
+    knitr,
+    rmarkdown
+VignetteBuilder: knitr

R/functions.R

Lines changed: 17 additions & 0 deletions
@@ -347,6 +347,23 @@ loci_plot <- function(df) { ## var_type_colours !! global
   )
 }
 
+## | SNP_freq_plot ------------------------------------------------------------
+
+SNP_freq_plot <- function(df) {
+  ggplot2::ggplot(df, ggplot2::aes(POS)) +
+    ggplot2::geom_density() +
+    ggplot2::facet_wrap(~CHROM, nrow = 1, scales = "free_x") +
+    ggplot2::scale_x_continuous(labels = scales::comma) +
+    ggplot2::theme_light() +
+    ggplot2::theme(
+      axis.text.x = ggplot2::element_text(angle = 30, hjust = 1)
+    ) +
+    ggplot2::labs(
+      x = "Position (bp)",
+      y = "Variant Density"#,colour = "", alpha = ""
+    )
+}
+
 #' layout_ggplotly
 #'
 #' Tweaks the layout of the x and y axis labels so they don't overlap
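
Review note: `ggplot2::geom_density()` draws, by default, a Gaussian kernel density estimate, so each facet of `SNP_freq_plot()` summarises where variant positions cluster along one chromosome. The sketch below reproduces that idea with base R's `stats::density()` on made-up data. The `variants` table and its values are invented for illustration; only the column names `CHROM` and `POS` come from the diff.

```r
# Toy variant table. CHROM and POS mirror the columns SNP_freq_plot() uses,
# but every value here is invented for illustration.
set.seed(1)
variants <- data.frame(
  CHROM = rep(c("I", "II"), each = 200),
  POS = c(
    round(rnorm(200, mean = 5e6, sd = 5e5)),  # variants clustered near 5 Mb
    round(runif(200, min = 1, max = 15e6))    # variants spread evenly
  )
)

# One kernel density estimate per chromosome (one per facet in the plot).
dens <- lapply(split(variants$POS, variants$CHROM), stats::density)

# Position of the highest estimated variant density on each chromosome.
peaks <- vapply(dens, function(d) d$x[which.max(d$y)], numeric(1))
print(round(peaks))
```

A pronounced peak on one chromosome, as simulated for chromosome I here, is the kind of signal a density plot of variant positions makes visible.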

R/server.R

Lines changed: 25 additions & 1 deletion
@@ -112,7 +112,13 @@ server <- function(input, output) {
     # }
 
     #dplyr::pull(pos)
-  })
+  }) %>%
+    bindCache(
+      input$DP_filter, input$QUAL_filter, input$QR_filter,
+      input$QA_filter, input$AF_filter,
+      input$picked_chr
+    ) %>%
+    bindEvent(input$go)
 
   # Genotype filtering ------------------------------------------------------
   ## | Set sample names -----------------------------------------------------
@@ -186,6 +192,24 @@ server <- function(input, output) {
     #girafe(code = print(plot))
   })
 
+  ## | Variant Density Plot -------------------------------------------------
+  output$vdplot <- plotly::renderPlotly({
+    loci() %>%
+      SNP_freq_plot() %>%
+      plotly::ggplotly(dynamicTicks = TRUE, .) %>%
+      layout_ggplotly() %>%
+      plotly::layout(
+        legend = list(
+          title = list(text = ""),
+          valign = "bottom"
+        )
+      ) %>%
+      plotly::config(
+        displaylogo = FALSE,
+        modeBarButtonsToRemove = list("hoverCompareCartesian")
+      )
+
+  })
 
   # output$chrplot_sel <- renderPrint({
   #   event_data("plotly_selected")
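
Review note: the first hunk above chains `bindCache()` and `bindEvent()` onto a reactive, so the filtered result is cached against the listed filter inputs and only re-evaluated when the "go" button is pressed. Both functions were introduced in shiny 1.6.0, while the `Depends:` entry shown in DESCRIPTION lists `shiny` without a minimum version. The following is a minimal, self-contained sketch of the same pattern. The input `min_mpg`, the output `n_rows`, and the use of `mtcars` are hypothetical stand-ins, not part of MutantSets.

```r
library(shiny)

# Hypothetical UI: one filter control plus the "go" button from the diff.
ui <- fluidPage(
  sliderInput("min_mpg", "Minimum mpg", min = 10, max = 35, value = 20),
  actionButton("go", "Start / Apply Filters"),
  textOutput("n_rows")
)

server <- function(input, output, session) {
  # The potentially expensive filtering step.
  filtered <- reactive({
    mtcars[mtcars$mpg >= input$min_mpg, ]
  })
  # Cache results, keyed on the filter value ...
  filtered <- bindCache(filtered, input$min_mpg)
  # ... and re-evaluate only when the button is pressed.
  filtered <- bindEvent(filtered, input$go)

  output$n_rows <- renderText(nrow(filtered()))
}

# Building the app object does not start a server.
app <- shinyApp(ui, server)
# To try it interactively: runApp(app)
```

The cache key should cover everything the reactive's result depends on. Whether the uploaded file belongs in the key for MutantSets cannot be judged from the lines shown in this diff.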

R/ui.R

Lines changed: 5 additions & 0 deletions
@@ -6,6 +6,7 @@ sidebar <- shinydashboard::dashboardSidebar(
   shinydashboard::sidebarMenu(
     fileInput("vcf", "Select a VCF file", accept = ".vcf"),
     fileInput("gff", "Select a gff file", accept = ".gff"),
+    actionButton("go","Start / Apply Filters"),
     #menuItem("Options", tabName = "options", icon = icon("th")),
     shinydashboard::menuItem(
       "Filtering", tabName = "table", icon = icon("table")
@@ -81,6 +82,10 @@ body <- shinydashboard::dashboardBody(
         #girafeOutput("chrplot")
         #verbatimTextOutput("testpoints")
       ),
+      tabPanel(
+        "Variant Density",
+        plotly::plotlyOutput("vdplot")
+      ),
       tabPanel(
         "Effect",
         DT::DTOutput("effect")

README.md

Lines changed: 6 additions & 1 deletion
@@ -6,7 +6,7 @@
 [![Travis build status](https://travis-ci.com/CECADBioinformaticsCoreFacility/MutantSets.svg?branch=master)](https://travis-ci.com/CECADBioinformaticsCoreFacility/MutantSets)
 <!-- badges: end -->
 
-The goal of MutantSets is to ...
+The goal of MutantSets is to permit the exploration of the results of whole genome sequencing and mutation calling in *C. elegans*, with the aim of identifying candidate mutations responsible for phenotypes in genetic screens through mapping by sequencing.
 
 ## Installation
 
@@ -30,3 +30,8 @@ To start the app run:
 MutantSets::launchApp()
 ```
 
+For instructions on how to prepare data for use in this app see the vignette:
+
+```r
+vignette("Generating-input", package = "MutantSets")
+```

vignettes/Generating-input.Rmd

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
+---
+title: "Mutation Mapping Pipeline for *C. elegans* EMS mutagenesis and Backcross Experiments"
+author: "Richard J. Acton"
+date: "`r Sys.Date()`"
+output:
+  html_document:
+    fig_caption: yes
+    number_sections: yes
+    toc: yes
+    df_print: paged
+vignette: >
+  %\VignetteIndexEntry{Generating-input}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+bibliography: "assets/bib.bib"
+csl: "assets/genomebiology.csl"
+link-citations: yes
+linkcolor: blue
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+# Quick Start
+
+- __Inputs:__
+    - Paired-end fastq files uploaded to a Galaxy [@Afgan2016; @Jalili2020] history as a `list of dataset pairs`
+    - A suitable genome fasta file (*C. elegans*, ce11.fa.gz - compatible with WBcel235.86 used by SnpEff)
+- Run the Pipeline: https://usegalaxy.eu/u/richardjacton/w/c-elegans-ems-mutagenesis-mutation-caller
+- __Outputs:__
+    - [`MultiQC`](https://multiqc.info/) HTML report with QC info on the input fastqs, trimming, mapping, and deduplication steps.
+    - `.vcf` file with variants from all samples (FreeBayes mutation caller)
+    - `.vcf` file with variants from all samples (MiModD mutation caller)
+    - `.gff` file with deletions from all samples (MiModD deletion calling tool)
+- Perform quality filtering and appropriate set subtractions with [`MutantSets`](https://github.com/RichardJActon/MutantSets) or alternatively the `MiModD VCF Filter` or `SnpSift Filter` tools to identify candidate variants.
+- (Optionally) `MiModD NacreousMap` for visualisation of mutation locations and `MiModD Report Variants` for an HTML mutation list
+
+NB: sample file names are expected to be of the form 'A123_0001_S1_R1_L001.fq.gz'; sample identifiers are extracted from them with the regular expression `\w+_(\d+)_S\d+_L\d+.*`. This would yield the sample identifier 0001. If your file names do not conform to this pattern, you may need to update the regex by editing the rules in the 'apply rule to collection' step of the workflow.
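
Review note: the regular expression quoted above can be checked locally before uploading files. The sketch below applies it in R with `perl = TRUE`. Both file names and the helper `extract_id()` are hypothetical, and Galaxy's rule builder may use a different regex engine, so treat this as a sanity check only. Under these assumptions the pattern requires the `_L<digits>` lane token to follow the `_S<digits>` token directly. It therefore matches a name ordered `..._S1_L001_R1...` but not the `..._S1_R1_L001...` ordering printed in the note, which suggests either the example name or the pattern needs adjusting.

```r
# Pattern quoted in the vignette (backslashes doubled for an R string).
pattern <- "\\w+_(\\d+)_S\\d+_L\\d+.*"

# Hypothetical file names, used only to exercise the pattern.
lane_first <- "A123_0001_S1_L001_R1.fq.gz"  # lane token directly after S1
read_first <- "A123_0001_S1_R1_L001.fq.gz"  # ordering quoted in the vignette

# Return the captured identifier, or NA when the pattern does not match.
extract_id <- function(x) {
  ifelse(
    grepl(pattern, x, perl = TRUE),
    sub(pattern, "\\1", x, perl = TRUE),
    NA_character_
  )
}

print(extract_id(lane_first))
print(extract_id(read_first))
```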
+
+# Background
+
+Doitsidou et al. reviewed Sequencing-Based Approaches for Mutation Mapping and Identification in *C. elegans* [@Doitsidou2016]. They describe three main approaches to mapping by sequencing:
+
+1. Hawaiian variant mapping
+2. EMS-density mapping
+3. Variant discovery mapping
+
+This pipeline is currently only compatible with two of them: EMS-density mapping and variant discovery mapping (VDM).
+
+![](graphics/Doitsidou2016_fig2.png)
+
+![](graphics/Doitsidou2016_fig3.png)
+
+## Research Need
+
+The Schumacher lab identified a need for an analysis pipeline to map and identify mutations in ethyl methanesulfonate (EMS) mutagenesis forward genetic screens.
+
+Previously a tool called `CloudMap` [@Minevich2012] had been used for this purpose on a Galaxy server.
+`CloudMap` is no longer under active development; it has been deprecated on [Galaxy Europe](https://usegalaxy.eu/) and replaced by `MiModD` ([docs](https://mimodd.readthedocs.io/en/doc0.1.8/index.html)).
+
+## Choice of Tools
+
+In a comparison of *C. elegans* mutation calling pipelines, Smith et al. [@Smith2017a] indicated that they had good results with `FreeBayes` [@Garrison2012].
+So I have initially included this tool here, in addition to the `MiModD` mutation caller, to evaluate their relative performance.
+They also found that the `BBMap` aligner yielded better results; however, this is not available in Galaxy, so I have opted for `Bowtie2` for expediency.
+
+# Pipeline Summary
+
+[Pipeline File (Local)](assets/Galaxy-Workflow-EMS_Mutagenesis_Backcross_Mutation_Caller.ga)
+
+https://usegalaxy.eu/u/richardjacton/w/c-elegans-ems-mutagenesis-mutation-caller
+
+- Adapter and Quality Trimming with [`fastp`](https://github.com/OpenGene/fastp) [@Chen2018]
+- Alignment with [`bowtie2 --sensitive-local`](https://github.com/BenLangmead/bowtie2) [@Langmead2012]
+- [`samtools view`](https://github.com/samtools/samtools) requiring that reads are mapped in a proper pair [@Li2009b]
+- Removal of PCR duplicates with [`Picard MarkDuplicates`](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates) [@BroadInstitute2019]
+- Left alignment of indels in the BAM files using [`FreeBayes`](https://github.com/ekg/freebayes) [@Garrison2012]
+- [`MultiQC`](https://github.com/ewels/MultiQC) aggregating quality metrics from trimming, deduplication and alignment [@Ewels2016]
+- Variant calling with [`FreeBayes`](https://github.com/ekg/freebayes) [@Garrison2012], [`MiModD`](https://mimodd.readthedocs.io/) [@Baumeister2013] variant caller and deletion caller
+- SNP effect annotation with [`SnpEff eff`](http://snpeff.sourceforge.net/SnpEff.html) [@Cingolani2012]
+- SNP type annotation with [`SnpSift Variant Type`](http://snpeff.sourceforge.net/SnpSift.html) [@Cingolani2012]
+
+# Instructions (Step-by-Step)
+
+__1. Upload Data to Galaxy__
+
+![](graphics/upload.png)
+
+__2. Select all fastq files and create a paired list__
+
+![](graphics/select-all.png)
+
+![](graphics/build-list-paired-data.png)
+
+__3. Pair the fastq files__
+
+![](graphics/pairing-dialog.png)
+
+__4. Import the [workflow](https://usegalaxy.eu/u/richardjacton/w/c-elegans-ems-mutagenesis-mutation-caller)__
+
+![](graphics/import_workflow.png)
+
+__5. Run the workflow__
+
+Select the paired list object and a genome sequence file as inputs
+
+![](graphics/run_workflow.png)
+
+__6. Check Quality Control Information__
+
+Inspect the `MultiQC` output for signs of technical problems with your data.
+Consult with your friendly local bioinformatician if there are QC issues you can't diagnose.
+
+![](graphics/view_multiqc.png)
+
+__7. Preliminary quality filtering with `SnpSift filter`__
+
+Locate the [`SnpSift filter`](https://pcingola.github.io/SnpEff/SnpSift.html#filter) tool in the Galaxy tools panel and apply some initial quality filters; simply `( QUAL > 15 )` or `( QUAL > 20 )` is probably sufficient.
+Starting with a low-stringency filter and applying more stringent criteria when inspecting your candidate mutations is probably advisable, to avoid throwing out possible mutations.
+Some initial filtering is advisable as the full-sized VCF files may be too large to be easily read by the candidate mutant inspection tool in the next steps.
+You can check how many lines are in your VCF files by selecting them in the Galaxy history.
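
Review note: once a VCF has been downloaded, the same check can be made locally. The sketch below counts variant records and how many would pass a `QUAL > 15` threshold, using only base R. The miniature VCF it writes to a temporary file is invented for illustration, and records with a missing `QUAL` (`.`) would need extra handling that is not shown here.

```r
# Write a tiny, made-up VCF to a temporary file so the sketch is self-contained.
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "I\t1000\t.\tG\tA\t12.5\t.\t.",
  "I\t2500\t.\tC\tT\t48.0\t.\t.",
  "II\t700\t.\tG\tA\t31.2\t.\t."
), vcf_path)

# Header lines start with "#"; every other line is one variant record.
vcf_lines <- readLines(vcf_path)
records <- vcf_lines[!startsWith(vcf_lines, "#")]

# QUAL is the sixth tab-separated column of a VCF record.
fields <- strsplit(records, "\t", fixed = TRUE)
qual <- as.numeric(vapply(fields, `[`, character(1), 6))

n_records <- length(records)
n_passing <- sum(qual > 15)
cat(n_records, "records,", n_passing, "with QUAL > 15\n")
```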
+
+![](graphics/SnpSift_filter.png)
+
+__8. Download Data__
+
+The main `FreeBayes` VCF file:
+
+![](graphics/Download_filtered_vcf.png)
+
+The `MiModD` deletion calls:
+
+![](graphics/Download_MiModD_deletions.png)
+
+__9. Load the results in the `MutantSets` Shiny App to identify candidate mutations__
+
+If running the App locally, install the [`R`](https://www.r-project.org/) package from: https://github.com/RichardJActon/MutantSets
+
+R package installation and running the app locally:
+
+```
+# install.packages("remotes") # If you don't already have remotes/devtools
+# remotes::install_github("knausb/vcfR") # If vcfR fails to install from CRAN
+remotes::install_github("RichardJActon/MutantSets")
+MutantSets::launchApp() # opens the app in a web browser
+```
+
+- Load the VCF and (optionally) the gff deletion mutant files into `MutantSets`
+- (Optionally) Name your samples something easier to understand
+- Use the genotype filters to subtract the appropriate sets
+- Tweak quality and allele frequency thresholds to get a small set of high quality candidates
+- Assess the candidate mutations by clicking on them and looking at their predicted effects and genomic locations
+- Download your top results as a `.tsv` file (openable in Excel)
+
+__You should now have some candidate mutants to screen - Good Luck!__
+
+# Feedback
+
+Please direct bug reports, feature requests, and questions to the maintainer of the `MutantSets` package via [GitHub issues](https://github.com/RichardJActon/MutantSets/issues).
+
+# References
+
+```{r, echo=FALSE, include=FALSE, eval=FALSE}
+getCitations::getCitations(
+  normalizePath("C-elegans_Backcross_mutation_calling_Galaxy_Workflow.Rmd"),
+  normalizePath("assets/bib.bib"),
+  "~/Documents/bibtex/library.bib"
+)
+```
+
vignettes/assets/Galaxy-Workflow-EMS_Mutagenesis_Backcross_Mutation_Caller.ga

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.
