|
| 1 | +--- |
| 2 | +title: "Mutation Mapping Pipeline for *C. elegans* EMS mutagenesis and Backcross Experiments" |
| 3 | +author: "Richard J. Acton" |
| 4 | +date: "`r Sys.Date()`" |
| 5 | +output: |
| 6 | + html_document: |
| 7 | + fig_caption: yes |
| 8 | + number_sections: yes |
| 9 | + toc: yes |
| 10 | + df_print: paged |
| 11 | +vignette: > |
| 12 | + %\VignetteIndexEntry{Generating-input} |
| 13 | + %\VignetteEngine{knitr::rmarkdown} |
| 14 | + %\VignetteEncoding{UTF-8} |
| 15 | +bibliography: "assets/bib.bib" |
| 16 | +csl: "assets/genomebiology.csl" |
| 17 | +link-citations: yes |
| 18 | +linkcolor: blue |
| 19 | +--- |
| 20 | + |
| 21 | +```{r, include = FALSE} |
| 22 | +knitr::opts_chunk$set( |
| 23 | + collapse = TRUE, |
| 24 | + comment = "#>" |
| 25 | +) |
| 26 | +``` |
| 27 | +# Quick Start |
| 28 | + |
| 29 | +- __Inputs:__ |
| 30 | + - Paired-end fastq files uploaded to a Galaxy [@Afgan2016; @Jalili2020] history as a `list of dataset pairs` |
| 31 | + - A suitable genome fasta file (*C. elegans*, ce11.fa.gz - Compatible with WBcel235.86 used by SnpEff) |
| 32 | +- Run the Pipeline: https://usegalaxy.eu/u/richardjacton/w/c-elegans-ems-mutagenesis-mutation-caller |
| 33 | +- __Outputs:__ |
| 34 | + - [`MultiQC`](https://multiqc.info/) HTML report with QC info on the input fastqs, trimming, mapping, and deduplication steps. |
| 35 | + - `.vcf` file with variants from all samples (FreeBayes mutation caller) |
| 36 | + - `.vcf` file with variants from all samples (MiModD mutation caller) |
| 37 | + - `.gff` file with deletions from all samples (MiModD deletion calling tool) |
| 38 | +- Perform Quality filtering and appropriate set subtractions with [`MutantSets`](https://github.com/RichardJActon/MutantSets) or alternatively the `MiModD VCF Filter` or `SnpSift Filter` tools to identify candidate variants. |
| 39 | +- (optionally) `MiModD NacreousMap` for visualisation of mutation locations and `MiModD Report Variants` for HTML mutation list |
| 40 | + |
| 41 | +NB: sample file names are expected to be of the form 'A123_0001_S1_L001_R1.fq.gz'. Sample identifiers are extracted from the file name with the regular expression `\w+_(\d+)_S\d+_L\d+.*`, which for this example yields the sample identifier 0001. If your file names do not conform to this pattern, you may need to update the regex by editing the rules in the 'apply rule to collection' step of the workflow. |
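
As a rough check before uploading, the pattern quoted above can be tried against your own file names in R. This is only a sketch: `extract_sample_id()` is a hypothetical helper, not part of the workflow or of `MutantSets`, and it assumes that R's Perl-compatible regex engine treats this pattern the same way Galaxy does. Note that the pattern expects the lane field (`_L001`) directly after the sample number field (`_S1`), so a name with the read field in between, such as 'A123_0001_S1_R1_L001.fq.gz', does not match.

```r
# Pattern quoted in the text; backslashes are doubled inside an R string.
sample_id_pattern <- "\\w+_(\\d+)_S\\d+_L\\d+.*"

# Hypothetical helper: return the first capture group (the sample
# identifier), or NA if the file name does not match the pattern.
extract_sample_id <- function(file_name, pattern = sample_id_pattern) {
  hit <- regmatches(file_name, regexec(pattern, file_name, perl = TRUE))[[1]]
  if (length(hit) < 2) NA_character_ else hit[2]
}

# Lane field directly after the sample number field: matches.
id_ok <- extract_sample_id("A123_0001_S1_L001_R1.fq.gz")

# Read field between sample number and lane: does not match.
id_bad <- extract_sample_id("A123_0001_S1_R1_L001.fq.gz")

print(id_ok)
print(id_bad)
```

File names that return `NA` here are the ones likely to need an adjusted rule in the 'apply rule to collection' step.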
| 42 | + |
| 43 | +# Background |
| 44 | + |
| 45 | +Doitsidou et al. reviewed Sequencing-Based Approaches for Mutation Mapping and Identification in *C. elegans* [@Doitsidou2016]. They describe three main approaches to mapping by sequencing: |
| 46 | + |
| 47 | +1. Hawaiian variant mapping |
| 48 | +2. EMS-density mapping |
| 49 | +3. Variant discovery mapping |
| 50 | + |
| 51 | +This pipeline currently supports only two of these: EMS-density mapping and variant discovery mapping (VDM). |
| 52 | + |
| 53 | + |
| 54 | + |
| 55 | + |
| 56 | + |
| 57 | +## Research Need |
| 58 | + |
| 59 | +The Schumacher lab identified a need for an analysis pipeline to map and identify mutations in Ethyl methanesulfonate (EMS) mutagenesis forward genetic screens. |
| 60 | + |
| 61 | +Previously, a tool called `CloudMap` [@Minevich2012] was used for this purpose on a Galaxy server. |
| 62 | +`CloudMap` is no longer under active development; it has been deprecated on [Galaxy Europe](https://usegalaxy.eu/) and replaced by `MiModD` ([docs](https://mimodd.readthedocs.io/en/doc0.1.8/index.html)). |
| 63 | + |
| 64 | +## Choice of Tools |
| 65 | + |
| 66 | +In a comparison of *C. elegans* mutation calling pipelines, Smith et al. [@Smith2017a] reported good results with the `FreeBayes` variant caller [@Garrison2012]. |
| 67 | +I have therefore included this tool here, in addition to the `MiModD` mutation caller, so that their relative performance can be evaluated. |
| 68 | +They also found that the `BBMap` aligner yielded better results; however, it is not available in Galaxy, so I have opted for `Bowtie2` for expediency. |
| 69 | + |
| 70 | +# Pipeline Summary |
| 71 | + |
| 72 | +[Pipeline File (Local)](assets/Galaxy-Workflow-EMS_Mutagenesis_Backcross_Mutation_Caller.ga) |
| 73 | + |
| 74 | +https://usegalaxy.eu/u/richardjacton/w/c-elegans-ems-mutagenesis-mutation-caller |
| 75 | + |
| 76 | +- Adapter and Quality Trimming with [`fastp`](https://github.com/OpenGene/fastp) [@Chen2018] |
| 77 | +- Alignment with [`bowtie2 --sensitive-local`](https://github.com/BenLangmead/bowtie2) [@Langmead2012] |
| 78 | +- [`samtools view`](https://github.com/samtools/samtools) requiring that reads are mapped in a proper pair [@Li2009b] |
| 79 | +- Removal of PCR duplicates with [`Picard MarkDuplicates`](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates) [@BroadInstitute2019] |
| 80 | +- Left alignment of indels in the BAM files using [`FreeBayes`](https://github.com/ekg/freebayes) [@Garrison2012] |
| 81 | +- [`MultiQC`](https://github.com/ewels/MultiQC) aggregating quality metrics from trimming, deduplication and alignment [@Ewels2016] |
| 82 | +- Variant calling with [`FreeBayes`](https://github.com/ekg/freebayes) [@Garrison2012], [`MiModD`](https://mimodd.readthedocs.io/) [@Baumeister2013] variant caller and deletion caller |
| 83 | +- SNP effect annotation with [`SnpEff eff`](http://snpeff.sourceforge.net/SnpEff.html) [@Cingolani2012] |
| 84 | +- SNP type annotation with [`SnpSift Variant Type`](http://snpeff.sourceforge.net/SnpSift.html) [@Cingolani2012] |
| 85 | + |
| 86 | +# Instructions (Step-by-Step) |
| 87 | + |
| 88 | +__1. Upload Data to Galaxy__ |
| 89 | + |
| 90 | + |
| 91 | + |
| 92 | +__2. Select all fastq files and create a paired list__ |
| 93 | + |
| 94 | + |
| 95 | + |
| 96 | + |
| 97 | + |
| 98 | +__3. Pair the fastq files__ |
| 99 | + |
| 100 | + |
| 101 | + |
| 102 | +__4. Import the [workflow](https://usegalaxy.eu/u/richardjacton/w/c-elegans-ems-mutagenesis-mutation-caller)__ |
| 103 | + |
| 104 | + |
| 105 | + |
| 106 | +__5. Run the workflow__ |
| 107 | + |
| 108 | +Select the paired list object and a genome sequence file as inputs |
| 109 | + |
| 110 | + |
| 111 | + |
| 112 | +__6. Check Quality Control Information__ |
| 113 | + |
| 114 | +Inspect the `MultiQC` output for signs of technical problems with your data. |
| 115 | +Consult with your friendly local bioinformatician if there are QC issues you can't diagnose. |
| 116 | + |
| 117 | + |
| 118 | + |
| 119 | +__7. Preliminary quality filtering with `SnpSift filter`__ |
| 120 | + |
| 121 | +Locate the [`SnpSift filter`](https://pcingola.github.io/SnpEff/SnpSift.html#filter) tool in the Galaxy tools panel and apply an initial quality filter; a simple expression such as `( QUAL > 15 )` or `( QUAL > 20 )` is probably sufficient. |
| 122 | +To avoid discarding real mutations, it is advisable to start with a low-stringency filter and apply more stringent criteria later, when inspecting your candidate mutations. |
| 123 | +Some initial filtering is advisable because the full-sized VCF files may be too large to be read easily by the candidate mutant inspection tool used in the following steps. |
| 124 | +You can check how many lines are in your VCF files by selecting them in the Galaxy history. |
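
To make the effect of a `QUAL` threshold concrete, the sketch below applies the same kind of cut-off to a tiny, made-up VCF in base R and counts the records before and after. It only illustrates what the filter expression does; it is not how `SnpSift filter` is implemented, and the records and the threshold of 15 are assumptions chosen for the example.

```r
# Made-up VCF content for illustration only (not real data).
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "I\t1000\t.\tG\tA\t48.2\t.\t.",
  "II\t2500\t.\tC\tT\t9.7\t.\t.",
  "X\t7300\t.\tG\tA\t15.0\t.\t."
)

qual_threshold <- 15  # assumed threshold, mirroring ( QUAL > 15 )

# Header lines start with "#"; everything else is a variant record.
is_header <- startsWith(vcf_lines, "#")
records <- vcf_lines[!is_header]

# QUAL is the sixth tab-separated column of each record.
fields <- strsplit(records, "\t", fixed = TRUE)
qual <- as.numeric(vapply(fields, `[`, character(1), 6))

# Keep records strictly above the threshold, as "QUAL > 15" would.
keep <- !is.na(qual) & qual > qual_threshold
filtered_lines <- c(vcf_lines[is_header], records[keep])

n_before <- length(records)
n_after <- sum(keep)
cat("records before:", n_before, "after:", n_after, "\n")
```

Because the comparison is strict, a record with `QUAL` exactly equal to the threshold is dropped, which is worth bearing in mind when choosing between 15 and 20.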
| 125 | + |
| 126 | + |
| 127 | + |
| 128 | +__8. Download Data__ |
| 129 | + |
| 130 | +The main `FreeBayes` VCF file: |
| 131 | + |
| 132 | + |
| 133 | + |
| 134 | +The `MiModD` deletion calls: |
| 135 | + |
| 136 | + |
| 137 | + |
| 138 | +__9. Load the results in the `MutantSets` Shiny App to identify candidate mutations__ |
| 139 | + |
| 140 | +If running the App locally, install the [`R`](https://www.r-project.org/) package from: https://github.com/RichardJActon/MutantSets |
| 141 | + |
| 142 | +R package installation and running the app locally: |
| 143 | + |
| 144 | +```r |
| 145 | +# install.packages("remotes") # If you don't already have remotes/devtools |
| 146 | +# remotes::install_github("knausb/vcfR") # If vcfR fails to install from CRAN |
| 147 | +remotes::install_github("RichardJActon/MutantSets") |
| 148 | +MutantSets::launchApp() # opens the app in a web browser |
| 149 | +``` |
| 150 | + |
| 151 | +- Load the VCF and (optionally) the gff deletion mutant files into `MutantSets` |
| 152 | +- (Optionally) Name your samples something easier to understand |
| 153 | +- Use the genotype filters to subtract the appropriate sets |
| 154 | +- Tweak quality and allele frequency thresholds to get a small set of high quality candidates |
| 155 | +- Assess the candidate mutations by clicking on them and looking at their predicted effects and genomic locations |
| 156 | +- Download your top results as a `.tsv` file (openable in Excel) |
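
The idea behind "subtracting sets" can be sketched in a few lines of base R. This is a conceptual illustration only, not the `MutantSets` implementation: the sample names, the variant keys, and the assumption that a single control sample is subtracted from a single mutant sample are all hypothetical.

```r
# Hypothetical variant keys of the form "chrom:pos:ref>alt".
mutant_variants <- c("I:1000:G>A", "II:2500:C>T", "X:7300:G>A")

# Variants also seen in a control sample (for example a parental strain)
# cannot explain a phenotype that is specific to the mutant.
control_variants <- c("II:2500:C>T", "IV:910:A>T")

# Set subtraction: keep variants present in the mutant but absent
# from the control. The order of the first argument is preserved.
candidate_variants <- setdiff(mutant_variants, control_variants)

print(candidate_variants)
```

In the app the equivalent step is done with the genotype filters, which also take genotype calls and quality into account, rather than comparing exact variant keys.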
| 157 | + |
| 158 | +__You should now have some candidate mutants to screen - Good Luck!__ |
| 159 | + |
| 160 | +# Feedback |
| 161 | + |
| 162 | +Please direct bug reports, feature requests, and questions to the maintainer of the `MutantSets` package via [GitHub issues](https://github.com/RichardJActon/MutantSets/issues). |
| 163 | + |
| 164 | +# References |
| 165 | + |
| 166 | +```{r, echo=FALSE, include=FALSE, eval=FALSE} |
| 167 | +getCitations::getCitations( |
| 168 | + normalizePath("C-elegans_Backcross_mutation_calling_Galaxy_Workflow.Rmd"), |
| 169 | + normalizePath("assets/bib.bib"), |
| 170 | + "~/Documents/bibtex/library.bib" |
| 171 | +) |
| 172 | +``` |
| 173 | + |