Nanopore variant calling pipeline

scbirlab/nf-ont-call-variants is a Nextflow pipeline to call variants from Nanopore FASTQ files from bacterial clones relative to a wildtype control.

The pipeline broadly recapitulates, where possible, the GATK best practices for germline short variant calling, with some changes for bacterial genomes and long-read sequencing.

Table of contents

Processing steps
Requirements
Quick start
Inputs
Outputs
Issues, problems, suggestions
Further help

Processing steps

For each sample:

Quality-trim reads using cutadapt.
Map to genome FASTA using minimap2.
Call variants with Clair3.

Then merge resulting GVCFs using GATK CombineGVCFs. With the combined variant calls:

Annotate variant effects using snpEff.
Filter out variants where every sample carries the identical call (it is important to include a wild-type control here).
Write to output TSV.
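The filtering step above can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the function name `informative_variants`, the variant keys, and the genotype strings are all hypothetical, and the sketch only assumes that each variant has one call per sample.

```python
from typing import Dict, List


def informative_variants(calls: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the variants whose call differs between at least two samples.

    `calls` maps a variant key to a {sample_id: genotype} dictionary.
    """
    kept = []
    for variant, genotypes in calls.items():
        # If every sample (including the wild-type control) has the same
        # call, the variant does not distinguish the clones, so drop it.
        if len(set(genotypes.values())) > 1:
            kept.append(variant)
    return kept


# Hypothetical calls for a wild-type control and one mutant clone.
example_calls = {
    "chr:100:A>G": {"wt": "0/0", "mut1": "1/1"},  # differs: kept
    "chr:250:C>T": {"wt": "1/1", "mut1": "1/1"},  # identical: dropped
}
print(informative_variants(example_calls))
```

Without a wild-type sample in the comparison, a difference between the parental strain and the reference genome would look the same as a mutation acquired by a clone, which is why the control matters.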

Other steps

Get FASTQ quality metrics with fastqc.
Generate alignment statistics and plots with samtools stats and mosdepth.
Map to genome FASTA a second time using bowtie2, because minimap2 logs are not compatible with multiqc. This makes at least some alignment metrics available in the report.
Compile the logs of processing steps into an HTML report with multiqc.

Requirements

Software

You need to have Nextflow and either Conda, Singularity, or Docker installed on your system.

First time using Nextflow?

If you're at the Crick or your shared cluster has it already installed, try:

module load Nextflow Singularity

Otherwise, if it's your first time using Nextflow on your system and you have Conda installed, you can install it using conda:

conda install -c bioconda nextflow

You may need to set the NXF_HOME environment variable. For example,

mkdir -p ~/.nextflow
export NXF_HOME=~/.nextflow

To make this a permanent change, you can do something like the following:

mkdir -p ~/.nextflow
echo "export NXF_HOME=~/.nextflow" >> ~/.bash_profile
source ~/.bash_profile

Quick start

Make a sample sheet (see below) and, optionally, a nextflow.config file in the directory where you want the pipeline to run. Then run Nextflow.

nextflow run scbirlab/nf-ont-call-variants

Each time you run the pipeline after the first time, Nextflow will use a locally-cached version which will not be automatically updated. If you want to ensure that you're using the very latest version of the pipeline, use the -latest flag.

nextflow run scbirlab/nf-ont-call-variants -latest

If you want to run a particular tagged version of the pipeline, such as v0.0.2, you can do so using

nextflow run scbirlab/nf-ont-call-variants -r v0.0.2

For help, use nextflow run scbirlab/nf-ont-call-variants --help.

The first time you run the pipeline for a project, the software dependencies in environment.yml will be installed. This may take several minutes.

Inputs

The following parameters are required:

sample_sheet: path to a CSV with information about the samples and FASTQ files to be processed

The following parameters have default values which can be overridden if necessary.

inputs = "inputs" : The folder containing your inputs (i.e. sequencing reads). It's likely you'll want to change this one.
trim_qual = 10 : For cutadapt, the Phred quality cutoff for trimming low-quality bases from the 3' end of reads
min_length = 10 : For cutadapt, the minimum trimmed length of a read. Shorter reads will be discarded
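To give a feel for how trim_qual and min_length interact, here is a small Python sketch of the partial-sum trimming rule described in the cutadapt documentation (subtract the cutoff from each quality, sum from the 3' end, cut where the sum is lowest). It is an illustration under that assumption, not cutadapt's or the pipeline's code, and the function names `trimmed_length` and `keep_read` are hypothetical.

```python
from typing import Sequence


def trimmed_length(qualities: Sequence[int], trim_qual: int = 10) -> int:
    """Return how many bases remain after 3' quality trimming."""
    running = 0
    lowest = 0
    keep = len(qualities)  # by default nothing is trimmed
    # Walk from the 3' end, accumulating (quality - cutoff).
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - trim_qual
        # The read is cut at the position where the partial sum is minimal.
        if running < lowest:
            lowest = running
            keep = i
    return keep


def keep_read(qualities: Sequence[int], trim_qual: int = 10, min_length: int = 10) -> bool:
    """Return True if the trimmed read is at least min_length bases long."""
    return trimmed_length(qualities, trim_qual) >= min_length


# Made-up Phred scores that decay towards the 3' end.
quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(trimmed_length(quals, trim_qual=10))            # 4 bases are kept
print(keep_read(quals, trim_qual=10, min_length=10))  # False: too short after trimming
```

Note that a single base above the cutoff (the 11 in the example) does not stop the trimming, because the surrounding low-quality bases still pull the partial sum down.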

The following options do not need to be changed, but can be overridden if you decide you need to:

gatk_image = "docker://broadinstitute/gatk:latest" : Which GATK4 image to use
snpeff_url = "https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip" : Where to download snpEff from
clair3_image = "docker://hkubal/clair3:latest" : Which Clair3 image to use
rerio_url = "https://github.com/nanoporetech/rerio.git" : Where to find the Rerio repository
clair3_model = "r1041_e82_400bps_sup_v500" : Which basecalling model to use with Clair3

The parameters can be provided either in the nextflow.config file or on the nextflow run command.

Here is an example of the nextflow.config file:

params {
    sample_sheet = "/path/to/sample-sheet.csv"
    inputs = "/path/to/inputs"
}

Alternatively, you can provide the parameters on the command line:

nextflow run scbirlab/nf-ont-call-variants \
    --sample_sheet /path/to/sample-sheet.csv \
    --inputs /path/to/inputs

Sample sheet

The sample sheet is a CSV file providing information about which FASTQ files belong to which sample.

The file must have a header with the column names below, and one line per sample to be processed.

sample_id: the unique name of the sample. The wildtype must be named so that it is alphabetically last
reads: path (relative to inputs option above) to compressed FASTQ files derived from Nanopore sequencing
genome_accession: NCBI genome accession number of the reference, starting with "GCF_" or "GCA_". You can look it up in the NCBI genome database.

You can also add extra columns for annotation, e.g. strain_name, for ease of reference later.

Here is an example of the sample sheet:

sample_id,reads,genome_accession
wt,raw_reads_wt_*.fastq.gz,GCF_000015005.1
mut1,raw_reads_mut_*.fastq.gz,GCF_000015005.1
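Before launching a run, it can help to sanity-check the sample sheet against the rules above. The following Python sketch is an assumption-laden illustration, not part of the pipeline: the function `check_sample_sheet` is hypothetical, and "alphabetically last" is interpreted here as plain case-sensitive string ordering, which may differ from how the pipeline sorts sample names.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "reads", "genome_accession")


def check_sample_sheet(text: str) -> str:
    """Validate sample sheet text and return the sample_id that sorts last."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    if not rows:
        raise ValueError("sample sheet contains no samples")
    ids = [row["sample_id"] for row in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("sample_id values must be unique")
    for row in rows:
        if not row["genome_accession"].startswith(("GCF_", "GCA_")):
            raise ValueError(f"unexpected accession for {row['sample_id']}")
    # The caller should confirm that this sample is the wildtype control.
    return max(ids)


sheet = (
    "sample_id,reads,genome_accession\n"
    "wt,raw_reads_wt_*.fastq.gz,GCF_000015005.1\n"
    "mut1,raw_reads_mut_*.fastq.gz,GCF_000015005.1\n"
)
print(check_sample_sheet(sheet))  # wt
```

The function only reports which sample sorts last; it cannot know which sample is the wildtype, so that final comparison is left to the user.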

Outputs

Outputs are saved in the same directory as sample_sheet. They are organised under three directories:

processed: FASTQ files and logs resulting from alignments
tables: tables, plots, and VCF files corresponding to variant calls
multiqc: HTML report on processing steps
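As a small illustration of where to look after a run, this Python sketch derives the three output directories from the sample sheet path. The helper name `expected_output_dirs` is hypothetical, and the sketch assumes only what is stated above, namely that the directories sit next to the sample sheet.

```python
from pathlib import Path
from typing import List


def expected_output_dirs(sample_sheet: str) -> List[Path]:
    """Return the output directories expected alongside the sample sheet."""
    base = Path(sample_sheet).parent
    return [base / name for name in ("processed", "tables", "multiqc")]


for directory in expected_output_dirs("/path/to/sample-sheet.csv"):
    print(directory)
```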

Issues, problems, suggestions

If you run into problems not covered here, please open an issue on the issue tracker.

Further help

Here are the help pages of the software used by this pipeline.
