nf-genome-assembler is a bioinformatics pipeline designed to assemble genomes from long-read sequencing data and Hi-C data. It is built with Nextflow, a workflow management system for creating reproducible and scalable pipelines.
Please check the installation instructions for more details on how to install Nextflow and Docker / Apptainer.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.yaml:
- name: my_assembly
  platform: nanopore
  reads: /path/to/ont_reads.fastq.gz
  hic_fastq_1: /path/to/hic_read_r1.fastq.gz
  hic_fastq_2: /path/to/hic_read_r2.fastq.gz
  genome_size: 1000000000
  assembly: /path/to/assembly

It can also be a CSV samplesheet:
name,platform,reads,hic_fastq_1,hic_fastq_2,genome_size,assembly
my_assembly,nanopore,/path/to/ont_reads.fastq.gz,/path/to/hic_read_r1.fastq.gz,/path/to/hic_read_r2.fastq.gz,1000000000,/path/to/assembly

Note
The genome_size column is optional and is only used by Flye, to estimate the expected genome size.
The assembly column is also optional; use it only when you want to skip the early steps and continue from a specific assembly.
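Before launching a run, it can be worth checking that every file path in the CSV samplesheet exists. The shell function below is a hypothetical pre-flight check (`check_samplesheet` is not part of nf-genome-assembler). It assumes a plain CSV with the header shown above, no quoted fields or commas inside values, a trailing newline and local paths only. It also treats an empty cell as an omitted optional column, which is an assumption about how the pipeline reads the sheet.

```bash
# Hypothetical pre-flight check, NOT part of nf-genome-assembler.
# Assumes: plain CSV (no quoting, no commas inside values), header on line 1,
# file ends with a newline, and local file paths (remote URIs are not handled).
check_samplesheet() {
  local sheet="$1" status=0 line_no=1
  local -a header fields

  # Read the header row into an array of column names.
  IFS=, read -r -a header < "$sheet"

  # Walk the data rows (everything after the header).
  while IFS=, read -r -a fields; do
    line_no=$((line_no + 1))
    for i in "${!header[@]}"; do
      case "${header[$i]}" in
        reads|hic_fastq_1|hic_fastq_2|assembly)
          local value="${fields[$i]:-}"
          # Empty cells are treated as omitted optional columns (assumption);
          # any non-empty path that does not exist is reported on stderr.
          if [[ -n "$value" && ! -e "$value" ]]; then
            echo "line $line_no: ${header[$i]} not found: $value" >&2
            status=1
          fi
          ;;
      esac
    done
  done < <(tail -n +2 "$sheet")

  # 0 when every listed path exists, 1 otherwise.
  return "$status"
}
```

Chaining it with `&&` in front of the `nextflow run` command launches the pipeline only when every listed file is found.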
Now, you can run the pipeline using:
nextflow run OlivierCoen/nf-genome-assembler \
-latest \
-profile <docker/apptainer/conda/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR> \
-resume

Warning
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
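As a minimal sketch of the -params-file route, the parameters can be collected in a YAML file and passed to the same `nextflow run` command. The keys `input` and `outdir` mirror the `--input` and `--outdir` flags shown above. The file name `params.yaml`, the `results` directory and the `docker` profile are illustrative choices, not pipeline defaults.

```bash
# Write pipeline parameters to a YAML params file (file name is illustrative).
# Keys mirror the --input / --outdir flags; "results" is an example directory.
cat > params.yaml <<'EOF'
input: samplesheet.csv
outdir: results
EOF

# Launch only if Nextflow is on PATH, so the sketch is safe to run elsewhere.
# Pipeline parameters come from params.yaml; Nextflow options stay on the CLI.
if command -v nextflow >/dev/null 2>&1; then
  nextflow run OlivierCoen/nf-genome-assembler \
    -latest \
    -profile docker \
    -params-file params.yaml \
    -resume
fi
```

Keeping parameters in a file makes a run easier to reproduce and to share, while profiles and other configuration still go through -profile or -c as described above.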
nf-genome-assembler was originally written by Olivier Coen.
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.