
nf-genome-assembler

Introduction

nf-genome-assembler is a bioinformatics pipeline that is designed to assemble genomes from long-read sequencing data and Hi-C data. It is built using Nextflow, a workflow management system that allows for the creation of reproducible and scalable pipelines.

Installation

Please check the installation instructions for more details on how to install Nextflow and Docker / Apptainer.

Running the pipeline

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.yaml:

- name: my_assembly
  platform: nanopore
  reads: /path/to/ont_reads.fastq.gz
  hic_fastq_1: /path/to/hic_read_r1.fastq.gz
  hic_fastq_2: /path/to/hic_read_r2.fastq.gz
  genome_size: 1000000000
  assembly: /path/to/assembly

It can also be a CSV samplesheet:

name,platform,reads,hic_fastq_1,hic_fastq_2,genome_size,assembly
my_assembly,nanopore,/path/to/ont_reads.fastq.gz,/path/to/hic_read_r1.fastq.gz,/path/to/hic_read_r2.fastq.gz,1000000000,/path/to/assembly
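If you keep sample metadata in a script, you can generate the CSV form instead of typing it by hand. The sketch below is a hypothetical helper that uses only Python's standard csv module. Its column order is copied from the example above. It also assumes, without confirmation from this README, that the pipeline reads an empty cell in an optional column as "not provided".

```python
import csv
import io

# Column order copied from the example CSV samplesheet above.
COLUMNS = ["name", "platform", "reads", "hic_fastq_1", "hic_fastq_2", "genome_size", "assembly"]


def write_samplesheet(records, handle):
    """Write sample dicts as a CSV samplesheet (hypothetical helper).

    Keys missing from a record are written as empty cells (restval="");
    unknown keys raise ValueError so typos in column names are caught early.
    """
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, restval="", extrasaction="raise", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)


records = [
    {
        "name": "my_assembly",
        "platform": "nanopore",
        "reads": "/path/to/ont_reads.fastq.gz",
        "hic_fastq_1": "/path/to/hic_read_r1.fastq.gz",
        "hic_fastq_2": "/path/to/hic_read_r2.fastq.gz",
        "genome_size": 1000000000,
        # no "assembly" key: the optional column is left empty
    }
]

buffer = io.StringIO()
write_samplesheet(records, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("samplesheet.csv", "w", newline="")` and pass the handle to `write_samplesheet`.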

Note

The assembly column is optional. Provide it only when you want to skip the early steps and continue from an existing assembly. The genome_size column is also optional and is used only to give Flye the expected genome size.
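A quick check before launching can catch missing values before a long run starts. The sketch below is a hypothetical helper and is not part of the pipeline. It makes two assumptions that this README does not state: that name, platform and reads are mandatory, and that genome_size, when given, is a plain integer number of bases as in the example above.

```python
import csv
import io

# Assumption: these columns are mandatory. The README only says that
# "assembly" and "genome_size" are optional.
REQUIRED = ("name", "platform", "reads")


def check_samplesheet(text):
    """Return a list of problems found in a CSV samplesheet (hypothetical helper)."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                problems.append(f"line {line_no}: missing '{column}'")
        # genome_size is optional; when present, accept only a positive integer
        size = (row.get("genome_size") or "").strip()
        if size and not (size.isdigit() and int(size) > 0):
            problems.append(f"line {line_no}: genome_size '{size}' is not a positive integer")
    return problems


sheet = (
    "name,platform,reads,genome_size,assembly\n"
    "ok,nanopore,/r.fq.gz,1000000000,\n"
    "bad,,/r.fq.gz,1e9,\n"
)
print(check_samplesheet(sheet))
```

An empty list means no problem was found. The check does not look at file paths or at the assembly column, so it cannot replace the pipeline's own input validation.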

Now, you can run the pipeline using:

nextflow run OlivierCoen/nf-genome-assembler \
   -latest \
   -profile <docker/apptainer/conda/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   -resume

Warning

Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided with the -c Nextflow option, can be used to provide any configuration except parameters; see the nf-core documentation on custom configuration files.
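Following the warning above, you can collect parameters such as --input and --outdir in a file and pass it with Nextflow's -params-file option instead of listing them on the command line. The sketch below is a minimal Python version. The file name params.json, the output directory results and the docker profile are illustrative choices, not values the pipeline requires.

```python
import json
import shlex

# Illustrative values; replace them with your own samplesheet and output directory.
params = {
    "input": "samplesheet.csv",
    "outdir": "results",
}

with open("params.json", "w") as fh:
    json.dump(params, fh, indent=2)

# Pipeline parameters now come from params.json; -latest, -profile and -resume
# are Nextflow options, so they stay on the command line.
command = [
    "nextflow", "run", "OlivierCoen/nf-genome-assembler",
    "-latest",
    "-profile", "docker",
    "-params-file", "params.json",
    "-resume",
]
print(shlex.join(command))
# To launch directly instead of printing: subprocess.run(command, check=True)
```

Nextflow also accepts YAML for -params-file, so a params.yaml with the same two keys would work equally well.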

Credits

nf-genome-assembler was originally written by Olivier Coen.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.