v1.0.0 - stable version with complete docs

@sreichl released this 11 Mar 10:58 · 11 commits to main since this release

We're excited to announce the first stable release of fetch_ngs, a Snakemake workflow to fetch and process public sequencing data from the major genomics repositories (GSA, SRA, ENA, GEO, and DDBJ)!

Features

  • Data Acquisition using iSeq

    • Download sequencing data from GSA, SRA, ENA, GEO, and DDBJ repositories
    • Support for multiple accession ID types (BioProject, BioSample, Experiment, Run)
    • Parallel downloading capabilities for improved performance
    • Comprehensive metadata extraction for all datasets
    • Metadata-only exploration mode to preview available data
  • Data Processing

    • Automatic handling of both single-end and paired-end sequencing data
    • Optional conversion from FASTQ (.fastq.gz) to unmapped BAM (.bam) format
    • Creation of unified metadata files with accession IDs and file paths
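
As a rough illustration of the accession ID types listed above, the sketch below sorts IDs into BioProject, BioSample, Experiment, and Run by their prefix. It is not taken from fetch_ngs or iSeq. The function name `classify_accession` and the pattern table are hypothetical, and the patterns cover only the common INSDC (SRA, ENA, DDBJ) prefixes, so GEO and GSA accessions would fall through to "Unknown" here.

```python
import re

# Illustrative prefix patterns for INSDC accessions (assumption: not exhaustive,
# and not how fetch_ngs or iSeq resolve accessions internally).
ACCESSION_PATTERNS = [
    ("BioProject", re.compile(r"^PRJ(NA|EB|DB)\d+$")),  # e.g. PRJNA..., PRJEB..., PRJDB...
    ("BioSample", re.compile(r"^SAM(N|EA|D)\d+$")),     # e.g. SAMN..., SAMEA..., SAMD...
    ("Experiment", re.compile(r"^[SED]RX\d+$")),        # SRX / ERX / DRX
    ("Run", re.compile(r"^[SED]RR\d+$")),               # SRR / ERR / DRR
]


def classify_accession(accession: str) -> str:
    """Return the accession type for a single ID, or "Unknown" if no pattern matches."""
    normalized = accession.strip().upper()
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(normalized):
            return label
    return "Unknown"


if __name__ == "__main__":
    # Made-up example IDs, used only to show the call.
    for example in ["PRJNA123456", "SAMN01234567", "SRX1234567", "ERR1234567"]:
        print(example, classify_accession(example))
```

A check like this could be useful for grouping a mixed list of IDs before choosing between the metadata-only mode and a full download.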

Documentation

  • Comprehensive configuration guide with examples for metadata-only, FASTQ, and BAM output workflows
  • Detailed methods section template for scientific publications
  • Directory structure documentation for result interpretation
  • Usage recommendations for efficient workflow execution
  • Integration examples with downstream analysis modules

MrBiomics Ecosystem

This workflow is part of the MrBiomics ecosystem and integrates with its other modules to build end-to-end analysis pipelines. It is showcased in the ATAC-seq and RNA-seq analysis recipes.

We invite you to explore, use, and contribute to this workflow. For questions, feedback, or contributions, please visit our GitHub repository.

Full Changelog: https://github.com/epigen/fetch_ngs/commits/v1.0.0