v1.0.0 - stable version with complete docs
We're excited to announce the first stable release of fetch_ngs, a Snakemake workflow to fetch and process public sequencing data across all major genomics repositories!
Features
Data Acquisition using iSeq
- Download sequencing data from GSA, SRA, ENA, GEO, and DDBJ repositories
- Support for multiple accession ID types (BioProject, BioSample, Experiment, Run)
- Parallel downloads for faster data retrieval
- Comprehensive metadata extraction for all datasets
- Metadata-only exploration mode to preview available data
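To make the supported accession ID types concrete, the sketch below maps an accession to its type using the standard public prefix conventions (for example `PRJNA`/`PRJEB`/`PRJDB`/`PRJCA` for BioProjects, `SRR`/`ERR`/`DRR`/`CRR` for runs). This is a minimal illustration, not code from fetch_ngs or iSeq; the helper name `accession_type` and the GEO labels are hypothetical, and the prefix list covers only common cases.

```python
import re

# Hypothetical helper (not part of fetch_ngs or iSeq): classify an accession ID
# by its prefix, following common INSDC (SRA/ENA/DDBJ), GSA and GEO conventions.
ACCESSION_PATTERNS = [
    (re.compile(r"^PRJ(?:NA|EB|DB|CA)\d+$"), "BioProject"),
    (re.compile(r"^SAM(?:N|EA|E|D|C)\d+$"), "BioSample"),
    (re.compile(r"^(?:SRX|ERX|DRX|CRX)\d+$"), "Experiment"),
    (re.compile(r"^(?:SRR|ERR|DRR|CRR)\d+$"), "Run"),
    (re.compile(r"^GSE\d+$"), "GEO Series"),
    (re.compile(r"^GSM\d+$"), "GEO Sample"),
]


def accession_type(accession: str) -> str:
    """Return the accession type, or raise ValueError if no known prefix matches."""
    acc = accession.strip().upper()
    for pattern, label in ACCESSION_PATTERNS:
        if pattern.match(acc):
            return label
    raise ValueError(f"Unrecognized accession ID: {accession!r}")


if __name__ == "__main__":
    for acc in ["PRJNA123456", "SAMEA1234567", "SRX000001", "ERR1234567", "GSE12345"]:
        print(acc, "->", accession_type(acc))
```

A check like this could, for instance, validate a list of IDs before a download is started; the actual validation rules used by iSeq may differ.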
Data Processing
- Automatic handling of both single-end and paired-end sequencing data
- Optional conversion from FASTQ (.fastq.gz) to unmapped BAM (.bam) format
- Creation of unified metadata files with accession IDs and file paths
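The layout handling and the unified metadata file can be illustrated with a small sketch that groups downloaded FASTQ files by run accession, labels each run as single- or paired-end, and writes one table row per run. It assumes the common naming convention `<run>_1.fastq.gz`/`<run>_2.fastq.gz` for paired-end and `<run>.fastq.gz` for single-end data; the function names and the columns `run`, `layout`, `read1`, `read2` are hypothetical and do not reflect the workflow's actual output schema.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed file naming convention (hypothetical for this sketch):
#   paired-end: <run>_1.fastq.gz and <run>_2.fastq.gz
#   single-end: <run>.fastq.gz
FASTQ_RE = re.compile(r"^(?P<run>[A-Za-z0-9]+)(?:_(?P<mate>[12]))?\.fastq\.gz$")


def build_metadata(paths):
    """Group FASTQ paths by run accession and classify each run's read layout."""
    mates = defaultdict(dict)
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        m = FASTQ_RE.match(name)
        if m is None:
            raise ValueError(f"Unexpected FASTQ file name: {path!r}")
        mates[m["run"]][m["mate"] or "single"] = path

    rows = []
    for run in sorted(mates):
        files = mates[run]
        if "1" in files and "2" in files:
            # Paired-end; any extra unpaired <run>.fastq.gz is ignored here.
            rows.append({"run": run, "layout": "paired",
                         "read1": files["1"], "read2": files["2"]})
        elif set(files) == {"single"}:
            rows.append({"run": run, "layout": "single",
                         "read1": files["single"], "read2": ""})
        else:
            raise ValueError(f"Incomplete mate files for {run}: {sorted(files)}")
    return rows


def write_metadata_csv(rows, handle):
    """Write one row per run (accession ID plus file paths) as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["run", "layout", "read1", "read2"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    files = ["fastq/SRR000001_1.fastq.gz", "fastq/SRR000001_2.fastq.gz",
             "fastq/SRR000002.fastq.gz"]
    buf = io.StringIO()
    write_metadata_csv(build_metadata(files), buf)
    print(buf.getvalue())
```

Raising on a lone `_1` or `_2` file is a deliberate choice in this sketch: a half-downloaded pair is surfaced as an error instead of being silently treated as single-end.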
Documentation
- Comprehensive configuration guide with examples for metadata-only, FASTQ, and BAM output workflows
- Detailed methods section template for scientific publications
- Directory structure documentation for result interpretation
- Usage recommendations for efficient workflow execution
- Integration examples with downstream analysis modules
MrBiomics Ecosystem
This workflow is part of the MrBiomics ecosystem and integrates seamlessly with its other modules to build comprehensive end-to-end analysis pipelines, as showcased in the ATAC-seq and RNA-seq analysis recipes.
We invite you to explore, use, and contribute to this workflow. For questions, feedback, or contributions, please visit our GitHub repository.
Full Changelog: https://github.com/epigen/fetch_ngs/commits/v1.0.0