Commit 1b99093

Merge branch 'dev'
2 parents aebbc26 + e29541c commit 1b99093

91 files changed: +15253 -1436 lines changed

.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 # GitHub syntax highlighting
 pixi.lock linguist-language=YAML
+llms.txt linguist-language=markdown linguist-detectable=true

.github/workflows/test.yml

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+name: CI Tests
+
+on:
+  push:
+    branches: [main, dev, experimental]
+  pull_request:
+    branches: [main, dev]
+  workflow_dispatch:
+
+jobs:
+  python-tests:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.10', '3.11', '3.12', '3.13']
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install UV
+        uses: astral-sh/setup-uv@v4
+        with:
+          enable-cache: true
+          cache-dependency-glob: "pyproject.toml"
+
+      - name: Install dependencies with UV
+        run: |
+          uv sync --dev --frozen
+
+      - name: Run Python tests with pytest
+        run: |
+          uv run pytest bin/ -v --cov=bin --cov-report=xml --cov-report=term-missing
+
+      - name: Upload coverage reports
+        uses: codecov/codecov-action@v4
+        with:
+          files: ./coverage.xml
+          flags: python-${{ matrix.python-version }}
+          name: Python ${{ matrix.python-version }}
+        if: matrix.python-version == '3.12'
+
+  python-tests-tox:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Install UV
+        uses: astral-sh/setup-uv@v4
+        with:
+          enable-cache: true
+          cache-dependency-glob: "pyproject.toml"
+
+      - name: Install and run tox
+        run: |
+          uvx --from tox-uv tox -p auto
+
+      - name: Run linting with tox
+        run: |
+          uvx --from tox-uv tox -e lint
+
+  nextflow-tests:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        nextflow_version: ['23.10.0', 'latest']
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Nextflow
+        uses: nf-core/setup-nextflow@v2
+        with:
+          version: ${{ matrix.nextflow_version }}
+
+      - name: Set up nf-test
+        uses: nf-core/setup-nf-test@v1
+
+      - name: Run nf-test
+        run: |
+          nf-test test --verbose --profile test,docker
+
+      - name: Upload test results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: nf-test-results-${{ matrix.nextflow_version }}
+          path: |
+            .nf-test/
+            tests/output/
+
+      - name: Clean up
+        if: always()
+        run: |
+          rm -rf work/
+          rm -rf .nextflow/
+          rm -rf .nf-test/
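
Note on the workflow above: to make its scope concrete, here is a minimal Python sketch that enumerates the job instances the two `strategy.matrix` blocks would produce and marks the one that carries the Codecov upload. It only restates the configuration shown in the diff and says nothing about how GitHub Actions schedules jobs internally. The names `WORKFLOW_MATRICES`, `expand_jobs`, and `uploads_coverage` are hypothetical and exist only for this illustration.

```python
from itertools import product

# Matrix values copied from .github/workflows/test.yml in this commit.
# The dictionary layout itself is a hypothetical stand-in for the YAML.
WORKFLOW_MATRICES = {
    "python-tests": {"python-version": ["3.10", "3.11", "3.12", "3.13"]},
    "python-tests-tox": {},  # no matrix: a single job pinned to Python 3.12
    "nextflow-tests": {"nextflow_version": ["23.10.0", "latest"]},
}


def expand_jobs(matrices):
    """Return one (job_name, parameters) pair per matrix combination."""
    jobs = []
    for name, matrix in matrices.items():
        keys = list(matrix)
        # product() over zero axes yields exactly one empty combination,
        # which models a job that has no matrix at all.
        for combo in product(*(matrix[key] for key in keys)):
            jobs.append((name, dict(zip(keys, combo))))
    return jobs


def uploads_coverage(job_name, params):
    """Mirror the `if: matrix.python-version == '3.12'` guard on the Codecov step."""
    return job_name == "python-tests" and params.get("python-version") == "3.12"


if __name__ == "__main__":
    for job_name, params in expand_jobs(WORKFLOW_MATRICES):
        marker = " (uploads coverage)" if uploads_coverage(job_name, params) else ""
        print(f"{job_name} {params}{marker}")
```

Under these assumptions the workflow fans out into seven job instances per trigger (four pytest jobs, one tox job, two nf-test jobs), and only the Python 3.12 pytest job uploads coverage.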

.gitignore

Lines changed: 29 additions & 0 deletions
@@ -19,6 +19,9 @@
 !.gitattributes
 !main.nf
 !nextflow.config
+!nf-test.config
+!CLAUDE.md
+!llms.txt
 
 # github action workflows
 !/.github
@@ -46,6 +49,8 @@
 # bin of executable scripts
 !/bin/
 !/bin/*.py
+!/bin/*.rs
+!/bin/*.ers
 !/bin/*.ts
 !/bin/*.js
 !/bin/*.R
@@ -54,6 +59,7 @@
 !/bin/*.lua
 !/bin/*.sh
 !/bin/*.awk
+!/bin/README.md
 
 # groovy libraries
 !/lib
@@ -65,3 +71,26 @@
 !/docs/*.md
 !/docs/*.pdf
 !/docs/*.html.gz
+
+# globus adapter
+!/globus
+!/globus/.gitignore
+!/globus/README.md
+!/globus/action_provider
+!/globus/action_provider/oneroof_action_provider.py
+!/globus/action_provider/requirements.txt
+!/globus/config
+!/globus/config/.env.template
+!/globus/config/*.json
+!/globus/flows
+!/globus/flows/*.json
+!/globus/scripts
+!/globus/scripts/*.sh
+!/globus/scripts/*.py
+
+# nf-test
+!tests/
+!tests/**/*
+!tests/**/*.nf.test
+!tests/data/
+!tests/data/**/*

.pre-commit-config.yaml

Lines changed: 13 additions & 5 deletions
@@ -7,9 +7,17 @@ repos:
       - id: check-toml
       - id: end-of-file-fixer
       - id: trailing-whitespace
-  - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: "v0.9.6"
+  # - repo: https://github.com/astral-sh/ruff-pre-commit
+  #   rev: "v0.9.6"
+  #   hooks:
+  #     - id: ruff
+  #       args: ["--fix"]
+  #     - id: ruff-format
+  - repo: local
     hooks:
-      - id: ruff
-        args: ["--fix"]
-      - id: ruff-format
+      - id: no-env-files
+        name: Block .env files
+        entry: .env files must not be committed
+        language: fail
+        files: '\.env$'
+        description: 'Prevents accidental commit of .env files containing sensitive configuration'
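
Note on the `no-env-files` hook above: the `files: '\.env$'` pattern decides which staged paths trigger the failure. The following Python sketch checks a few candidate paths against that pattern. It assumes pre-commit treats `files` as a Python regular expression searched anywhere in the repository-relative path; the helper `is_blocked` and the sample paths other than `globus/config/.env.template` are hypothetical.

```python
import re

# Pattern copied from the `files:` key of the no-env-files hook.
ENV_PATTERN = re.compile(r"\.env$")


def is_blocked(path: str) -> bool:
    """Return True if the hook's pattern matches this repository-relative path."""
    # Assumption: the pattern is searched anywhere in the path rather than
    # anchored at the start, so only the trailing `$` constrains the match.
    return ENV_PATTERN.search(path) is not None


if __name__ == "__main__":
    for candidate in (
        ".env",
        "globus/config/.env",
        "globus/config/.env.template",
        "settings.env",
        "environment.yml",
    ):
        verdict = "blocked" if is_blocked(candidate) else "allowed"
        print(f"{candidate}: {verdict}")
```

Under that assumption, `globus/config/.env.template` (which the `.gitignore` change in this commit explicitly un-ignores) stays committable, while any path ending in `.env`, including names such as `settings.env`, is rejected.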

CLAUDE.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+OneRoof is a Nextflow-based bioinformatics pipeline for base-calling, variant-calling, and consensus-calling of amplicon sequencing data. It supports both Nanopore (pod5/BAM/FASTQ) and Illumina (paired-end FASTQ) data, with particular focus on SARS-CoV-2 and H5N1 influenza genomic surveillance.
+
+## Key Commands
+
+### Development Environment Setup
+```bash
+# For environments with conda dependencies (full pipeline)
+pixi install --frozen
+pixi shell --frozen
+
+# For PyPI-only environments (Python development)
+uv venv
+source .venv/bin/activate # or .venv\Scripts\activate on Windows
+uv sync
+```
+
+### Running the Pipeline
+```bash
+# Nanopore data from raw POD5s
+nextflow run . \
+  --pod5_dir my_pod5_dir \
+  --primer_bed my_primers.bed \
+  --refseq my_ref.fasta \
+  --ref_gbk my_ref.gbk \
+  --kit "SQK-NBD114-24"
+
+# Illumina data
+nextflow run . \
+  --illumina_fastq_dir my_illumina_reads/ \
+  --primer_bed my_primers.bed \
+  --refseq my_ref.fasta \
+  --ref_gbk my_ref.gbk
+
+# Run without containers (requires pixi environment)
+nextflow run . -profile containerless [options]
+```
+
+### Code Quality & Testing
+```bash
+# Python linting and formatting
+ruff check . --exit-zero --fix --unsafe-fixes
+ruff format .
+
+# Run Python tests (using uv for speed)
+uv run pytest bin/test_*.py
+# Or run tests with tox for multiple environments
+tox
+
+# Build documentation
+just docs
+
+# IMPORTANT: Modifying README.md
+# The README.md in the project root is generated from docs/index.qmd
+# NEVER edit README.md directly - it will be overwritten
+# Always edit docs/index.qmd and re-render:
+just make-readme # or: just docs
+
+# Docker operations
+just docker-build
+just docker-push
+```
+
+## Architecture
+
+### Directory Structure
+- `main.nf` - Main workflow entry point that orchestrates platform-specific workflows
+- `workflows/` - Platform-specific workflows (nanopore.nf, illumina.nf)
+- `subworkflows/` - Reusable workflow components (alignment, variant_calling, primer_handling, etc.)
+- `modules/` - Individual process definitions for tools (dorado, minimap2, ivar, etc.)
+- `bin/` - Python utility scripts with PEP 723 inline dependencies (fully portable with uv)
+- `conf/` - Configuration files for different platforms and tools
+
+### Key Workflow Components
+
+1. **Data Ingestion** - Handles multiple input formats (pod5, BAM, FASTQ) with optional remote file watching
+2. **Primer Handling** - Validates primers, trims reads, and ensures complete amplicons
+3. **Alignment & Variant Calling** - Platform-specific alignment and variant calling using minimap2 and ivar/bcftools
+4. **Quality Control** - FastQC, MultiQC, and custom coverage plotting
+5. **Consensus Generation** - Creates consensus sequences with configurable frequency thresholds
+6. **Optional Features** - Metagenomics (Sylph), phylogenetics (Nextclade), haplotyping (Devider)
+
+### Technology Stack
+- **Workflow Engine**: Nextflow DSL2
+- **Container Support**: Docker, Singularity/Apptainer
+- **Environment Management**: Pixi (combines conda and PyPI dependencies), UV (fast Python package management)
+- **Languages**: Nextflow (Groovy), Python 3.10+
+- **Key Tools**: Dorado (basecalling), minimap2 (alignment), ivar/bcftools (variants), FastQC/MultiQC (QC)
+
+### Configuration Philosophy
+- Parameters are primarily set via command line arguments
+- Platform-specific configs (nanopore.config, illumina.config) are auto-loaded based on input data type
+- Container profiles (docker, singularity, apptainer, containerless) control execution environment
+- Advanced users can modify nextflow.config for fine-tuning
+
+### Important Parameters
+- `--pod5_batch_size`: Controls GPU memory usage during basecalling
+- `--min_variant_frequency`: Platform-specific defaults (0.05 for Illumina, 0.10 for Nanopore)
+- `--downsample_to`: Manages computational resources by limiting coverage depth
+- `--model`: Nanopore basecalling model (defaults to sup@latest)
+
+## Dependency Management
+
+### Python Package Management
+- **Always use `uv` instead of `pip`** for any Python package installation - it's significantly faster and more reliable
+- **Use `uv` for PyPI-only environments**: When working with Python scripts that only need PyPI dependencies
+- **Use `pixi` for mixed environments**: When conda dependencies are required (e.g., for the full pipeline)
+- **Script execution**: Always use `uv run` instead of `python3` to execute Python scripts
+  ```bash
+  # Good - uses inline dependencies from PEP 723 headers
+  uv run bin/some_script.py
+
+  # Avoid - doesn't guarantee dependencies
+  python3 bin/some_script.py
+  ```
+- **Portable scripts**: All scripts in `bin/` include PEP 723 inline dependencies, making them fully portable with uv
+- **Benefits**: This approach eliminates dependency hell in Python by ensuring consistent, reproducible environments
+
+### Testing Infrastructure
+- **Comprehensive test coverage**: Python scripts in `bin/` have extensive test coverage using pytest
+- **Test execution**: Tests can be run quickly with UV for PyPI-only environments
+  ```bash
+  # Run all tests
+  uv run pytest bin/test_*.py
+
+  # Run specific test
+  uv run pytest bin/test_specific_module.py
+  ```
+- **CI/CD**: The continuous integration pipeline uses UV instead of pip for improved speed and reliability
+- **Test organization**: Test files follow the pattern `test_*.py` and are colocated with the scripts they test
+
+## Development Notes
+
+1. **Testing**: Python scripts have comprehensive test coverage; Nextflow workflow tests are planned for future implementation
+2. **GPU Requirements**: Nanopore basecalling requires CUDA-capable GPUs
+3. **Memory Management**: Use `--low_memory` flag for resource-constrained environments
+4. **Slack Integration**: Optional alerts can be configured for pipeline completion
+5. **Dependency Management**: Always use `uv` for Python operations to ensure fast, reliable dependency resolution
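
Note on the PEP 723 convention described in CLAUDE.md above: the file says every script in `bin/` carries inline dependency metadata so that `uv run` can execute it without a prepared environment. Below is a minimal sketch of what such a script could look like. The script itself (a hypothetical `bin/primer_lengths.py`), the function `primer_lengths`, and the sample BED rows are invented for illustration and are not taken from the repository. The comment block at the top follows the PEP 723 `script` metadata format, and the dependency list is left empty because this example only needs the standard library.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Hypothetical bin/ utility: report the length of each primer in a BED file."""

import sys


def primer_lengths(bed_text: str) -> dict[str, int]:
    """Map each primer name (BED column 4) to its length in bases (end - start)."""
    lengths = {}
    for line in bed_text.splitlines():
        # Skip blank lines and the header/comment lines BED files may carry.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        _chrom, start, end, name = line.split("\t")[:4]
        lengths[name] = int(end) - int(start)
    return lengths


def test_primer_lengths():
    # In the layout CLAUDE.md describes, this test would live in a colocated
    # test_*.py file; it is kept inline here so the sketch is self-contained.
    bed = "ref\t10\t30\tamp1_LEFT\nref\t380\t400\tamp1_RIGHT\n"
    assert primer_lengths(bed) == {"amp1_LEFT": 20, "amp1_RIGHT": 20}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, length in primer_lengths(handle.read()).items():
            print(f"{name}\t{length}")
```

With a header like this, `uv run bin/primer_lengths.py my_primers.bed` would resolve the declared requirements before running the script, whereas plain `python3` ignores the metadata block, which is the distinction the "Good" and "Avoid" commands in CLAUDE.md draw.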
