Merge new zsh-dependent config revamp into master #82
Limit snakemake version to < 8.0
correct interpretation of basename in _get_fq_paths
add bam indexing to filter rules
correct indexing in bam_filter rules
protect against relative paths to `ln -s` using `readlink`
…yaml, set shell executable to zsh in PATH
Pull request overview
Revamps the pipeline configuration to be zsh-based and to use hierarchical YAML configuration for per-sample/per-chemistry/per-platform settings.
Changes:
- Switches the workflow shell to `zsh` and adds hierarchical config lookup via `SAMPLES` + `chemistry.yaml`.
- Refactors cutadapt/STAR rules to pull parameters from config instead of TSV/JSON-driven logic.
- Removes legacy sample/chemistry artifacts (`sample_fastqs.tsv`, `chemistry.json`, generator script) and updates docs/developer guide.
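For readers unfamiliar with this kind of layered lookup, here is a minimal sketch of how a per-sample/per-chemistry/per-platform resolution could work. It is not the `_get_config()` from this PR: the function name `lookup_setting`, the `platforms` nesting, every key and value in the sample data, and the precedence order (sample, then platform, then chemistry, then `DEFAULTS`) are assumptions made for illustration.

```python
# Hypothetical data standing in for config.yaml and chemistry.yaml.
# None of these keys or values are taken from the actual repository.
CONFIG = {
    "DEFAULTS": {"chemistry": "chem_a", "platform": "plat_x", "threads": 4},
    "SAMPLES": {
        "sample1": {"chemistry": "chem_a", "platform": "plat_y"},
        "sample2": {"threads": 8},
    },
}
CHEMISTRY = {
    "chem_a": {
        "bc_cut": "",
        "platforms": {"plat_y": {"bc_cut": "ACGT"}},
    },
}


def lookup_setting(sample, key, config=CONFIG, chemistry=CHEMISTRY):
    """Return the most specific value for `key` (assumed precedence order)."""
    defaults = config.get("DEFAULTS", {})
    sample_cfg = config.get("SAMPLES", {}).get(sample, {})

    # A sample may name its own chemistry/platform or inherit the defaults.
    chem_name = sample_cfg.get("chemistry", defaults.get("chemistry"))
    plat_name = sample_cfg.get("platform", defaults.get("platform"))

    chem_cfg = chemistry.get(chem_name, {})
    plat_cfg = chem_cfg.get("platforms", {}).get(plat_name, {})

    # Walk from the most specific layer to the least specific one.
    for layer in (sample_cfg, plat_cfg, chem_cfg, defaults):
        if key in layer:
            return layer[key]
    raise KeyError(f"{key!r} is not defined for sample {sample!r}")
```

With this sample data, `lookup_setting("sample1", "bc_cut")` resolves at the platform layer, while `lookup_setting("sample2", "bc_cut")` falls through to the chemistry layer because `sample2` inherits the default platform.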
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scraps_conda.yml | Updates environment deps (STAR constraint loosening; adds zsh). |
| sample_fastqs.tsv | Removes legacy sample sheet. |
| Snakefile | Switches to zsh and adds hierarchical _get_config() + new rule-all output assembly. |
| config.yaml | Introduces DEFAULTS + SAMPLES structure replacing TSV-driven sample config. |
| chemistry.yaml | Adds new hierarchical chemistry/platform parameter definitions. |
| rules/cutadapt_star.snake | Refactors trimming/alignment params to come from _get_config()/chemistry.yaml. |
| rules/count.snake | Updates inputs to match new alignment outputs and adds pre-featureCounts filtering. |
| rules/qc.snake | Adjusts QC inputs to use SAMPLES keys. |
| inst/scripts/cut_paste_fastq.py | Adds optional length-based trimming support for stitched FASTQ reconstruction. |
| README.md | Updates setup/config documentation to match new YAML-based configuration. |
| AGENTS.md | Adds developer guide and conventions for working in this repo. |
| chemistry_to_json.py | Removes obsolete JSON generator. |
| chemistry.json | Removes obsolete chemistry JSON. |
Comments suppressed due to low confidence (1)
rules/count.snake:45
`assign_sites_R1` writes an intermediate filtered BAM to `{params.temp}` but never removes it. This will leave large files behind (unlike `assign_sites_paired`, which cleans up). Add cleanup for `{params.temp}` once it's no longer needed (and consider declaring it as a `temp()` output instead of a `params` path).
samtools sort \
{params.out_bam} \
-o {output.bam}
samtools index {output.bam}
rm -rf {params.out_bam}
"""
def paste_fastq(file_in, file_out, length1):
    print(length1)
    with open(file_in) as file, gzip.open(file_out, 'wt', compresslevel = 1) as file2:
There are leftover debug prints (`print(length1)`) in both `paste_fastq` and `main()`. This will add noisy stdout in Snakemake logs and can interfere with callers that expect the script to be quiet on success. Remove these prints or gate them behind an explicit `--verbose` flag.
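One way the suggested gating could look, as a rough sketch rather than a patch for the actual script: the `--length1` option, the logger name, and the helper names below are hypothetical, since the script's real command-line interface is not shown here. Diagnostics go through `logging`, which writes to stderr by default, so stdout stays quiet unless the caller opts in.

```python
import argparse
import logging

# Hypothetical logger name; the real script may not use logging at all.
log = logging.getLogger("cut_paste_fastq_sketch")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a quiet-by-default CLI.")
    # `--length1` is an assumed option name, used only for illustration.
    parser.add_argument("--length1", type=int, default=None)
    parser.add_argument("--verbose", action="store_true",
                        help="emit debug diagnostics on stderr")
    return parser.parse_args(argv)


def configure_logging(verbose):
    """Enable debug output only when --verbose was passed."""
    level = logging.DEBUG if verbose else logging.WARNING
    logging.basicConfig(level=level, format="%(levelname)s: %(message)s")
    log.setLevel(level)
    return level


def main(argv=None):
    args = parse_args(argv)
    configure_logging(args.verbose)
    # Replaces a bare print(length1): silent unless --verbose is set.
    log.debug("length1=%s", args.length1)
    return args
```

Calling `main(["--length1", "30"])` produces no output, while `main(["--length1", "30", "--verbose"])` logs the value at debug level.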
rules/cutadapt_star.snake
Outdated
if [ -z {params.bc_cut} ] ; then
    echo "no additional trimming"
    cutadapt -j 24 \
In the paired trimming branch, `[ -z {params.bc_cut} ]` is unquoted. When `bc_cut` is empty this expands to `[ -z ]`, where the operand is missing and the test passes only because the lone `-z` is treated as a non-empty string; when it contains whitespace or special characters it undergoes word-splitting and the test can fail with a shell error. Quote the substitution (e.g. `"{params.bc_cut}"`) or otherwise ensure a safe test expression.
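To make the quoting issue concrete, the sketch below renders the test the way a plain template substitution would and tokenizes it with `shlex.split`, which follows POSIX word-splitting rules for simple cases like these. The helper names `render_test` and `bracket_args` and the sample value `-u 7` are hypothetical; this only illustrates the tokenization and is not how Snakemake itself formats commands.

```python
import shlex


def render_test(bc_cut, quoted):
    """Render `[ -z ... ]` as a template would, with or without double quotes."""
    value = f'"{bc_cut}"' if quoted else bc_cut
    return f"[ -z {value} ]"


def bracket_args(command):
    """Tokenize with POSIX rules and return the words between `[` and `]`."""
    words = shlex.split(command)
    return words[1:-1]


# Unquoted and empty: the operand disappears and a lone `-z` is left.
print(bracket_args(render_test("", quoted=False)))      # ['-z']
# Quoted and empty: `-z` receives an explicit empty-string operand.
print(bracket_args(render_test("", quoted=True)))       # ['-z', '']
# Unquoted with whitespace: the value splits into separate words.
print(bracket_args(render_test("-u 7", quoted=False)))  # ['-z', '-u', '7']
# Quoted with whitespace: the value stays a single operand.
print(bracket_args(render_test("-u 7", quoted=True)))   # ['-z', '-u 7']
```

Wrapping the value in double quotes, as done here, does not escape quotes embedded in the value itself; `shlex.quote` is the more robust choice when building a command string in Python.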
@copilot open a new pull request to apply changes based on this feedback
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: agillen <4809242+agillen@users.noreply.github.com>
Fix unquoted shell parameter substitutions in cutadapt_paired rule
Add step to create Conda environment before running Snakemake.
Removed the step to create the Conda environment using mamba.
Activate the 'scraps_conda' environment before running Snakemake.