Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions workflow/Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ if "variant-calls" in config:

include: "rules/compare-vcfs.smk"
include: "rules/eval-results.smk"
include: "rules/annotation.smk"

rule all:
input:
Expand All @@ -30,11 +31,11 @@ if "variant-calls" in config:
),
get_fp_fn_reports,
expand(
"results/fp-fn/vcf/{benchmark}/{benchmark}.shared_fn.sorted.vcf.gz",
"results/fp-fn/annotated_vcf/{benchmark}/{benchmark}.shared_fn.annotated.vcf.gz",
benchmark=used_benchmarks,
),
expand(
"results/fp-fn/vcf/{benchmark_callset[0]}/{benchmark_callset[1]}.unique_{classification}.sorted.vcf.gz",
"results/fp-fn/annotated_vcf/{benchmark_callset[0]}/{benchmark_callset[1]}.unique_{classification}.annotated.vcf.gz",
benchmark_callset=used_benchmarks_callsets,
classification=["fp", "fn"],
),
Expand Down
5 changes: 5 additions & 0 deletions workflow/envs/curl.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not just use the tools.yaml here instead?

Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
channels:
- conda-forge
- nodefaults
dependencies:
- curl =8.17.0
6 changes: 6 additions & 0 deletions workflow/envs/htslib.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, bcftools usually has htslib installed aswell then

Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
channels:
- conda-forge
- bioconda
dependencies:
- htslib =1.12
- unzip =6.0
125 changes: 125 additions & 0 deletions workflow/rules/annotation.smk
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
rule get_vep_cache:
output:
directory("resources/vep/cache"),
params:
species="homo_sapiens",
build=get_reference_genome_build(),
release="111",
log:
"logs/vep/cache.log",
cache: "omit-software"
wrapper:
"v8.0.2/bio/vep/cache"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can update tot the latest here (same version as others in workflow)

Suggested change
"v8.0.2/bio/vep/cache"
"v8.1.1/bio/vep/cache"



rule get_vep_plugins:
output:
directory("resources/vep/plugins"),
params:
release="111",
log:
"logs/vep/plugins.log",
wrapper:
"v8.0.2/bio/vep/plugins"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"v8.0.2/bio/vep/plugins"
"v8.1.1/bio/vep/plugins"



rule download_revel:
output:
temp("resources/revel_scores.zip"),
log:
"logs/vep_plugins/download_revel.log",
conda:
"../envs/curl.yaml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"../envs/curl.yaml"
"../envs/tools.yaml"

shell:
"curl https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
Comment on lines +26 to +34
Copy link
Contributor

@coderabbitai coderabbitai bot Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add -L flag to curl for redirect handling.

The Zenodo URL may redirect. Adding -L ensures curl follows redirects. Also consider adding --fail to exit with error on HTTP failures.

🔎 Proposed fix
     shell:
-        "curl https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
+        "curl -L --fail https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
🤖 Prompt for AI Agents
In workflow/rules/annotation.smk around lines 26 to 34, the curl command used to
download the Zenodo file doesn't follow HTTP redirects or fail on HTTP errors;
update the shell command to include the -L flag to follow redirects and add
--fail (and optionally --show-error) so curl exits with a failure status on HTTP
errors and prints errors to the log; keep the redirection to the rule log
intact.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need an extra curl env? can we not use
conda:
"../envs/tools.yaml"

as in
rule get_liftover_track:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the contents of both environment files
echo "=== tools.yaml ==="
cat workflow/envs/tools.yaml
echo ""
echo "=== curl.yaml ==="
cat workflow/envs/curl.yaml
echo ""
echo "=== get_liftover_track rule ==="
rg -A 15 "rule get_liftover_track" workflow/rules/

Length of output: 1210


@famosab You're absolutely right! Looking at the code, tools.yaml already contains curl =8, and the get_liftover_track rule in workflow/rules/download.smk successfully uses tools.yaml for curl downloads.

The separate curl.yaml environment appears unnecessary. The only difference is that curl.yaml pins to version 8.17.0 while tools.yaml uses the major version 8. For consistency with the existing pattern (like get_liftover_track), you could use:

conda:
    "../envs/tools.yaml"

This would eliminate the redundant environment file unless there's a specific requirement for curl version 8.17.0 in this rule.


🧠 Learnings used
Learnt from: BiancaStoecker
Repo: snakemake-workflows/dna-seq-benchmark PR: 149
File: workflow/Snakefile:36-40
Timestamp: 2025-11-11T08:47:27.118Z
Learning: In the dna-seq-benchmark workflow, VCF files are sorted by a generic sort_vcf rule in workflow/rules/utils.smk that transforms {prefix}.vcf.gz to {prefix}.sorted.vcf.gz, so rules that produce VCF outputs don't include .sorted in their filenames - the sorting is handled as a separate downstream step.



rule process_revel_scores:
input:
"resources/revel_scores.zip",
output:
"resources/revel_scores.tsv.gz",
params:
build=get_reference_genome_build(),
log:
"logs/vep_plugins/process_revel_scores.log",
conda:
"../envs/htslib.yaml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"../envs/htslib.yaml"
"../envs/tools.yaml"

shell:
"""
tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
if [ "{params.build}" == "GRCh38" ] ; then
zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
elif [ "{params.build}" == "GRCh37" ] ; then
cat $tmpfile > {output}
else
echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
exit 125
fi
"""
Comment on lines +37 to +60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing resources declaration for tmpdir and temp file cleanup.

The shell script references {resources.tmpdir} but no resources block is declared. This will cause a runtime error. Also, the temporary file $tmpfile is not removed after use.

🔎 Proposed fix
 rule process_revel_scores:
     input:
         "resources/revel_scores.zip",
     output:
         "resources/revel_scores.tsv.gz",
     params:
         build=get_reference_genome_build(),
     log:
         "logs/vep_plugins/process_revel_scores.log",
+    resources:
+        tmpdir=temp("tmpdir"),
     conda:
         "../envs/htslib.yaml"
     shell:
         """
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp "${{TMPDIR:-/tmp}}"/revel_scores.XXXXXX)
+        trap "rm -f $tmpfile" EXIT
         unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
         if [ "{params.build}" == "GRCh38" ] ; then
             zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
         elif [ "{params.build}" == "GRCh37" ] ; then
             cat $tmpfile > {output}
         else
             echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
             exit 125
         fi
         """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rule process_revel_scores:
input:
"resources/revel_scores.zip",
output:
"resources/revel_scores.tsv.gz",
params:
build=get_reference_genome_build(),
log:
"logs/vep_plugins/process_revel_scores.log",
conda:
"../envs/htslib.yaml"
shell:
"""
tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
if [ "{params.build}" == "GRCh38" ] ; then
zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
elif [ "{params.build}" == "GRCh37" ] ; then
cat $tmpfile > {output}
else
echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
exit 125
fi
"""
rule process_revel_scores:
input:
"resources/revel_scores.zip",
output:
"resources/revel_scores.tsv.gz",
params:
build=get_reference_genome_build(),
log:
"logs/vep_plugins/process_revel_scores.log",
resources:
tmpdir=temp("tmpdir"),
conda:
"../envs/htslib.yaml"
shell:
"""
tmpfile=$(mktemp "${{TMPDIR:-/tmp}}"/revel_scores.XXXXXX)
trap "rm -f $tmpfile" EXIT
unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
if [ "{params.build}" == "GRCh38" ] ; then
zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
elif [ "{params.build}" == "GRCh37" ] ; then
cat $tmpfile > {output}
else
echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
exit 125
fi
"""
🤖 Prompt for AI Agents
In workflow/rules/annotation.smk around lines 37 to 60, the rule
process_revel_scores uses {resources.tmpdir} but no resources block is declared
and the temporary file $tmpfile is never cleaned up; add a resources:
tmpdir=<appropriate_tmpdir_path> declaration to the rule (or use resources:
tmpdir=configured_tmpdir variable) so Snakemake provides the directory, and
modify the shell block to create the temp file inside that tmpdir and remove it
on exit (use a trap to rm -f $tmpfile on EXIT or ensure rm -f $tmpfile at the
end of the script) so the temp file is always cleaned up even on errors.



rule tabix_revel_scores:
input:
"resources/revel_scores.tsv.gz",
output:
"resources/revel_scores.tsv.gz.tbi",
params:
get_tabix_revel_params(),
log:
"logs/tabix/revel.log",
wrapper:
"v8.0.2/bio/tabix/index"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"v8.0.2/bio/tabix/index"
"v8.1.1/bio/tabix/index"



rule annotate_shared_fn:
input:
calls="results/fp-fn/vcf/{benchmark}/{benchmark}.shared_fn.sorted.vcf.gz",
cache=access.random("resources/vep/cache"),
plugins=access.random("resources/vep/plugins"),
revel=lambda wc: get_plugin_aux("REVEL"),
revel_tbi=lambda wc: get_plugin_aux("REVEL", True),
fasta=access.random("resources/reference/genome.fasta"),
fai="resources/reference/genome.fasta.fai",
output:
calls="results/fp-fn/annotated_vcf/{benchmark}/{benchmark}.shared_fn.annotated.vcf.gz",
stats="results/fp-fn/annotated_vcf/{benchmark}/{benchmark}.shared_fn.stats.html",
params:
# Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
# Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
plugins=["REVEL"],
extra="--everything --check_existing --vcf_info_field ANN --hgvsg --sift b --polyphen b ",
log:
"logs/vep/fp-fn/{benchmark}/{benchmark}.shared_fn.annotate.log",
threads: 4
group:
"annotation"
wrapper:
"v8.0.2/bio/vep/annotate"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"v8.0.2/bio/vep/annotate"
"v8.1.1/bio/vep/annotate"



rule annotate_unique_fp_fn:
input:
calls="results/fp-fn/vcf/{benchmark}/{callset}.unique_{classification}.sorted.vcf.gz",
cache=access.random("resources/vep/cache"),
plugins=access.random("resources/vep/plugins"),
revel=lambda wc: get_plugin_aux("REVEL"),
revel_tbi=lambda wc: get_plugin_aux("REVEL", True),
fasta=access.random("resources/reference/genome.fasta"),
fai="resources/reference/genome.fasta.fai",
output:
calls="results/fp-fn/annotated_vcf/{benchmark}/{callset}.unique_{classification}.annotated.vcf.gz",
stats="results/fp-fn/annotated_vcf/{benchmark}/{callset}.unique_{classification}.stats.html",
params:
# Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
# Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
plugins=["REVEL"],
extra="--everything --check_existing --vcf_info_field ANN --hgvsg --sift b --polyphen b ",
log:
"logs/vep/fp-fn/{benchmark}/{callset}.unique_{classification}.annotate.log",
threads: 4
group:
"annotation"
wrapper:
"v8.0.2/bio/vep/annotate"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"v8.0.2/bio/vep/annotate"
"v8.1.1/bio/vep/annotate"

13 changes: 13 additions & 0 deletions workflow/rules/common.smk
Original file line number Diff line number Diff line change
Expand Up @@ -733,3 +733,16 @@ if "variant-calls" in config:
classification="fp|fn",
comparison="genotype|existence",
vartype="snvs|indels",


def get_tabix_revel_params():
# Indexing of REVEL-score file where the column depends on the reference
column = 2 if config["reference-genome"] == "grch37" else 3
return f"-f -s 1 -b {column} -e {column}"


def get_plugin_aux(plugin, index=False):
if plugin == "REVEL":
suffix = ".tbi" if index else ""
return "resources/revel_scores.tsv.gz{suffix}".format(suffix=suffix)
return []
Loading