-
Notifications
You must be signed in to change notification settings - Fork 204
Description
Snakemake version
5.10.0
Describe the bug
For each of bwa index and bwa mem the user might put in a prefix that will work for a local job, but fail given a remote. Examples are provided below:
Bwa Mem
For bwa mem, note that the params has an "index" below that is used to specify the path. The first below I don't think will work:
rule map_reads:
input:
reads=get_merged,
idx=rules.bwa_index.output
output:
temp("mapped/{sample}.sorted.bam")
log:
"logs/bwa_mem/{sample}.log"
params:
index="refs/genome",
extra=get_read_group,
sort="samtools",
sort_order="coordinate"
threads: 8
wrapper:
"0.39.0/bio/bwa/mem"
But this would:
rule map_reads:
input:
reads=get_merged,
idx=rules.bwa_index.output
output:
temp("mapped/{sample}.sorted.bam")
log:
"logs/bwa_mem/{sample}.log"
params:
index="snakemake-testing/kim-wxs-varlociraptor/refs/genome",
extra=get_read_group,
sort="samtools",
sort_order="coordinate"
threads: 8
wrapper:
"0.39.0/bio/bwa/mem"
Unlike bwa index (discussed next) I'm not sure we can just remove this one.
Bwa Index
For this prefix, given running on a remote with this recipe:
rule bwa_index:
input:
"refs/genome.fasta"
output:
multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
params:
prefix="refs/genome"
log:
"logs/bwa_index.log"
resources:
mem_mb=6000,disk_mb=128000
benchmark:
"benchmarks/bwa_index.tsv"
wrapper:
"0.45.1/bio/bwa/index"
The error log will report that bwa/genome.pac cannot be found. It's not the inputs or outputs, but rather that prefix is used to determine the path to the file! The correct usage would be:
rule bwa_index:
input:
"refs/genome.fasta"
output:
multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
params:
prefix="snakemake-testing/kim-wxs-varlociraptor/refs/genome"
log:
"logs/bwa_index.log"
resources:
mem_mb=6000,disk_mb=128000
benchmark:
"benchmarks/bwa_index.tsv"
wrapper:
"0.45.1/bio/bwa/index"
But rather we aren't required to specify it, so even better would be to remove it entirely:
rule bwa_index:
input:
"refs/genome.fasta"
output:
multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
log:
"logs/bwa_index.log"
resources:
mem_mb=6000,disk_mb=128000
benchmark:
"benchmarks/bwa_index.tsv"
wrapper:
"0.45.1/bio/bwa/index"
Of course the user writing the pipeline might not know this, in which case maybe there should be a fix to allow for a prefix specified that would, given a default remote prefix, add it as well?
For both of the above, if I can get started on work to fix the wrappers and then do a PR here, I'd be happy to do that! I'm not sure if there are other cases like this too in the wrappers. Let me know your thoughts.