Skip to content

bwa index and mem potential path bugs #60

@vsoch

Description

@vsoch

Snakemake version
5.10.0

Describe the bug
For each of bwa index and bwa mem the user might put in a prefix that will work for a local job, but fail given a remote. Examples are provided below:

Bwa Mem

For bwa mem, note that the params has an "index" below that is used to specify the path. The first below I don't think will work:

rule map_reads:
    input:
        reads=get_merged,
        idx=rules.bwa_index.output
    output:
        temp("mapped/{sample}.sorted.bam")
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="refs/genome",
        extra=get_read_group,
        sort="samtools",
        sort_order="coordinate"
    threads: 8
    wrapper:
        "0.39.0/bio/bwa/mem"

But this would:

rule map_reads:
    input:
        reads=get_merged,
        idx=rules.bwa_index.output
    output:
        temp("mapped/{sample}.sorted.bam")
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="snakemake-testing/kim-wxs-varlociraptor/refs/genome",
        extra=get_read_group,
        sort="samtools",
        sort_order="coordinate"
    threads: 8
    wrapper:
        "0.39.0/bio/bwa/mem"

Unlike bwa index (discussed next) I'm not sure we can just remove this one.

Bwa Index

For this prefix, given running on a remote with this recipe:

rule bwa_index:
    input:
        "refs/genome.fasta"
    output:
        multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
    params:
        prefix="refs/genome"
    log:
        "logs/bwa_index.log"
    resources:
        mem_mb=6000,disk_mb=128000
    benchmark:
        "benchmarks/bwa_index.tsv"
    wrapper:
        "0.45.1/bio/bwa/index"

The error log will report that bwa/genome.pac cannot be found. It's not the inputs or outputs, but rather that prefix is used to determine the path to the file! The correct usage would be:

rule bwa_index:
    input:
        "refs/genome.fasta"
    output:
        multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
    params:
        prefix="snakemake-testing/kim-wxs-varlociraptor/refs/genome"
    log:
        "logs/bwa_index.log"
    resources:
        mem_mb=6000,disk_mb=128000
    benchmark:
        "benchmarks/bwa_index.tsv"
    wrapper:
        "0.45.1/bio/bwa/index"

But rather we aren't required to specify it, so even better would be to remove it entirely:

rule bwa_index:
    input:
        "refs/genome.fasta"
    output:
        multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
    log:
        "logs/bwa_index.log"
    resources:
        mem_mb=6000,disk_mb=128000
    benchmark:
        "benchmarks/bwa_index.tsv"
    wrapper:
        "0.45.1/bio/bwa/index"

Of course the user writing the pipeline might not know this, in which case maybe there should be a fix to allow for a prefix specified that would, given a default remote prefix, add it as well?

For both of the above, if I can get started on work to fix the wrappers and then do a PR here, I'd be happy to do that! I'm not sure if there are other cases like this too in the wrappers. Let me know your thoughts.

Metadata

Metadata

Assignees

No one assigned

    Labels

    StalebugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions