debugging foo #39

@ctb

https://hackmd.io/SqmpNz40TMO9aRIF3G_j2g?both

Debugging failing snakemake runs

When I run snakemake workflows "for realz", with many CPUs (a large -j) and/or across many machines, I frequently run into errors that are really hard to track down.

The first problem that I often encounter is that I can't figure out why the command failed. This is partly because of a frustrating UNIX-ism: Snakemake can only report the failure after the command has exited, so the command's own error output appears above Snakemake's error message and you have to scroll up to find it (this is hard to change in UNIX). But the bigger problem is that when many commands run at the same time, their output gets mixed together and it is difficult or impossible to figure out which output text and errors go with which command.

This connects with a second problem: running multiple commands at the same time can itself cause failures. The most common cause is memory usage: when one command requires a large amount of RAM, it may fail itself or cause other commands running at the same time to fail.

The easiest way I've found to debug all of this is to do the following:

  • run as many snakemake jobs as possible with -k (keep going after a job fails);
  • once that is complete, every remaining TODO job either fails or depends on a job that fails. Now, run them one at a time, either manually (by specifying a particular output file) or by limiting the number of jobs that run at once - e.g. -j 1.
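
The two passes above can be sketched as a small Python wrapper. This is only a minimal sketch: the helper names (`keep_going_cmd`, `serial_cmd`, `run_logged`) and any log file names are made up for illustration, and it assumes `snakemake` is on your PATH. Sending each run's output to its own log file is one way to keep the error text attached to the command that produced it.

```python
import subprocess


def keep_going_cmd(jobs):
    """First pass: run with many jobs; -k keeps going after a job fails."""
    return ["snakemake", "-k", "-j", str(jobs)]


def serial_cmd(target=None):
    """Second pass: one job at a time, optionally for a single output file."""
    cmd = ["snakemake", "-j", "1"]
    if target is not None:
        cmd.append(target)
    return cmd


def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr sent to one log file; return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, `run_logged(keep_going_cmd(16), "pass1.log")` would do the first pass, and `run_logged(serial_cmd("some/output.txt"), "some_output.log")` would then rerun one failing target on its own; both file names here are placeholders.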
