debugging foo

https://hackmd.io/SqmpNz40TMO9aRIF3G_j2g?both

# Debugging failing snakemake runs

When I run snakemake workflows "for realz", with many CPUs (a large `-j`) and/or across many machines, I frequently run into errors that are really hard to track down.

The first problem that I often encounter is that I can't figure out why the command failed. This is partly because of a frustrating UNIX-ism: Snakemake outputs its error message _after_ the command fails, so you need to go look _above_ Snakemake's message to see the command's own error output (this is something that is hard to change in UNIX). But the bigger problem is that when running many commands at the same time, the output gets mixed together and it is difficult or impossible to figure out which output text and errors go with which command.

This connects with a second problem: running multiple commands at the same time can itself cause failures. The most common cause is _memory usage_ - when one command requires a large amount of RAM, it may fail itself or cause other commands running at the same time to fail.

The easiest way I've found to debug all of this is to do the following:

* run as many snakemake jobs as possible with `-k` (`--keep-going`), which keeps running independent jobs after one fails;
* once that is complete, all the remaining TODO jobs are either failing or depend on a job that fails. Now, run them one at a time, either manually (by specifying a particular output file) _or_ by limiting the number of cores you give - e.g. `-j 1`.
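
The two passes above can be sketched as a small Python driver. This is only a rough illustration, assuming `snakemake` is on your `PATH` and the script is run from the workflow directory; the helper names (`keep_going_cmd`, `serial_cmd`, `debug_run`) are made up for this example and are not part of Snakemake.

```python
import subprocess
from typing import List, Optional


def keep_going_cmd(jobs: int) -> List[str]:
    """Pass 1: run everything that can succeed; -k keeps going past failures."""
    return ["snakemake", "-k", "-j", str(jobs)]


def serial_cmd(target: Optional[str] = None) -> List[str]:
    """Pass 2: one job at a time, optionally restricted to one output file."""
    cmd = ["snakemake", "-j", "1"]
    if target is not None:
        # Hypothetical: the caller supplies the output file of a failing job.
        cmd.append(target)
    return cmd


def debug_run(jobs: int, target: Optional[str] = None) -> int:
    """Run both passes and return the exit code of the serial pass."""
    # A nonzero exit is expected here when some jobs fail, so it is not checked.
    subprocess.run(keep_going_cmd(jobs))
    # With only one job running, everything printed belongs to that job.
    return subprocess.run(serial_cmd(target)).returncode
```

For instance, `debug_run(16)` would run with 16 cores first and then rerun whatever is left serially, while `debug_run(16, "results/sample1.txt")` would restrict the serial pass to one output file (that file name is a placeholder, not something from a real workflow).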


