Home
Welcome to the one-workflow-many-ways wiki!
As the README says, the point of this project is to get an idea of how easy or hard it is for a beginner to implement a basic workflow in different workflow systems. With the exception of bash, I am a beginner at all of these.
I whipped up the bash script in approximately 10 minutes and then spent another 30 minutes making it nice. I consider this the 'baseline' by which the other implementations are measured.
- bash file : https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/bash/bamqc.sh
- results : https://travis-ci.org/morgantaschuk/one-workflow-many-ways/jobs/349431822
WDL, run with Cromwell, was the first new workflow language I tried.
- wdl file: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/wdl/bamqc.wdl
- inputs: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/wdl/bamqc_inputs.json
- results: https://travis-ci.org/morgantaschuk/one-workflow-many-ways/jobs/349431820
Thoughts:
- The documentation is very good and all in one place. I had a practical working example very quickly
- The WDL file reads almost like bash.
  - some escaping problems. Basically nothing appreciates
    samtools flagstat $BAM 2>&1 | perl -pe 's|(\d+ \+ \d+)\s+(.*)\R|"$2": "$1",|g' | sed 's/.$//'
  - was finicky about colliding names (can't have a global bamqc variable and a bamqc task)
  - had it working pretty quickly
- Cromwell is a little bit verbose (but this is tunable)
- WOMtools lets you autogenerate the inputs.json file.
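
For readers who don't want to decode the perl/sed one-liner that caused the escaping trouble, here is a rough Python sketch of the same idea: turn each `N + M label` line of flagstat output into a `"label": "N + M"` pair. The function name `flagstat_to_dict` and the sample text in the usage note are made up for illustration. Unlike the one-liner, which emits a bare comma-separated fragment, this sketch builds a full dict.

```python
import json
import re

# Same shape as the perl pattern: "<passed> + <failed>", whitespace, then a label.
FLAGSTAT_LINE = re.compile(r"(\d+ \+ \d+)\s+(.*)")


def flagstat_to_dict(text):
    """Map each flagstat label to its 'passed + failed' count string."""
    result = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line)
        if match:  # skip anything that isn't a count line, e.g. warnings from 2>&1
            counts, label = match.groups()
            result[label] = counts
    return result


def flagstat_to_json(text):
    """Serialize the mapping; json.dumps takes care of all the quoting."""
    return json.dumps(flagstat_to_dict(text), indent=2)
```

Something like `flagstat_to_json("0 + 0 secondary\n")` would give a JSON object with a single `secondary` key. The point is mostly that none of the quoting has to survive a trip through a workflow language's string interpolation.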
CWL was a totally different experience.
- cwl files: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/cwl/workflow.cwl
- input file: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/cwl/workflow.yml
- results: https://travis-ci.org/morgantaschuk/one-workflow-many-ways/jobs/349431821
Thoughts:
- No pipes??? NO PIPES???
  - No, you can have pipes, but you need to have requirements: class: ShellCommandRequirement and then {valueFrom: " | ", shellQuote: false} (see bamqc.cwl)
  - ..... ok
- Every step in a different cwl file, joined together with a workflow cwl
- documentation is disorganized and the 'user guide' doesn't actually show you anything useful for a really long time http://www.commonwl.org/user_guide/
- Creating workflows (with multiple steps... remember you can't pipe so each step is very small) is buried down in lesson 20
- Specifying inputs and outputs is a bit bonkers
- this is how you name output files:
    stdout: bamqc_result.json
    outputs:
      outjson:
        type: stdout
- yes that means the name is 'out of scope' of the actual outfile. For some reason.
- once I had the separate steps (I rewrote flagstat2json.sh so that it could take a file as well as stdin), creating the workflow was very simple. 👍 for reuse
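
The "take a file as well as stdin" change is what lets a step sit either at the end of a pipe or as its own CWL step with a File input. A minimal sketch of that pattern in Python, not the actual flagstat2json.sh: the name `read_lines` and the convention that `None` or `-` means stdin are assumptions for illustration.

```python
import sys
from typing import Iterator, Optional


def read_lines(path: Optional[str] = None) -> Iterator[str]:
    """Yield lines from `path`, or from stdin when no path (or '-') is given."""
    if path is None or path == "-":
        yield from sys.stdin
    else:
        with open(path) as handle:
            yield from handle


if __name__ == "__main__":
    # Use the first command-line argument as the input file if there is one.
    source = sys.argv[1] if len(sys.argv) > 1 else None
    for line in read_lines(source):
        sys.stdout.write(line)
```

Saved under some name of your choosing, the same script then works both as `samtools flagstat in.bam | python script.py` and as `python script.py flagstat.txt`.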
Interestingly, Toil broke on the WDL file that worked on Cromwell (commit 7bac144):
task bamqc {
  String samtools
  File bamqc_pl
  File bamfile
  File bedfile
  String outjson
  String xtra_json
  command {
    eval '${samtools} view ${bamfile} | perl ${bamqc_pl} -r ${bedfile} -j "${xtra_json}" > ${outjson}'
  }
  output {
    File out = "${outjson}"
  }
}
It barfed with an overabundance of quotes:
File "/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py", line 122
eval ''''
^
SyntaxError: EOL while scanning string literal
Traceback (most recent call last):
File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/bin/toil-wdl-runner", line 11, in <module>
sys.exit(main())
File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/local/lib/python2.7/site-packages/toil/wdl/toilwdl.py", line 2312, in main
subprocess.check_call(cmd)
File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py']' returned non-zero exit status 1
In the Python script that Toil generates (toilwdl_compiled.py), the task turned into the following block (indentation preserved):
command9 = '''
eval ''''
command10 = samtools
command11 = ''' view '''
command12 = bamfile_fs
command13 = ''' | perl '''
command14 = bamqc_pl_fs
command15 = ''' -r '''
command16 = bedfile_fs
command17 = ''' -j "'''
command18 = xtra_json
command19 = '''" > '''
command20 = outjson
command21 = ''''
'''
So it looks like there's a bug, which I will eventually figure out where to file. In the meantime I'm going to remove the eval statement and the 'single quotes' since that seems to be the issue.
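
The parse failure can be reproduced in isolation. This is a minimal sketch, assuming (as the generated block above suggests) that each literal piece of the command gets wrapped in triple single quotes. It is not Toil's actual code generator, and the helper name `parses` and the variables `naive` and `safe` are hypothetical. It also shows one way a generator could sidestep the problem: let `repr()` pick the quoting.

```python
def parses(source: str) -> bool:
    """Return True if `source` compiles as Python, False on a SyntaxError."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


# A literal command fragment that ends in a single quote, like "eval '" in the task.
fragment = "eval '"

# Naive generation: the fragment's trailing quote runs into the closing triple
# quote, so the literal closes one character early and a stray ' is left over.
naive = "command9 = '''" + fragment + "'''\n"

# Safer generation: repr() chooses quotes and escapes so the literal round-trips.
safe = "command9 = " + repr(fragment) + "\n"

if __name__ == "__main__":
    print("naive parses:", parses(naive))
    print("safe parses:", parses(safe))
```

Under these assumptions, any WDL command with a single quote next to an interpolated variable would trip the naive form, which fits with removing the eval and the single quotes as a workaround.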
Edit: Looks like Toil also doesn't like constants in WDL files. I set the output filename to "flagstat.json" in the flagstat task and it complains about it too.
Traceback (most recent call last):
File "/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py", line 203, in <module>
job1 = Job.wrapJobFn(flagstat, samtools=SAMTOOLS, flagstat_to_json=flagstat_to_json, bamfile=BAMFILE, outfile=outfile)
NameError: name 'outfile' is not defined
Traceback (most recent call last):
File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/bin/toil-wdl-runner", line 11, in <module>
sys.exit(main())
File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/local/lib/python2.7/site-packages/toil/wdl/toilwdl.py", line 2312, in main
subprocess.check_call(cmd)
File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py']' returned non-zero exit status 1
I also note that I needed to clean up my local working directory before I could try again:
toil.jobStores.abstractJobStore.JobStoreExistsException: The job store '/media/mtaschuk/Data/git/one-workflow-many-ways/toilWorkflowRun' already exists. Use --restart to resume the workflow, or remove the job store with 'toil clean' to start the workflow from scratch.
Which, cool.
Giving up on Toil WDL for the moment because it requires too many changes to the WDL I made for Cromwell.