Commit 7871746

added metadata and authorship episode

Toby Hodges committed
1 parent 2db80ca, commit 7871746

5 files changed: +429 -0 lines changed

_episodes/16-metadata.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
title: "Metadata and Authorship"
teaching: 10
exercises: 0
questions:
- "How do I provide information for people to cite my tool descriptions?"
objectives:
- "Learn how to add authorship information and other metadata to a CWL
description."
keypoints:
- "Metadata can be provided in CWL descriptions."
- "Developers should provide a minimal amount of authorship information to
encourage correct citation."
---
Implementation extensions not required for correct execution (for example,
fields related to GUI presentation) and metadata about the tool or workflow
itself (for example, authorship for use in citations) may be provided as
additional fields on any object.
Such extension fields must use a namespace prefix listed in the `$namespaces`
section of the document as described in the
[Schema Salad specification][schema-salad].
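
As a minimal sketch of the namespace mechanism described above, the following
tool declares the prefix `s` for schema.org and then uses it on an extension
field. The tool itself (`echo` with no inputs) and the author name are
placeholders chosen for illustration, not part of the lesson's example files.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs: []
outputs: []

# Declare the prefix before using it on any extension field.
$namespaces:
  s: https://schema.org/

# Extension field: ignored for execution, available for citation.
s:author:
  - class: s:Person
    s:name: Jane Doe   # hypothetical author, for illustration only
```

A field such as `s:author` without the matching `$namespaces` entry would not
satisfy the prefix rule stated above.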

For all developers, we recommend the following minimal metadata for your tools
and workflows. This example includes metadata allowing others to cite your tool.

*metadata_example2.cwl*

```
{% include cwl/metadata_example2.cwl %}
```

#### Extended Example

For those who are highly motivated, it is also possible to annotate your tool
with a much larger amount of metadata. This example includes EDAM ontology tags
as keywords (allowing the grouping of related tools), hints at the hardware
requirements for using the tool, and a few more metadata fields.

*metadata_example3.cwl*

```
{% include cwl/metadata_example3.cwl %}
```
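
The included file is not reproduced in this diff. As a rough sketch of the two
ideas named above (ontology keywords and hardware hints), a tool might combine
a `ResourceRequirement` hint with an EDAM-prefixed keyword as follows. The
resource values, the input name `text_input`, and the chosen EDAM topic are
illustrative assumptions, not the contents of `metadata_example3.cwl`.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: ["wc", "-l"]

# Hardware hints: a runner may use these for scheduling.
hints:
  ResourceRequirement:
    coresMin: 1      # illustrative value, not a measured requirement
    ramMin: 1024     # mebibytes; illustrative value

inputs:
  text_input:        # hypothetical input name
    type: File
    inputBinding:
      position: 1

stdout: output.txt

outputs:
  report:
    type: stdout

$namespaces:
  s: https://schema.org/
  edam: http://edamontology.org/

# An EDAM topic used as a keyword so related tools can be grouped.
s:keywords: edam:topic_0091
```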

[schema-salad]: http://www.commonwl.org/v1.0/SchemaSalad.html#Explicit_context

_episodes/17-1st-workflow.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
---
title: "Writing Workflows"
teaching: 10
exercises: 0
questions:
- "How do I connect tools together into a workflow?"
objectives:
- "Learn how to construct workflows from multiple CWL tool descriptions."
keypoints:
- "Each step in a workflow must have its own CWL description."
- "Top level inputs and outputs of the workflow are described in the `inputs`
and `outputs` fields respectively."
- "The steps are specified under `steps`."
- "Execution order is determined by the flow of inputs and outputs between
steps."
---
This workflow extracts a Java source file from a tar file and then
compiles it.

*1st-workflow.cwl*

```
{% include cwl/1st-workflow.cwl %}
```

Use a YAML or JSON object in a separate file to describe the input of a run:

*1st-workflow-job.yml*

```
{% include cwl/1st-workflow-job.yml %}
```
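
The job file itself is not shown in this diff. Judging from the workflow inputs
(`inp: File`, `ex: string`) and the file names in the run log, an input object
along these lines would fit. Treat it as a sketch under those assumptions
rather than the exact contents of `1st-workflow-job.yml`.

```yaml
# One key per workflow input; values assumed from the run log.
inp:
  class: File
  path: hello.tar   # tar archive containing the Java source
ex: Hello.java      # name of the file to extract from the archive
```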

Now invoke `cwl-runner` with the workflow description and the input object on
the command line:

```
$ echo "public class Hello {}" > Hello.java && tar -cvf hello.tar Hello.java
$ cwl-runner 1st-workflow.cwl 1st-workflow-job.yml
[job untar] /tmp/tmp94qFiM$ tar xf /home/example/hello.tar Hello.java
[step untar] completion status is success
[job compile] /tmp/tmpu1iaKL$ docker run -i --volume=/tmp/tmp94qFiM/Hello.java:/var/lib/cwl/job301600808_tmp94qFiM/Hello.java:ro --volume=/tmp/tmpu1iaKL:/var/spool/cwl:rw --volume=/tmp/tmpfZnNdR:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac -d /var/spool/cwl /var/lib/cwl/job301600808_tmp94qFiM/Hello.java
[step compile] completion status is success
[workflow 1st-workflow.cwl] outdir is /home/example
Final process status is success
{
  "classout": {
    "location": "/home/example/Hello.class",
    "checksum": "sha1$e68df795c0686e9aa1a1195536bd900f5f417b18",
    "class": "File",
    "size": 416
  }
}
```

What's going on here? Let's break it down:

```
cwlVersion: v1.0
class: Workflow
```

The `cwlVersion` field indicates the version of the CWL spec used by the
document. The `class` field indicates this document describes a workflow.

```
inputs:
  inp: File
  ex: string
```

The `inputs` section describes the inputs of the workflow. This is a
list of input parameters where each parameter consists of an identifier
and a data type. These parameters can be used as sources for input to
specific workflow steps.
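
The `identifier: type` form is a shorthand. A sketch of the equivalent
expanded form, which leaves room for extra fields such as `doc`, could look
like this fragment. The `doc` strings are added here for illustration only.

```yaml
# Fragment of a Workflow document: expanded form of the same two inputs.
inputs:
  inp:
    type: File
    doc: Tar archive containing the source file   # illustrative description
  ex:
    type: string
    doc: Name of the file to extract              # illustrative description
```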

```
outputs:
  classout:
    type: File
    outputSource: compile/classfile
```

The `outputs` section describes the outputs of the workflow. This is a
list of output parameters where each parameter consists of an identifier
and a data type. The `outputSource` connects the output parameter `classfile`
of the `compile` step to the workflow output parameter `classout`.

```
steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: inp
      extractfile: ex
    out: [example_out]
```

The `steps` section describes the actual steps of the workflow. In this
example, the first step extracts a file from a tar file, and the second
step compiles the file from the first step using the Java compiler.
Workflow steps are not necessarily run in the order they are listed;
instead, the order is determined by the dependencies between steps (using
`source`). In addition, workflow steps which do not depend on one
another may run in parallel.
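
To illustrate that listing order does not determine execution order, here is a
small self-contained sketch, separate from the lesson's workflow. The step
names `upper` and `write`, the embedded tools, and the file names are all
hypothetical. `upper` is listed first but consumes the output of `write`, so a
runner has to execute `write` before it.

```yaml
cwlVersion: v1.0
class: Workflow

inputs:
  message: string

outputs:
  shouted:
    type: File
    outputSource: upper/out_file

steps:
  upper:                          # listed first, but runs second
    run:
      class: CommandLineTool
      baseCommand: [tr, a-z, A-Z]
      inputs:
        in_file:
          type: File
      stdin: $(inputs.in_file.path)
      stdout: upper.txt
      outputs:
        out_file:
          type: stdout
    in:
      in_file: write/out_file     # dependency on the `write` step
    out: [out_file]

  write:                          # listed second, but runs first
    run:
      class: CommandLineTool
      baseCommand: echo
      inputs:
        message:
          type: string
          inputBinding:
            position: 1
      stdout: message.txt
      outputs:
        out_file:
          type: stdout
    in:
      message: message            # workflow input, no step dependency
    out: [out_file]
```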

The first step, `untar`, runs `tar-param.cwl` (described previously in
[Parameter references][params]). This tool has two input parameters, `tarfile`
and `extractfile`, and one output parameter, `example_out`.

The `in` section of the workflow step connects these two input parameters to
the inputs of the workflow, `inp` and `ex`, using `source`. This means that when
the workflow step is executed, the values assigned to `inp` and `ex` will be
used for the parameters `tarfile` and `extractfile` in order to run the tool.
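
The snippet above writes each connection as `parameter: source-name`. As a
sketch, the same connections can be spelled out with an explicit `source`
field, which is the form to use when a step input also needs other fields such
as `default`:

```yaml
# Fragment of a workflow step: explicit form of `tarfile: inp` and
# `extractfile: ex`.
in:
  tarfile:
    source: inp
  extractfile:
    source: ex
```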

The `out` section of the workflow step lists the output parameters that are
expected from the tool.

```
  compile:
    run: arguments.cwl
    in:
      src: untar/example_out
    out: [classfile]
```

The second step, `compile`, depends on the results from the first step by
connecting the input parameter `src` to the output parameter of `untar` using
`untar/example_out`. The output of this step, `classfile`, is connected to the
`outputs` section for the Workflow, described above.

[params]: _episodes/06-params/

_episodes/18-nested-workflows.md

Lines changed: 123 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,123 @@
---
title: "Nested Workflows"
teaching: 10
exercises: 0
questions:
- "How do I connect multiple workflows together?"
objectives:
- "Learn how to construct nested workflows from multiple CWL workflow
descriptions."
keypoints:
- "A workflow can be used as a step in another workflow, if the workflow engine
supports the `SubworkflowFeatureRequirement`."
- "The workflows are specified under `steps`, with the workflow's description
file provided as the value to the `run` field."
- "Use `default` to specify a default value for a field, which can be
overwritten by a value in the input object."
- "Use `>` to ignore newlines in long commands split over multiple lines."
---
Workflows are ways to combine multiple tools to perform a larger operation.
We can also think of a workflow as being a tool itself; a CWL workflow can be
used as a step in another CWL workflow, if the workflow engine supports the
`SubworkflowFeatureRequirement`:

```
requirements:
  - class: SubworkflowFeatureRequirement
```

Here's an example workflow that uses our `1st-workflow.cwl` as a nested
workflow:

```
{% include cwl/nestedworkflows.cwl %}
```

A CWL `Workflow` can be used as a `step` just like a `CommandLineTool`; its CWL
file is included with `run`. The workflow inputs (`inp` and `ex`) and outputs
(`classout`) can then be mapped to become the step's inputs and outputs.

```
  compile:
    run: 1st-workflow.cwl
    in:
      inp:
        source: create-tar/tar
      ex:
        default: "Hello.java"
    out: [classout]
```

Our `1st-workflow.cwl` was parameterized with workflow inputs, so when running
it we had to provide a job file to denote the tar file and `*.java` filename.
This is generally best practice, as it means the workflow can be reused in
multiple parent workflows, or even in multiple steps within the same workflow.

Here we use `default:` to hard-code `"Hello.java"` as the `ex` input; however,
our workflow also requires a tar file at `inp`, which we will prepare in the
`create-tar` step. At this point it is probably a good idea to refactor
`1st-workflow.cwl` to have more specific input/output names, as those also
appear in its usage as a tool.
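
A `default` can also be placed on a workflow-level input, where a value in the
input object overrides it. The following self-contained sketch is not part of
the lesson's files; the input name `greeting`, the step name `say`, and the
output file name are hypothetical.

```yaml
cwlVersion: v1.0
class: Workflow

inputs:
  greeting:
    type: string
    default: "Hello"      # used only when the input object omits `greeting`

outputs:
  out_file:
    type: File
    outputSource: say/out_file

steps:
  say:
    run:
      class: CommandLineTool
      baseCommand: echo
      inputs:
        text:
          type: string
          inputBinding:
            position: 1
      stdout: greeting.txt
      outputs:
        out_file:
          type: stdout
    in:
      text: greeting
    out: [out_file]
```

Running this with an input object containing `greeting: Goodbye` would pass
`Goodbye` to `echo` instead of the default.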

It is also possible to take a less generic approach and avoid external
dependencies in the job file. So in this workflow we can generate a hard-coded
`Hello.java` file using the previously mentioned `InitialWorkDirRequirement`,
before adding it to a tar file.

```
  create-tar:
    requirements:
      - class: InitialWorkDirRequirement
        listing:
          - entryname: Hello.java
            entry: |
              public class Hello {
                public static void main(String[] argv) {
                    System.out.println("Hello from Java");
                }
              }
```

In this case our step can assume `Hello.java` rather than be parameterized, so
we can use a simpler `arguments` form as long as the CWL workflow engine
supports the `ShellCommandRequirement`:

```
    run:
      class: CommandLineTool
      requirements:
        - class: ShellCommandRequirement
      arguments:
        - shellQuote: false
          valueFrom: >
            tar cf hello.tar Hello.java
```

Note the use of `shellQuote: false` here; otherwise the shell will try to
execute the quoted binary `"tar cf hello.tar Hello.java"`.

Here the `>` block means that newlines are folded into spaces, so it's possible
to write the single command on multiple lines. Similarly, the `|` we used above
will preserve newlines; combined with `ShellCommandRequirement` this would allow
embedding a shell script.
Shell commands should however be used sparingly in CWL, as it means you
"jump out" of the workflow and no longer get reusable components, provenance or
scalability. For reproducibility and portability it is recommended to only use
shell commands together with a `DockerRequirement` hint, so that the commands
are executed in a predictable shell environment.
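
As a sketch of the `|` variant mentioned above, the following standalone tool
embeds a two-line shell script and pairs it with a `DockerRequirement` hint.
The container image `debian:stable-slim` and the output name `archive` are
assumptions made for this illustration. The lesson's own workflow writes
`Hello.java` with `InitialWorkDirRequirement` instead.

```yaml
cwlVersion: v1.0
class: CommandLineTool

requirements:
  - class: ShellCommandRequirement

hints:
  - class: DockerRequirement
    dockerPull: debian:stable-slim   # assumed image providing sh and tar

inputs: []

outputs:
  archive:
    type: File
    outputBinding:
      glob: hello.tar

arguments:
  - shellQuote: false
    # `|` keeps the newline, so the shell sees two separate commands.
    valueFrom: |
      echo "public class Hello {}" > Hello.java
      tar cf hello.tar Hello.java
```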

Did you notice that we didn't split out the `tar cf` tool to a separate file,
but rather embedded it within the CWL Workflow file? This is generally not best
practice, as the tool then can't be reused. The reason for doing it in this case
is because the command line is hard-coded with filenames that only make sense
within this workflow.

In this example we had to prepare a tar file outside, but only because our inner
workflow was designed to take that as an input. A better refactoring of the
inner workflow would be to take a list of Java files to compile, which would
simplify its usage as a tool step in other workflows.
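
One possible shape for such a refactoring, sketched here as a standalone tool
rather than taken from the lesson, is a `javac` description that accepts an
array of source files. The parameter names `sources` and `classfiles` are
hypothetical, and the sketch assumes `javac` is available on the execution
host or through a container hint that is omitted here.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: javac
# Write compiled classes into the designated output directory.
arguments: ["-d", "$(runtime.outdir)"]

inputs:
  sources:
    type: File[]          # a list of Java files instead of a tar archive
    inputBinding:
      position: 1

outputs:
  classfiles:
    type: File[]
    outputBinding:
      glob: "*.class"
```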

Nested workflows can be a powerful feature to generate higher-level functional
and reusable workflow units, but just as when creating a CWL tool description,
care must be taken to improve their usability in multiple workflows.

_includes/cwl/metadata2.cwl

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
#!/usr/bin/env cwl-runner

class: CommandLineTool
id: Example tool
label: Example tool
cwlVersion: v1.0
doc: |
  An example tool demonstrating metadata. Note that this is an example and the metadata is not necessarily consistent.

requirements:
  - class: ShellCommandRequirement

inputs:
  bam_input:
    type: File
    doc: The BAM file used as input
    format: http://edamontology.org/format_2572
    inputBinding:
      position: 1

stdout: output.txt

outputs:
  report:
    type: File
    format: http://edamontology.org/format_1964
    outputBinding:
      glob: "*.txt"
    doc: A text file that contains a line count

baseCommand: ["wc", "-l"]

$namespaces:
  s: https://schema.org/

$schemas:
- http://dublincore.org/2012/06/14/dcterms.rdf
- http://xmlns.com/foaf/spec/20140114.rdf
- https://schema.org/docs/schema_org_rdfa.html

s:author:
  - class: s:Person
    s:id: https://orcid.org/0000-0002-6130-1021

    s:name: Denis Yuen

s:contributor:
  - class: s:Person
    s:id: http://orcid.org/0000-0002-7681-6415

    s:name: Brian O'Connor
  - class: s:Person
    s:id: https://orcid.org/0000-0002-6130-1021
    s:email: dyuen@oicr.on.ca
    s:name: Denis Yuen


s:citation: https://figshare.com/articles/Common_Workflow_Language_draft_3/3115156/2
s:codeRepository: https://github.com/common-workflow-language/common-workflow-language
s:dateCreated: "2016-12-13"
s:license: https://www.apache.org/licenses/LICENSE-2.0
