Reproducible and portable workflows! #121

Open: wants to merge 6 commits into base gh-pages. Showing changes from 1 commit.
6 changes: 6 additions & 0 deletions _extras/recommended-practices.md
@@ -6,6 +6,12 @@ permalink: /rec-practices/

Below is a set of recommended practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. These guidelines are presented for consideration on a scale of usefulness: more is better, not all are required.

☐ Reproducibility and Portability are essential goals of scientific workflow developers.
Member

This sentence no verb :-)

Author

I am arguably a native speaker and I beg to differ. Not only is it a complete and grammatically correct sentence, but it does too have a verb.

Member

Perhaps I was too harsh, my apologies. The point being, this is supposed to be a list of actions: things to do.

Author

@mr-c oh not at all! It did send me on a trip down memory lane to English language class. I vaguely recall that there can be sentences without verbs. I suspect it was "Yes." and "No."

I was entertained by https://english.stackexchange.com/questions/258/shortest-comprehensive-sentence-in-english

Ok, back to work. Many thanks for reviewing! I will address your comments


- The best way to ensure portability and reproducibility is to rigidly specify the exact environment a tool should run in. Currently a Linux image (commonly called a `Docker image`), packaging the exact environment intended by the developer, is the best way to distribute a tool executable. Use `dockerPull` in a `DockerRequirement` to specify the image. Use an image identifier that is resilient to updates to the container.
Member

Needs line wrapping.
The language is too strong: a software container captures neither the kernel version nor the CPU type, both of which can affect reproducibility. It is merely the most reasonable thing we can do today 🙂

Also, containers are often constructed differently from, or contrary to, the software developer's intentions (if any), or in ways they hadn't even considered, so I'd drop that.

Ideally, a container is configured to behave in an unsurprising and, as far as possible, correct manner for the majority of users. Lacking that, it should match the workflow author's needs.

Author

Hi @mr-c thanks for taking a look! I've made some changes.

- If this is not possible, carefully specifying software tools and dependencies using `SoftwareRequirement` is the next best option. Be aware that changes in the repositories the tools are pulled from may silently change a tool's behavior from one run to the next.
- Not specifying a Docker image or software requirements will result in a non-reproducible, non-portable workflow!
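
As a rough illustration of the two options above, the sketch below pins a container image by content digest under `DockerRequirement` and lists an exact package version under `SoftwareRequirement`. The tool (`samtools`), the image path, the version number, and the `<digest>` placeholder are illustrative assumptions rather than anything prescribed by this guide, and the digest form assumes your CWL runner passes the `dockerPull` value through to the container engine unchanged.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [samtools, --version]

hints:
  # Preferred: a container image pinned by digest, so a re-tagged image
  # cannot silently replace the one that was tested.
  # The image path and <digest> are placeholders (assumptions).
  DockerRequirement:
    dockerPull: "quay.io/biocontainers/samtools@sha256:<digest>"
  # Fallback when no container is available: name the tool and the
  # exact version the description was written against.
  SoftwareRequirement:
    packages:
      samtools:
        version: ["1.9"]

inputs: []
outputs: []
```

Replace `<digest>` with the real digest of the image you tested; a mutable tag such as `latest` would not be resilient to updates.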

☐ No `type: string` parameters for names of input or reference files/directories; use `type: File` or `type: Directory` as appropriate.
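
A minimal sketch of this practice, assuming a hypothetical tool `example-tool` with hypothetical `--reference` and `--workdir` options: declaring the inputs as `File` and `Directory` lets the CWL runner stage them and make them visible inside a container, whereas a `string` path would be passed through as opaque text.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: example-tool     # hypothetical tool name

inputs:
  reference:
    type: File                # not `type: string`
    inputBinding:
      prefix: --reference     # hypothetical option
  workdir:
    type: Directory           # not `type: string`
    inputBinding:
      prefix: --workdir       # hypothetical option

outputs: []
```

In the job file, each value is then given as an object with `class: File` or `class: Directory` and a `path`, rather than as a bare string.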

☐ Include a license that allows for re-use by anyone, e.g. [Apache 2.0][apache-license]. If possible, the license should be specified with its corresponding [SPDX identifier][spdx]. Construct the metadata field for the license by providing a URL of the form `https://spdx.org/licenses/[SPDX-ID]`, where `SPDX-ID` is taken from the list of identifiers linked above. See the example snippet below for guidance. For non-standard licenses without an SPDX identifier, provide a URL to the license.