Commit a8ed05e

finished workflow tutorial and reworked some of the docs structure

1 parent 1a6b067
File tree

14 files changed: +567 additions, -180 deletions

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+div.nbinput .prompt,
+div.nboutput .prompt {
+    display: none;
+}

new-docs/source/conf.py

Lines changed: 3 additions & 0 deletions
@@ -144,6 +144,9 @@
     },
 }
 
+html_static_path = ["_static"]
+html_css_files = ["css/custom.css"]
+
 # Add any paths that contain custom themes here, relative to this directory.
 # html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
 

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-Conditionals and lazy fields
-============================
+Conditional construction
+========================
 
 Work in progress...
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+
+Design philosophy
+=================
+
+Rationale
+---------
+
+Scientific workflows often require sophisticated analyses that encompass a large collection
+of algorithms.
+These algorithms were not necessarily designed to work together,
+and were often written by different authors.
+Some may be written in Python, while others might require calling external programs.
+It is common practice to create semi-manual workflows that require scientists
+to handle files and interact with partial results from algorithms and external tools.
+This approach is conceptually simple and easy to implement, but the resulting workflow
+is often time-consuming, error-prone and difficult to share with others.
+Consistency, reproducibility and scalability demand that scientific workflows
+be organized into fully automated pipelines.
+This was the motivation behind Pydra, a new dataflow engine written in Python.
+
+History
+-------
+
+The Pydra package is part of the second generation of the Nipype_ ecosystem
+--- an open-source framework that provides a uniform interface to existing neuroimaging
+software and facilitates interaction between different software components.
+The Nipype project was born in the neuroimaging community, and has been helping scientists
+build workflows for a decade, providing a uniform interface to such neuroimaging packages
+as FSL_, ANTs_, AFNI_, FreeSurfer_ and SPM_.
+This flexibility has made it an ideal basis for popular preprocessing tools
+such as fMRIPrep_ and C-PAC_.
+The second generation of the Nipype ecosystem is meant to provide additional flexibility
+and is being developed with reproducibility, ease of use, and scalability in mind.
+Pydra itself is a standalone project and is designed as a general-purpose dataflow engine
+to support any scientific domain.
+
+Goals
+-----
+
+The goal of Pydra is to provide a lightweight dataflow engine for computational graph construction,
+manipulation, and distributed execution, as well as to ensure the reproducibility of scientific pipelines.
+In Pydra, a dataflow is represented as a directed acyclic graph, where each node represents a Python
+function, the execution of an external tool, or another reusable dataflow.
+The combination of several key features makes Pydra a customizable and powerful dataflow engine:
+
+- Composable dataflows: Any node of a dataflow graph can be another dataflow, allowing for nested
+  dataflows of arbitrary depth and encouraging the creation of reusable dataflows.
+
+- Flexible semantics for creating nested loops over input sets: Any Task or dataflow can be run
+  over input parameter sets and the outputs can be recombined (a concept similar to the Map-Reduce_
+  model, but Pydra extends it to graphs with nested dataflows).
+
+- A content-addressable global cache: Hash values are computed for each graph and each Task.
+  This supports the reuse of previously computed and stored dataflows and Tasks.
+
+- Support for Python functions and external (shell) commands: Pydra can decorate and use existing
+  functions in Python libraries alongside external command-line tools, allowing easy integration
+  of existing code and software.
+
+- Native container execution support: Any dataflow or Task can be executed in an associated container
+  (via Docker or Singularity), enabling greater consistency and reproducibility.
+
+- Auditing and provenance tracking: Pydra provides a simple JSON-LD-based message passing mechanism
+  to capture dataflow execution activities as a provenance graph. These messages track the inputs
+  and outputs of each task in a dataflow, and the resources consumed by the task.
+
+.. _Nipype: https://nipype.readthedocs.io/en/latest/
+.. _FSL: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL
+.. _ANTs: http://stnava.github.io/ANTs/
+.. _AFNI: https://afni.nimh.nih.gov/
+.. _FreeSurfer: https://surfer.nmr.mgh.harvard.edu/
+.. _SPM: https://www.fil.ion.ucl.ac.uk/spm/
+.. _fMRIPrep: https://fmriprep.org/en/stable/
+.. _C-PAC: https://fcp-indi.github.io/docs/latest/index
+.. _Map-Reduce: https://en.wikipedia.org/wiki/MapReduce
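The split/combine semantics described in the feature list above can be sketched in plain Python. This is only a conceptual illustration of the Map-Reduce-style model; the function and names here are made up for illustration and are not Pydra's API.

```python
from itertools import product

def split_run_combine(task, **input_lists):
    """Run `task` over the cross-product of the input lists
    (the 'split'), then collect the outputs (the 'combine')."""
    names = list(input_lists)
    results = []
    for values in product(*(input_lists[n] for n in names)):
        kwargs = dict(zip(names, values))
        results.append(task(**kwargs))
    return results

# A toy "task"; in Pydra a node could also wrap a shell command or
# another dataflow.
def scale(x, factor):
    return x * factor

# Split over two input sets (2 x 2 = 4 runs), combine into a flat list.
outputs = split_run_combine(scale, x=[1, 2], factor=[10, 100])
print(outputs)  # [10, 100, 20, 200]
```

Pydra additionally lets the recombination be partial (grouping outputs by one of the split dimensions), which this flat-list sketch does not capture.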
Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-Hashing and caching
-===================
+Caching
+=======
 
 Work in progress....
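This section is still a stub. As a rough conceptual sketch of the content-addressable caching idea described in the design philosophy (plain Python with hypothetical names, not Pydra's actual implementation, which hashes whole task definitions and stores results on disk):

```python
import hashlib
import json

_cache = {}

def cached_run(func, **inputs):
    """Hash the task name and its inputs; reuse the stored result when
    the same computation has been run before (a cache hit)."""
    key_src = json.dumps({"task": func.__name__, "inputs": inputs}, sort_keys=True)
    key = hashlib.sha256(key_src.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = func(**inputs)  # compute only on a cache miss
    return _cache[key]

calls = []

def slow_double(x):
    calls.append(x)  # record each real computation
    return 2 * x

cached_run(slow_double, x=21)  # computes and stores 42
cached_run(slow_double, x=21)  # cache hit: no recomputation
print(len(calls))  # 1
```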

new-docs/source/howto/install.ipynb

Lines changed: 0 additions & 93 deletions
This file was deleted.
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Real-world example"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This is a real-world example of a workflow to pre-process T1-weighted MRI images for further analysis.\n",
+    "\n",
+    "Work in progress..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "wf12",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}

new-docs/source/index.rst

Lines changed: 18 additions & 71 deletions
@@ -3,86 +3,32 @@
 Pydra
 =====
 
-Pydra is a new lightweight dataflow engine written in Python.
+Pydra is a new lightweight dataflow engine written in Python, which provides a simple way to
+implement scientific workflows that use a mix of shell commands and Python functions.
+
 Pydra is developed as an open-source project in the neuroimaging community,
 but it is designed as a general-purpose dataflow engine to support any scientific domain.
 
-Rationale
----------
-
-Scientific workflows often require sophisticated analyses that encompass a large collection
-of algorithms.
-The algorithms, that were originally not necessarily designed to work together,
-and were written by different authors.
-Some may be written in Python, while others might require calling external programs.
-It is a common practice to create semi-manual workflows that require the scientists
-to handle the files and interact with partial results from algorithms and external tools.
-This approach is conceptually simple and easy to implement, but the resulting workflow
-is often time consuming, error-prone and difficult to share with others.
-Consistency, reproducibility and scalability demand scientific workflows
-to be organized into fully automated pipelines.
-This was the motivation behind Pydra - a new dataflow engine written in Python.
-
-History
--------
-
-The Pydra package is a part of the second generation of the Nipype_ ecosystem
---- an open-source framework that provides a uniform interface to existing neuroimaging
-software and facilitates interaction between different software components.
-The Nipype project was born in the neuroimaging community, and has been helping scientists
-build workflows for a decade, providing a uniform interface to such neuroimaging packages
-as FSL_, ANTs_, AFNI_, FreeSurfer_ and SPM_.
-This flexibility has made it an ideal basis for popular preprocessing tools,
-such as fMRIPrep_ and C-PAC_.
-The second generation of Nipype ecosystem is meant to provide additional flexibility
-and is being developed with reproducibility, ease of use, and scalability in mind.
-Pydra itself is a standalone project and is designed as a general-purpose dataflow engine
-to support any scientific domain.
-
-Design goals
-------------
-
-The goal of Pydra is to provide a lightweight dataflow engine for computational graph construction,
-manipulation, and distributed execution, as well as ensuring reproducibility of scientific pipelines.
-In Pydra, a dataflow is represented as a directed acyclic graph, where each node represents a Python
-function, execution of an external tool, or another reusable dataflow.
-The combination of several key features makes Pydra a customizable and powerful dataflow engine:
-
-- Composable dataflows: Any node of a dataflow graph can be another dataflow, allowing for nested
-  dataflows of arbitrary depths and encouraging creating reusable dataflows.
+See the :ref:`Design philosophy` for an explanation of the design
+philosophy and goals of Pydra.
 
-- Flexible semantics for creating nested loops over input sets: Any Task or dataflow can be run
-  over input parameter sets and the outputs can be recombined (similar concept to Map-Reduce_ model,
-  but Pydra extends this to graphs with nested dataflows).
-
-- A content-addressable global cache: Hash values are computed for each graph and each Task.
-  This supports reusing of previously computed and stored dataflows and Tasks.
-
-- Support for Python functions and external (shell) commands: Pydra can decorate and use existing
-  functions in Python libraries alongside external command line tools, allowing easy integration
-  of existing code and software.
+Installation
+------------
 
-- Native container execution support: Any dataflow or Task can be executed in an associated container
-  (via Docker or Singularity) enabling greater consistency for reproducibility.
+Pydra itself is a pure-Python package with only a handful of dependencies,
+so it is straightforward to install via pip:
 
-- Auditing and provenance tracking: Pydra provides a simple JSON-LD-based message passing mechanism
-  to capture the dataflow execution activities as a provenance graph. These messages track inputs
-  and outputs of each task in a dataflow, and the resources consumed by the task.
+.. code-block:: bash
 
-.. _Nipype: https://nipype.readthedocs.io/en/latest/
-.. _FSL: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL
-.. _ANTs: http://stnava.github.io/ANTs/
-.. _AFNI: https://afni.nimh.nih.gov/
-.. _FreeSurfer: https://surfer.nmr.mgh.harvard.edu/
-.. _SPM: https://www.fil.ion.ucl.ac.uk/spm/
-.. _fMRIPrep: https://fmriprep.org/en/stable/
-.. _C-PAC: https://fcp-indi.github.io/docs/latest/index
-.. _Map-Reduce: https://en.wikipedia.org/wiki/MapReduce
+   $ pip install pydra
 
+Of course, if you use Pydra to execute shell-command tools, you will need to either have
+those commands installed on the execution machine, or use software containers
+(e.g., Docker or Singularity) to run them.
 
 
 Indices and tables
-==================
+------------------
 
 * :ref:`genindex`
 * :ref:`modindex`
@@ -94,7 +40,7 @@ Indices and tables
    :hidden:
 
    tutorial/execution
-   tutorial/task
+   tutorial/python
    tutorial/shell
    tutorial/workflow
 
@@ -103,7 +49,7 @@ Indices and tables
    :caption: How-to Guides
    :hidden:
 
-   howto/install
+   howto/real-example
    howto/create-task-package
    howto/port-from-nipype
 
@@ -112,6 +58,7 @@ Indices and tables
    :caption: Explanation
    :hidden:
 
+   explanation/design-approach
    explanation/splitting-combining
    explanation/typing
    explanation/hashing-caching

new-docs/source/tutorial/execution.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "# Executing tasks\n",
+   "# Running tasks\n",
    "\n",
    "A *Task* is the basic runnable component in Pydra, and can execute either a Python function,\n",
    "shell command or workflows consisting of combinations of all three types."
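The tutorial cell above describes a Task as a single runnable component that may wrap either a Python function or a shell command. As a rough conceptual sketch of that idea in plain Python (the class names here are made up for illustration and are not Pydra's API):

```python
import subprocess

class FunctionTask:
    """Toy stand-in for a task that wraps a Python function."""
    def __init__(self, func, **inputs):
        self.func = func
        self.inputs = inputs

    def run(self):
        return self.func(**self.inputs)

class ShellTask:
    """Toy stand-in for a task that wraps a shell command."""
    def __init__(self, *args):
        self.args = list(args)

    def run(self):
        # Run the command and capture its standard output.
        result = subprocess.run(self.args, capture_output=True,
                                text=True, check=True)
        return result.stdout

# Both kinds of task expose the same run() interface, which is what
# lets a workflow engine treat them as interchangeable graph nodes.
py_task = FunctionTask(lambda a, b: a + b, a=1, b=2)
print(py_task.run())  # 3

sh_task = ShellTask("echo", "hello")
print(sh_task.run())  # "hello\n"
```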

new-docs/source/tutorial/task.ipynb renamed to new-docs/source/tutorial/python.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "# Task design"
+   "# Python-task design"
   ]
  },
  {
