33Pydra
44=====
55
6- Pydra is a new lightweight dataflow engine written in Python.
6+ Pydra is a new lightweight dataflow engine written in Python, which provides a simple way to
7+ implement scientific workflows that use a mix of shell commands and Python functions.
8+
79Pydra is developed as an open-source project in the neuroimaging community,
810but it is designed as a general-purpose dataflow engine to support any scientific domain.
911
10- Rationale
11- ---------
12-
13- Scientific workflows often require sophisticated analyses that encompass a large collection
14- of algorithms.
15- The algorithms, that were originally not necessarily designed to work together,
16- and were written by different authors.
17- Some may be written in Python, while others might require calling external programs.
18- It is a common practice to create semi-manual workflows that require the scientists
19- to handle the files and interact with partial results from algorithms and external tools.
20- This approach is conceptually simple and easy to implement, but the resulting workflow
21- is often time consuming, error-prone and difficult to share with others.
22- Consistency, reproducibility and scalability demand scientific workflows
23- to be organized into fully automated pipelines.
24- This was the motivation behind Pydra - a new dataflow engine written in Python.
25-
26- History
27- -------
28-
29- The Pydra package is a part of the second generation of the Nipype _ ecosystem
30- --- an open-source framework that provides a uniform interface to existing neuroimaging
31- software and facilitates interaction between different software components.
32- The Nipype project was born in the neuroimaging community, and has been helping scientists
33- build workflows for a decade, providing a uniform interface to such neuroimaging packages
34- as FSL _, ANTs _, AFNI _, FreeSurfer _ and SPM _.
35- This flexibility has made it an ideal basis for popular preprocessing tools,
36- such as fMRIPrep _ and C-PAC _.
37- The second generation of Nipype ecosystem is meant to provide additional flexibility
38- and is being developed with reproducibility, ease of use, and scalability in mind.
39- Pydra itself is a standalone project and is designed as a general-purpose dataflow engine
40- to support any scientific domain.
41-
42- Design goals
43- ------------
44-
45- The goal of Pydra is to provide a lightweight dataflow engine for computational graph construction,
46- manipulation, and distributed execution, as well as ensuring reproducibility of scientific pipelines.
47- In Pydra, a dataflow is represented as a directed acyclic graph, where each node represents a Python
48- function, execution of an external tool, or another reusable dataflow.
49- The combination of several key features makes Pydra a customizable and powerful dataflow engine:
50-
51- - Composable dataflows: Any node of a dataflow graph can be another dataflow, allowing for nested
52- dataflows of arbitrary depths and encouraging creating reusable dataflows.
12+ See the :ref: `Design philosophy ` for an explanation of the design
13+ philosophy and goals of Pydra.
5314
54- - Flexible semantics for creating nested loops over input sets: Any Task or dataflow can be run
55- over input parameter sets and the outputs can be recombined (similar concept to Map-Reduce _ model,
56- but Pydra extends this to graphs with nested dataflows).
57-
58- - A content-addressable global cache: Hash values are computed for each graph and each Task.
59- This supports reusing of previously computed and stored dataflows and Tasks.
60-
61- - Support for Python functions and external (shell) commands: Pydra can decorate and use existing
62- functions in Python libraries alongside external command line tools, allowing easy integration
63- of existing code and software.
15+ Installation
16+ ------------
6417
65- - Native container execution support: Any dataflow or Task can be executed in an associated container
66- (via Docker or Singularity) enabling greater consistency for reproducibility.
18+ Pydra itself is a pure-Python package with only a handful of dependencies,
19+ so it is straightforward to install via pip:
6720
68- - Auditing and provenance tracking: Pydra provides a simple JSON-LD-based message passing mechanism
69- to capture the dataflow execution activities as a provenance graph. These messages track inputs
70- and outputs of each task in a dataflow, and the resources consumed by the task.
21+ .. code-block:: bash
7122
72- .. _Nipype : https://nipype.readthedocs.io/en/latest/
73- .. _FSL : https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL
74- .. _ANTs : http://stnava.github.io/ANTs/
75- .. _AFNI : https://afni.nimh.nih.gov/
76- .. _FreeSurfer : https://surfer.nmr.mgh.harvard.edu/
77- .. _SPM : https://www.fil.ion.ucl.ac.uk/spm/
78- .. _fMRIPrep : https://fmriprep.org/en/stable/
79- .. _C-PAC : https://fcp-indi.github.io/docs/latest/index
80- .. _Map-Reduce : https://en.wikipedia.org/wiki/MapReduce
23+ $ pip install pydra
8124
25+ Of course, if you use Pydra to execute shell-command tools, you will need to either have
26+ those commands installed on the execution machine, or use software containers
27+ (e.g., Docker or Singularity) to run them.
8228
8329
8430Indices and tables
85- ==================
31+ ------------------
8632
8733* :ref:`genindex`
8834* :ref:`modindex`
@@ -94,7 +40,7 @@ Indices and tables
9440 :hidden:
9541
9642 tutorial/execution
97- tutorial/task
43+ tutorial/python
9844 tutorial/shell
9945 tutorial/workflow
10046
@@ -103,7 +49,7 @@ Indices and tables
10349 :caption: How-to Guides
10450 :hidden:
10551
106- howto/install
52+ howto/real-example
10753 howto/create-task-package
10854 howto/port-from-nipype
10955
@@ -112,6 +58,7 @@ Indices and tables
11258 :caption: Explanation
11359 :hidden:
11460
61+ explanation/design-approach
11562 explanation/splitting-combining
11663 explanation/typing
11764 explanation/hashing-caching