
Commit 389ee99

Merge pull request #325 from djarecka/doc/userguide
[Doc] Adding a first version of a user guide
2 parents e2f0849 + 833a754 commit 389ee99

File tree

13 files changed: +595 -1 lines changed

docs/changes.rst

Lines changed: 20 additions & 0 deletions
@@ -1,6 +1,26 @@
Release Notes
=============

0.8.0
-----

* refactoring template formatting for ``input_spec``
* fixing issues with input fields with extension (and using them in templates)
* adding simple validators to input spec (using ``attr.validator``)
* adding ``create_dotfile`` for workflows, which creates graphs as dotfiles (they can be converted to other formats if ``dot`` is available)
* adding a simple user guide with an ``input_spec`` description
* expanding docstrings for ``State``, ``audit`` and ``messenger``
* updating syntax to newer Python

0.7.0
-----

* refactoring the error handling by padra: improving raised errors, removing nodes from the workflow graph that can't be run
* refactoring of the ``input_spec``: adapting better to the nipype interfaces
* switching from ``pkg_resources.declare_namespace`` to the stdlib ``pkgutil.extend_path``
* moving ``readme`` to rst format

0.6.2
-----

docs/combiner.rst

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
Grouping Task's Output
=======================

In addition to splitting the input, *Pydra* supports grouping
or combining the outputs resulting from the splits.
In order to achieve this for a *Task*, a user can specify a *combiner*.
This can be set by calling the ``combine`` method.
Note that the *combiner* only makes sense when a *splitter* is
set first. When *combiner=x*, all values are combined together within one list,
and each element of the list represents an output of the *Task* for a specific
value of the input *x*. Splitting and combining for this example can be written
as follows:

.. math::

    S = x &:& ~x=[x_1, x_2, ..., x_n] \mapsto x=x_1, x=x_2, ..., x=x_n, \\
    C = x &:& ~out(x_1), ..., out(x_n) \mapsto out_{comb}=[out(x_1), ..., out(x_n)],

where :math:`S` represents the *splitter*, :math:`C` represents the *combiner*, :math:`x` is the input field,
:math:`out(x_i)` represents the output of the *Task* for :math:`x_i`, and :math:`out_{comb}`
is the final output after applying the *combiner*.

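As a concrete illustration, the following is a minimal sketch (reusing the
``add2`` function task from the *Dataflows Components* section; the input values
are only for illustration) of splitting and combining over a single field:

.. code-block:: python

    # split the task over the two values of x, then group the outputs
    # back into a single list (one element per value of x)
    task = add2(x=[1, 5]).split("x").combine("x")
    task()                   # executes one run per value of x
    results = task.result()  # list of results, grouped by the combiner
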
In the situation where input has multiple fields and an *outer splitter* is used,
24+
there are various ways of combining the output.
25+
Taking as an example the task from the previous section,
26+
user might want to combine all the outputs for one specific value of :math:`x_i` and
27+
all the values of :math:`y`.
28+
In this situation, the combined output would be a two dimensional list, each
29+
inner list for each value of :math:`x`. This can be written as follow:
30+
31+
.. math::
32+
33+
C = y &:& ~out(x_1, y1), out(x_1, y2), ...out(x_n, y_m) \\
34+
&\longmapsto& ~[[out(x_1, y_1), ..., out(x_1, y_m)], \\
35+
&& ~..., \\
36+
&& ~[out(x_n, y_1), ..., out(x_n, y_m)]].
37+
38+
39+
40+
41+
.. figure:: images/nd_spl_3_comb1.png
42+
:figclass: h!
43+
:scale: 75%
44+
45+
46+
47+
However, for the same task the user might want to combine
48+
all values of :math:`x` for specific values of :math:`y`.
49+
One may also need to combine all the values together.
50+
This can be achieved by providing a list of fields, :math:`[x, y]` to the combiner.
51+
When a full combiner is set, i.e. all the fields from
52+
the splitter are also in the combiner, the output is a one dimensional list:
53+
54+
.. math::
55+
56+
C = [x, y] : out(x_1, y1), ...out(x_n, y_m) \longmapsto [out(x_1, y_1), ..., out(x_n, y_m)].
57+
58+
59+
.. figure:: images/nd_spl_3_comb3.png
60+
:figclass: h!
61+
:scale: 75%
62+
63+
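The following sketch shows these two options in code (``multiply`` is a
hypothetical function task with two inputs, ``x`` and ``y``, used only for
illustration):

.. code-block:: python

    # outer splitter over x and y, combined only over y:
    # the output is a two-dimensional list, one inner list per value of x
    task_y = multiply(x=[1, 2], y=[10, 100]).split(["x", "y"]).combine("y")

    # full combiner over both fields: the output is a flat, one-dimensional list
    task_xy = multiply(x=[1, 2], y=[10, 100]).split(["x", "y"]).combine(["x", "y"])
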
These are the basic examples of *Pydra*'s *splitter-combiner* concept. It
is important to note that *Pydra* allows for mixing *splitters* and *combiners*
on various levels of a dataflow. They can be set on a single *Task* or a *Workflow*,
and they can be passed from one *Task* to the following *Tasks* within the *Workflow*.

docs/components.rst

Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
Dataflows Components: Task and Workflow
=======================================

A *Task* is the basic runnable component of *Pydra* and is described by the
class ``TaskBase``. A *Task* has named inputs and outputs, thus allowing
construction of dataflows. It can be hashed and executes in a specific working
directory. Any *Pydra* *Task* can be used as a function in a script, thus allowing
dual use in *Pydra*'s *Workflows* and in standalone scripts. There are several
classes that inherit from ``TaskBase`` and each has a different application:


Function Tasks
--------------

* ``FunctionTask`` is a *Task* that executes Python functions. Most Python functions
  declared in an existing library, package, or interactively in a terminal can
  be converted to a ``FunctionTask`` by using *Pydra*'s decorator - ``mark.task``.

  .. code-block:: python

     import numpy as np
     from pydra import mark
     fft = mark.annotate({'a': np.ndarray,
                          'return': float})(np.fft.fft)
     fft_task = mark.task(fft)()
     result = fft_task(a=np.random.rand(512))

  ``fft_task`` is now a *Pydra* *Task* and ``result`` will contain a *Pydra* ``Result`` object.
  In addition, the user can use Python's function annotations or another *Pydra*
  decorator --- ``mark.annotate`` --- in order to specify the output. In the
  following example, we decorate an arbitrary Python function to create named
  outputs:

  .. code-block:: python

     @mark.task
     @mark.annotate(
         {"return": {"mean": float, "std": float}}
     )
     def mean_dev(my_data):
         import statistics as st
         return st.mean(my_data), st.stdev(my_data)

     result = mean_dev(my_data=[...])()

  When the *Task* is executed, ``result.output`` will contain two attributes: ``mean``
  and ``std``. Named attributes facilitate passing different outputs to
  different downstream nodes in a dataflow.

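  For example (a minimal sketch; the input list is just an illustrative value),
  the named outputs can be accessed on the returned ``Result`` object:

  .. code-block:: python

     result = mean_dev(my_data=[2.0, 4.0, 6.0])()
     print(result.output.mean, result.output.std)
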
.. _shell_command_task:

Shell Command Tasks
-------------------

* ``ShellCommandTask`` is a *Task* used to run shell commands and executables.
  It can be used with a simple command without any arguments, or with a specific
  set of arguments and flags, e.g.:

  .. code-block:: python

     ShellCommandTask(executable="pwd")

     ShellCommandTask(executable="ls", args="my_dir")

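  As a quick sketch of what execution looks like (the default output
  specification of a ``ShellCommandTask`` captures the command's standard
  output, among other fields):

  .. code-block:: python

     task = ShellCommandTask(executable="pwd")
     result = task()
     print(result.output.stdout)
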
  The *Task* can accommodate more complex shell commands by allowing the user to
  customize inputs and outputs of the commands.
  One can generate an input
  specification to specify names of inputs, positions in the command, types of
  the inputs, and other metadata.
  As a specific example, FSL's BET command (Brain
  Extraction Tool) can be called on the command line as:

  .. code-block:: bash

     bet input_file output_file -m

  Each of the command arguments can be treated as a named input to the
  ``ShellCommandTask``, and can be included in the input specification.
  As shown next, even an output is specified by constructing
  the *out_file* field from a template:

  .. code-block:: python

     bet_input_spec = SpecInfo(
         name="Input",
         fields=[
             ("in_file", File,
              {"help_string": "input file ...",
               "position": 1,
               "mandatory": True}),
             ("out_file", str,
              {"help_string": "name of output ...",
               "position": 2,
               "output_file_template": "{in_file}_br"}),
             ("mask", bool,
              {"help_string": "create binary mask",
               "argstr": "-m"})],
         bases=(ShellSpec,))

     ShellCommandTask(executable="bet",
                      input_spec=bet_input_spec)

  More details are in the :ref:`Input Specification section`.

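  As a rough sketch of how such a customized task might be used (the input file
  name here is purely hypothetical), the generated command line can be inspected
  before running the task:

  .. code-block:: python

     bet_task = ShellCommandTask(executable="bet",
                                 input_spec=bet_input_spec,
                                 in_file="sub-01_T1w.nii.gz",
                                 mask=True)
     # prints the full shell command that Pydra would run,
     # with out_file filled in from its template
     print(bet_task.cmdline)
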
Container Tasks
---------------

* ``ContainerTask`` class is a child class of ``ShellCommandTask`` and serves as
  a parent class for ``DockerTask`` and ``SingularityTask``. Both *Container Tasks*
  run shell commands or executables within containers with specific user-defined
  environments, using Docker_ and Singularity_ software respectively.
  This might be extremely useful for users and projects that require environment
  encapsulation and sharing.
  Using container technologies helps improve the reproducibility of scientific
  workflows, one of the key concepts behind *Pydra*.

  These *Container Tasks* can be defined by using the
  ``DockerTask`` and ``SingularityTask`` classes directly, or can be created
  automatically from a ``ShellCommandTask`` when the optional argument
  ``container_info`` is used when creating a *Shell Task*. The following two
  types of syntax are equivalent:

  .. code-block:: python

     DockerTask(executable="pwd", image="busybox")

     ShellCommandTask(executable="pwd",
                      container_info=("docker", "busybox"))

Workflows
---------

* ``Workflow`` is a subclass of *Task* that provides support for creating *Pydra*
  dataflows. As a subclass, a *Workflow* acts like a *Task* and has inputs, outputs,
  is hashable, and is treated as a single unit. Unlike *Tasks*, workflows embed
  a directed acyclic graph. Each node of the graph contains a *Task* of any type,
  including another *Workflow*, and can be added to the *Workflow* simply by calling
  the ``add`` method. The connections between *Tasks* are defined by using so-called
  *Lazy Inputs* or *Lazy Outputs*. These are special attributes that allow
  assignment of values when a *Workflow* is executed rather than at the point of
  assignment. The following example creates a *Workflow* from two *Pydra* *Tasks*.

  .. code-block:: python

     # creating a workflow with two input fields
     wf = Workflow(name="wf", input_spec=["x", "y"])
     # adding a task and connecting the task's input
     # to the workflow input
     wf.add(mult(name="mlt",
                 x=wf.lzin.x, y=wf.lzin.y))
     # adding another task and connecting
     # the task's input to the "mult" task's output
     wf.add(add2(name="add", x=wf.mlt.lzout.out))
     # setting the workflow output
     wf.set_output([("out", wf.add.lzout.out)])

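  A short sketch of how such a workflow might then be executed and its results
  retrieved (assuming ``mult`` and ``add2`` are function tasks defined with
  ``mark.task``, and using the concurrent-futures worker):

  .. code-block:: python

     from pydra import Submitter

     wf.inputs.x = 2
     wf.inputs.y = 3
     with Submitter(plugin="cf") as sub:
         sub(wf)
     result = wf.result()
     # result.output.out holds the value connected via set_output
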

Task's State
------------
All Tasks, including Workflows, can have an optional attribute representing an instance of the State class.
This attribute controls the execution of a Task over different input parameter sets.
This class is at the heart of Pydra's powerful Map-Reduce over arbitrary inputs of nested dataflows feature.
The State class formalizes how users can specify arbitrary combinations.
Its functionality is used to create and track different combinations of input parameters,
and optionally allows limited or complete recombination.
In order to specify how the inputs should be split into parameter sets, and optionally combined after
the Task execution, the user can set the splitter and combiner attributes of the State class.

.. code-block:: python

   task_with_state = add2(x=[1, 5]).split("x").combine("x")

In this example, the ``State`` class is responsible for creating a list of two
separate inputs, *[{x: 1}, {x: 5}]*; each run of the *Task* gets one
element from the list.
The results are grouped back when returning the result from the *Task*.
While this example
illustrates mapping and grouping of results over a single parameter, *Pydra*
extends this to arbitrary combinations of input fields and downstream grouping
over nested dataflows. Details of how splitters and combiners power *Pydra*'s
scalable dataflows are described in the next section.


.. _Docker: https://www.docker.com/
.. _Singularity: https://www.singularity.lbl.gov/

docs/images/nd_spl_1.png (30.2 KB)

docs/images/nd_spl_3.png (25.9 KB)

docs/images/nd_spl_3_comb1.png (26.7 KB)

docs/images/nd_spl_3_comb3.png (27.5 KB)

docs/images/nd_spl_4.png (16.5 KB)

docs/index.rst

Lines changed: 68 additions & 0 deletions
@@ -6,10 +6,78 @@
Welcome to Pydra: A simple dataflow engine with scalable semantics's documentation!
===================================================================================

Pydra is a new lightweight dataflow engine written in Python.
Pydra is developed as an open-source project in the neuroimaging community,
but it is designed as a general-purpose dataflow engine to support any scientific domain.

Scientific workflows often require sophisticated analyses that encompass a large collection
of algorithms.
These algorithms were not necessarily designed to work together
and were written by different authors.
Some may be written in Python, while others might require calling external programs.
It is a common practice to create semi-manual workflows that require the scientists
to handle the files and interact with partial results from algorithms and external tools.
This approach is conceptually simple and easy to implement, but the resulting workflow
is often time-consuming, error-prone and difficult to share with others.
Consistency, reproducibility and scalability demand scientific workflows
to be organized into fully automated pipelines.
This was the motivation behind Pydra - a new dataflow engine written in Python.

The Pydra package is a part of the second generation of the Nipype_ ecosystem
--- an open-source framework that provides a uniform interface to existing neuroimaging
software and facilitates interaction between different software components.
The Nipype project was born in the neuroimaging community, and has been helping scientists
build workflows for a decade, providing a uniform interface to such neuroimaging packages
as FSL_, ANTs_, AFNI_, FreeSurfer_ and SPM_.
This flexibility has made it an ideal basis for popular preprocessing tools,
such as fMRIPrep_ and C-PAC_.
The second generation of the Nipype ecosystem is meant to provide additional flexibility
and is being developed with reproducibility, ease of use, and scalability in mind.
Pydra itself is a standalone project and is designed as a general-purpose dataflow engine
to support any scientific domain.

The goal of Pydra is to provide a lightweight dataflow engine for computational graph construction,
manipulation, and distributed execution, as well as ensuring reproducibility of scientific pipelines.
In Pydra, a dataflow is represented as a directed acyclic graph, where each node represents a Python
function, execution of an external tool, or another reusable dataflow.
The combination of several key features makes Pydra a customizable and powerful dataflow engine:

- Composable dataflows: Any node of a dataflow graph can be another dataflow, allowing for nested
  dataflows of arbitrary depths and encouraging the creation of reusable dataflows.

- Flexible semantics for creating nested loops over input sets: Any Task or dataflow can be run
  over input parameter sets and the outputs can be recombined (a concept similar to the Map-Reduce_ model,
  but Pydra extends this to graphs with nested dataflows).

- A content-addressable global cache: Hash values are computed for each graph and each Task.
  This supports reuse of previously computed and stored dataflows and Tasks.

- Support for Python functions and external (shell) commands: Pydra can decorate and use existing
  functions in Python libraries alongside external command line tools, allowing easy integration
  of existing code and software.

- Native container execution support: Any dataflow or Task can be executed in an associated container
  (via Docker or Singularity), enabling greater consistency for reproducibility.

- Auditing and provenance tracking: Pydra provides a simple JSON-LD-based message passing mechanism
  to capture dataflow execution activities as a provenance graph. These messages track inputs
  and outputs of each task in a dataflow, and the resources consumed by the task.

.. _Nipype: https://nipype.readthedocs.io/en/latest/
.. _FSL: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL
.. _ANTs: http://stnava.github.io/ANTs/
.. _AFNI: https://afni.nimh.nih.gov/
.. _FreeSurfer: https://surfer.nmr.mgh.harvard.edu/
.. _SPM: https://www.fil.ion.ucl.ac.uk/spm/
.. _fMRIPrep: https://fmriprep.org/en/stable/
.. _C-PAC: https://fcp-indi.github.io/docs/latest/index
.. _Map-Reduce: https://en.wikipedia.org/wiki/MapReduce

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   user_guide
   changes
   api