From bff4f76ec9717bb9c16a20b7a71cf4caeeed1419 Mon Sep 17 00:00:00 2001
From: Giovanni Pizzi
Date: Thu, 23 Oct 2025 10:12:05 +0200
Subject: [PATCH 1/3] Enhance documentation on process types in AiiDA

Clarify the distinction between calculation-like and workflow-like processes in AiiDA. Expand on the roles and capabilities of each process type, including whether workflow can do data creation.
---
 docs/source/topics/processes/concepts.rst | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/docs/source/topics/processes/concepts.rst b/docs/source/topics/processes/concepts.rst
index 935c51442d..7a3cfd10ca 100644
--- a/docs/source/topics/processes/concepts.rst
+++ b/docs/source/topics/processes/concepts.rst
@@ -23,16 +23,23 @@ A good thing to remember is that while it is running, we are dealing with the ``
 Process types
 =============
 
-Processes in AiiDA come in two flavors:
+In AiiDA, all processes are conceptually divided into two main types:
 
-* Calculation-like
-* Workflow-like
+* Calculation-like processes: These are processes that create data. Their role is to perform well-defined computations or transformations that produce new ``Data`` nodes, which are recorded as outputs (of type ``CREATE`` in the provenance graph).
+
+* Workflow-like processes: These are processes that orchestrate other processes, defining how multiple calculations or sub-workflows are executed in sequence or in parallel. Conceptually, workflows do not directly generate data, but rather return data produced by the calculations they run.
 
-The calculation-like processes have the capability to *create* data, whereas the workflow-like processes orchestrate other processes and have the ability to *return* data produced by calculations.
 Again, this is a distinction that plays a big role in AiiDA and is crucial to understand.
 For this reason, these different types of processes also get a different sub class of the ``ProcessNode`` class.
 The hierarchy of these node classes and the link types that are allowed between them and ``Data`` nodes, is explained in detail in the :ref:`provenance implementation` documentation.
 
+
+.. note:: Technically, a workflow is able to create new ``Data`` nodes directly (for example, by instantiating and storing a ``Dict`` or ``StructureData`` node inside its code). However, this will *not* show up as a node created by the workflow in the provenance graph, i.e., there will be no direct link between the workflow node and the ``Data`` node. Instead, the ``Data`` node will appear as having no creator (similar to a ``Data`` node created in the shell).
+
+    In practice, this censario is is supported and actually turns out to be a good practice when the purpose is to prepare inputs for a subprocess (a sub-calculation or sub-workflow) that the workflow will launch. In fact, in such cases, the provenance remains interpretable: these Data nodes will have no creator (i.e. no ``CREATE`` link from a calculation-like process) but will typically appear as inputs to a process that is called by the workflow. From the combination of data provenance (no creator) and logical provenance (the call link from the parent workflow), it becomes clear that either the workflow either created the ``Data`` node to pass it as input, or picked an existing node in the database (e.g. some input data created manually by the user) and passed it as input. Knowing the goal of the calling workflow (or inspecting its source code) will often be enough to understand what happened exaclty. Furthermore, this approach helps keeping the provenance graph readable and avoids introducing extra ``calcfunctions`` (see below), whose only role would be trivial data assembly.
+
+    However, we stress that instead workflows *should not* create data that are meant to represent final results, and just store them as outputs. In such cases, the data should instead be produced by a calculation-like process (e.g. a ``calcfunction`` or ``CalcJob``, see below), so that the provenance clearly records how and why the output was generated.
+
 Currently, there are four types of processes in ``aiida-core`` and the following table shows with which node class it is represented in the provenance graph and what the process is used for.
 
 =================================================================== ============================================================================== ===============================================================

From d447e47539438ad72c356735afed7353e7866155 Mon Sep 17 00:00:00 2001
From: Giovanni Pizzi
Date: Fri, 24 Oct 2025 23:06:28 +0200
Subject: [PATCH 2/3] Apply suggestions from code review

Co-authored-by: Kristjan Eimre
---
 docs/source/topics/processes/concepts.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/topics/processes/concepts.rst b/docs/source/topics/processes/concepts.rst
index 7a3cfd10ca..c4040e2eb8 100644
--- a/docs/source/topics/processes/concepts.rst
+++ b/docs/source/topics/processes/concepts.rst
@@ -34,9 +34,9 @@ For this reason, these different types of processes also get a different sub cla
 The hierarchy of these node classes and the link types that are allowed between them and ``Data`` nodes, is explained in detail in the :ref:`provenance implementation` documentation.
 
 
-.. note:: Technically, a workflow is able to create new ``Data`` nodes directly (for example, by instantiating and storing a ``Dict`` or ``StructureData`` node inside its code). However, this will *not* show up as a node created by the workflow in the provenance graph, i.e., there will be no direct link between the workflow node and the ``Data`` node. Instead, the ``Data`` node will appear as having no creator (similar to a ``Data`` node created in the shell).
+.. note:: Technically, a workflow is able to create new ``Data`` nodes directly (for example, by instantiating and storing a ``Dict`` or ``StructureData`` node inside its code). However, this will *not* show up as a node created by the workflow in the provenance graph, and the ``Data`` node will appear as having no creator (similar to a ``Data`` node created in the shell).
 
-    In practice, this censario is is supported and actually turns out to be a good practice when the purpose is to prepare inputs for a subprocess (a sub-calculation or sub-workflow) that the workflow will launch. In fact, in such cases, the provenance remains interpretable: these Data nodes will have no creator (i.e. no ``CREATE`` link from a calculation-like process) but will typically appear as inputs to a process that is called by the workflow. From the combination of data provenance (no creator) and logical provenance (the call link from the parent workflow), it becomes clear that either the workflow either created the ``Data`` node to pass it as input, or picked an existing node in the database (e.g. some input data created manually by the user) and passed it as input. Knowing the goal of the calling workflow (or inspecting its source code) will often be enough to understand what happened exaclty. Furthermore, this approach helps keeping the provenance graph readable and avoids introducing extra ``calcfunctions`` (see below), whose only role would be trivial data assembly.
+    In practice, this scenario is supported and actually turns out to be a good practice when the purpose is to prepare inputs for a subprocess (a sub-calculation or sub-workflow) that the workflow will launch. In fact, in such cases, the provenance remains interpretable: these Data nodes will have no creator (i.e. no ``CREATE`` link from a calculation-like process) but will typically appear as inputs to a process that is called by the workflow. From the combination of data provenance (no creator) and logical provenance (the call link from the parent workflow), it becomes clear that the workflow either created the ``Data`` node to pass it as input, or picked an existing node in the database (e.g. some input data created manually by the user) and passed it as input. Knowing the goal of the calling workflow (or inspecting its source code) will often be enough to understand what happened exactly. Furthermore, this approach helps keep the provenance graph readable and avoids introducing extra ``calcfunctions`` (see below), whose only role would be trivial data assembly.
 
     However, we stress that instead workflows *should not* create data that are meant to represent final results, and just store them as outputs. In such cases, the data should instead be produced by a calculation-like process (e.g. a ``calcfunction`` or ``CalcJob``, see below), so that the provenance clearly records how and why the output was generated.
 

From cc2985188b8b87af4bc4c93e8af979c4cf61fdd1 Mon Sep 17 00:00:00 2001
From: Giovanni Pizzi
Date: Fri, 24 Oct 2025 23:22:46 +0200
Subject: [PATCH 3/3] Fixing pre-commits

---
 docs/source/topics/processes/concepts.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/topics/processes/concepts.rst b/docs/source/topics/processes/concepts.rst
index c4040e2eb8..12466f0d31 100644
--- a/docs/source/topics/processes/concepts.rst
+++ b/docs/source/topics/processes/concepts.rst
@@ -34,9 +34,9 @@ For this reason, these different types of processes also get a different sub cla
 The hierarchy of these node classes and the link types that are allowed between them and ``Data`` nodes, is explained in detail in the :ref:`provenance implementation` documentation.
 
 
-.. note:: Technically, a workflow is able to create new ``Data`` nodes directly (for example, by instantiating and storing a ``Dict`` or ``StructureData`` node inside its code). However, this will *not* show up as a node created by the workflow in the provenance graph, and the ``Data`` node will appear as having no creator (similar to a ``Data`` node created in the shell).
+.. note:: Technically, a workflow is able to create new ``Data`` nodes directly (for example, by instantiating and storing a ``Dict`` or ``StructureData`` node inside its code). However, this will *not* show up as a node created by the workflow in the provenance graph, and the ``Data`` node will appear as having no creator (similar to a ``Data`` node created in the shell).
 
-    In practice, this scenario is supported and actually turns out to be a good practice when the purpose is to prepare inputs for a subprocess (a sub-calculation or sub-workflow) that the workflow will launch. In fact, in such cases, the provenance remains interpretable: these Data nodes will have no creator (i.e. no ``CREATE`` link from a calculation-like process) but will typically appear as inputs to a process that is called by the workflow. From the combination of data provenance (no creator) and logical provenance (the call link from the parent workflow), it becomes clear that the workflow either created the ``Data`` node to pass it as input, or picked an existing node in the database (e.g. some input data created manually by the user) and passed it as input. Knowing the goal of the calling workflow (or inspecting its source code) will often be enough to understand what happened exactly. Furthermore, this approach helps keep the provenance graph readable and avoids introducing extra ``calcfunctions`` (see below), whose only role would be trivial data assembly.
+    In practice, this scenario is supported and actually turns out to be a good practice when the purpose is to prepare inputs for a subprocess (a sub-calculation or sub-workflow) that the workflow will launch. In fact, in such cases, the provenance remains interpretable: these Data nodes will have no creator (i.e. no ``CREATE`` link from a calculation-like process) but will typically appear as inputs to a process that is called by the workflow. From the combination of data provenance (no creator) and logical provenance (the call link from the parent workflow), it becomes clear that the workflow either created the ``Data`` node to pass it as input, or picked an existing node in the database (e.g. some input data created manually by the user) and passed it as input. Knowing the goal of the calling workflow (or inspecting its source code) will often be enough to understand what happened exactly. Furthermore, this approach helps keep the provenance graph readable and avoids introducing extra ``calcfunctions`` (see below), whose only role would be trivial data assembly.
 
     However, we stress that instead workflows *should not* create data that are meant to represent final results, and just store them as outputs. In such cases, the data should instead be produced by a calculation-like process (e.g. a ``calcfunction`` or ``CalcJob``, see below), so that the provenance clearly records how and why the output was generated.
 
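
The reasoning in the added note (a ``Data`` node with no creator that is an input to a process called by a workflow) can be illustrated with a toy graph. This is a minimal sketch in plain Python and does not use the `aiida-core` API: `ProvenanceGraph`, its methods, and the node names are hypothetical, and the link labels only loosely mirror AiiDA's link types.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Toy graph (not AiiDA): a list of (source, target, link_type) triples."""

    links: list = field(default_factory=list)

    def link(self, source, target, link_type):
        self.links.append((source, target, link_type))

    def creator(self, data):
        """Return the process holding a CREATE link to `data`, else None."""
        for source, target, link_type in self.links:
            if target == data and link_type == "CREATE":
                return source
        return None

    def calling_workflows(self, data):
        """Return workflows that called a process taking `data` as input."""
        consumers = {t for s, t, kind in self.links if s == data and kind == "INPUT_CALC"}
        return sorted(s for s, t, kind in self.links if kind == "CALL_CALC" and t in consumers)


graph = ProvenanceGraph()
# The workflow assembles `parameters` itself, so no link records who made it.
graph.link("parameters", "calculation", "INPUT_CALC")
graph.link("workflow", "calculation", "CALL_CALC")
# The final result is created by the calculation and only returned by the workflow.
graph.link("calculation", "result", "CREATE")
graph.link("workflow", "result", "RETURN")

print(graph.creator("parameters"))            # None
print(graph.creator("result"))                # calculation
print(graph.calling_workflows("parameters"))  # ['workflow']
```

In this sketch, `parameters` has no creator, yet the call link still ties it to `workflow`, which is the combination of data provenance and logical provenance that the note relies on. `result`, by contrast, keeps an explicit creator, as the note recommends for final outputs.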