@@ -4,62 +4,69 @@ Glossary
.. glossary::

   Cache-root
-      The directory where cache directories for tasks to be executed are created.
-      Task cache directories are named within the cache root directory using a hash
-      of the task's parameters, so that the same task with the same parameters can be
-      reused.
+      The root directory in which separate cache directories for each job are created.
+      Job cache directories are named within the cache-root directory using a unique
+      checksum for the job, based on the task's parameters and software environment,
+      so that if the same job is run again the outputs from the previous run can be
+      reused.
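The checksum-keyed layout described above can be sketched in a few lines of plain Python. The function name and hashing scheme here are illustrative assumptions, not Pydra's actual implementation:

```python
# Hypothetical sketch of checksum-based job cache directories:
# the same parameters + environment always map to the same directory,
# so outputs from a previous run can be found and reused.
import hashlib
import json
from pathlib import Path

def job_cache_dir(cache_root: Path, params: dict, environment: str) -> Path:
    """Derive a unique cache directory from a job's parameters and environment."""
    payload = json.dumps({"params": params, "env": environment}, sort_keys=True)
    checksum = hashlib.sha256(payload.encode()).hexdigest()[:16]
    return cache_root / checksum

root = Path("/tmp/cache")
assert job_cache_dir(root, {"x": 1}, "docker") == job_cache_dir(root, {"x": 1}, "docker")
assert job_cache_dir(root, {"x": 2}, "docker") != job_cache_dir(root, {"x": 1}, "docker")
```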

   Combiner
      A combiner is used to combine :ref:`State-array` values created by a split operation
      defined by a :ref:`Splitter` on the current node, upstream workflow nodes or
      stand-alone tasks.
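The split/combine relationship can be illustrated with a minimal sketch; the helper names below are hypothetical, chosen to mirror the glossary terms rather than Pydra's actual API:

```python
# A splitter turns one input into a state array of per-job inputs;
# a combiner collects the per-job outputs back into a single value.
def split(values):
    """Create a state array: one parameterisation per element."""
    return [{"x": v} for v in values]

def combine(results):
    """Collect per-job outputs back into a single list."""
    return [r["out"] for r in results]

jobs = split([1, 2, 3])                             # three parameterisations
results = [{"out": job["x"] ** 2} for job in jobs]  # one job per element
squares = combine(results)                          # -> [1, 4, 9]
```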

   Container-ndim
-      The number of dimensions of the container object to be iterated over when using
-      a :ref:`Splitter` to split over an iterable value. For example, a list-of-lists
-      or a 2D array with `container_ndim=2` would be split over the elements of the
-      inner lists into a single 1-D state array. However, if `container_ndim=1`,
-      the outer list/2D would be split into a 1-D state array of lists/1D arrays.
+      The number of dimensions of the container object to be flattened into a single
+      state array when splitting over nested containers/multi-dimensional arrays.
+      For example, given a list-of-lists-of-floats or a 2D NumPy array, with
+      `container_ndim=1` the outer list/2D array would be split into a 1-D state
+      array consisting of lists-of-floats or 1D NumPy arrays, respectively, whereas
+      with `container_ndim=2` they would be split into a state array of floats
+      consisting of all the elements of the inner lists/arrays.
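The flattening behaviour can be sketched with plain lists; this is an illustrative model of the semantics described above, not Pydra's implementation:

```python
# `container_ndim` controls how many levels of nesting are flattened
# into a single state array when splitting.
def flatten(value, container_ndim):
    """Flatten `container_ndim` levels of nesting into one state array."""
    if container_ndim <= 1:
        return list(value)
    out = []
    for item in value:
        out.extend(flatten(item, container_ndim - 1))
    return out

nested = [[1.0, 2.0], [3.0, 4.0]]
flatten(nested, 1)  # -> [[1.0, 2.0], [3.0, 4.0]]  (state array of lists)
flatten(nested, 2)  # -> [1.0, 2.0, 3.0, 4.0]      (state array of floats)
```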

   Environment
      An environment refers to a specific software encapsulation, such as a Docker
-      or Singularity image, that is used to run a task.
+      or Singularity image, in which shell tasks are run. The environment to use
+      when executing a task is specified in the Submitter object.

   Field
-      A field is a parameter of a task, or a task outputs object, that can be set to
-      a specific value. Fields are specified to be of any types, including objects
-      and file-system objects.
+      A field is a parameter of a task, or an output in a task outputs class.
+      Fields define the expected datatype of the parameter, along with other
+      metadata that controls how the field is validated and passed through to
+      the execution of the task.

   Hook
-      A hook is a user-defined function that is executed at a specific point in the task
-      execution process. Hooks can be used to prepare/finalise the task cache directory
+      A hook is a user-defined function that is executed at a specific point either before
+      or after a task is run. Hooks can be used to prepare/finalise the task cache directory
      or send notifications

   Job
-      A job is a discrete unit of work, a :ref:`Task`, with all inputs resolved
-      (i.e. not lazy-values or state-arrays) that has been assigned to a worker.
-      A task describes "what" is to be done and a submitter object describes
-      "how" it is to be done, a job combines both objects to describe a concrete unit
-      of processing.
+      A job consists of a :ref:`Task` with all inputs resolved
+      (i.e. not lazy-values or state-arrays) and a Submitter object. It therefore
+      represents a concrete unit of work to be executed, combining "what" is to be
+      done (Task) with "how" it is to be done (Submitter).
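The "what plus how" pairing can be sketched conceptually; the class names below mirror the glossary terms and are not Pydra's real classes:

```python
# Conceptual model of a Job: a fully-resolved Task ("what") paired
# with a Submitter ("how").
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    inputs: dict          # fully resolved: no lazy-values or state-arrays

@dataclass
class Submitter:
    worker: str           # e.g. a debug or multiprocessing worker

@dataclass
class Job:
    task: Task            # what is to be done
    submitter: Submitter  # how it is to be done

job = Job(Task("add", {"a": 1, "b": 2}), Submitter(worker="debug"))
```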

   Lazy-fields
      A lazy-field is a field that is not immediately resolved to a value. Instead,
-      it is a placeholder that will be resolved at runtime, allowing for dynamic
-      parameterisation of tasks.
+      it is a placeholder that will be resolved at runtime when a workflow is executed,
+      allowing for dynamic parameterisation of tasks.

   Node
-      A single task within the context of a workflow, which is assigned a name and
-      references a state. Note this task can be nested workflow task.
+      A single task within the context of a workflow. It is assigned a unique name
+      within the workflow and references a state object that, if present, determines
+      the state-array of jobs to be run (if the state is None then a single job
+      will be run for the node).

   Read-only-caches
      A read-only cache is a cache root directory that was created by a previous
-      pydra runs, which is checked for matching task caches to be reused if present
-      but not written not modified during the execution of a task.
+      pydra run. The read-only caches are checked for matching job checksums, which
+      are reused if present. However, new job cache directories are written to the
+      cache root, so the read-only caches are not modified during execution.

   State
      The combination of all upstream splits and combines with any splitters and
-      combiners for a given node, it is used to track how many jobs, and their
-      parameterisations, need to be run for a given workflow node.
+      combiners for a given node. It is used to track how many jobs, and their
+      parameterisations, need to be run for a given workflow node.

   State-array
      A state array is a collection of parameterised tasks or values that were generated
@@ -84,8 +91,9 @@ Glossary

   Worker
      Encapsulation of a task execution environment. It is responsible for executing
-      tasks and managing their lifecycle. Workers can be local (e.g., a thread or
-      process) or remote (e.g., high-performance cluster).
+      tasks and managing their lifecycle. Workers can be local (e.g., the debug and
+      concurrent-futures multiprocessing workers) or orchestrated through a remote
+      scheduler (e.g., SLURM, SGE).

   Workflow
      A Directed-Acyclic-Graph (DAG) of parameterised tasks, to be executed in order.