|
6 | 6 | "source": [ |
7 | 7 | "# Getting started\n", |
8 | 8 | "\n", |
9 | | - "## Running your first task\n", |
10 | | - "\n", |
11 | 9 | "The basic runnable component of Pydra is a *task*. Tasks are conceptually similar to\n", |
12 | | - "functions, in that they take inputs, process them and then return results. However,\n", |
| 10 | + "functions, in that they take inputs, operate on them and then return results. However,\n", |
13 | 11 | "unlike functions, tasks are parameterised before they are executed in a separate step.\n", |
14 | 12 | "This enables parameterised tasks to be linked together into workflows that are checked for\n", |
15 | 13 | "errors before they are executed, and modular execution workers and environments to be specified\n", |
16 | 14 | "independently of the task being performed.\n", |
17 | 15 | "\n", |
| 16 | + "Tasks can encapsulate Python functions, shell-commands or workflows constructed from\n", |
| 17 | + "task components.\n", |
| 18 | + "\n", |
| 19 | + "## Running your first task\n", |
| 20 | + "\n", |
18 | 21 | "Pre-defined task definitions are installed under the `pydra.tasks.*` namespace by separate\n", |
19 | 22 | "task packages (e.g. `pydra-fsl`, `pydra-ants`, ...). Pre-defined task definitions are run by\n", |
20 | 23 | "\n", |
21 | 24 | "* importing the class from the `pydra.tasks.*` package it is in\n", |
22 | 25 | "* instantiating the class with the parameters of the task\n", |
23 | 26 | "* \"calling\" the resulting object to execute it as you would a function (i.e. with `my_task(...)`)\n", |
24 | 27 | "\n", |
25 | | - "To demonstrate with a toy example, of loading a JSON file with the `pydra.tasks.common.LoadJson` task, this we first create an example JSON file" |
| 28 | + "To demonstrate with a toy example of loading a JSON file with the `pydra.tasks.common.LoadJson` task, we first create an example JSON file to test with" |
26 | 29 | ] |
27 | 30 | }, |
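The "parameterise first, execute later" pattern described above can be pictured with a plain-Python sketch (this is only an illustration of the pattern, not Pydra's actual API — `AddTask` is a hypothetical stand-in):

```python
# Minimal sketch of the parameterise-then-execute pattern (not the Pydra API).
class AddTask:
    def __init__(self, a, b):
        # Parameterisation step: inputs are stored, nothing runs yet
        self.a, self.b = a, b

    def __call__(self):
        # Separate execution step: the stored inputs are processed
        return self.a + self.b

task = AddTask(a=1, b=2)  # parameterised, can be checked/linked before running
result = task()           # executed later, e.g. by a worker
print(result)             # → 3
```

Splitting parameterisation from execution is what lets workflows be validated up front and handed off to different workers or environments.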
28 | 31 | { |
29 | 32 | "cell_type": "code", |
30 | | - "execution_count": 6, |
| 33 | + "execution_count": 1, |
31 | 34 | "metadata": {}, |
32 | 35 | "outputs": [], |
33 | 36 | "source": [ |
34 | 37 | "from pathlib import Path\n", |
35 | 38 | "from tempfile import mkdtemp\n", |
36 | 39 | "import json\n", |
37 | 40 | "\n", |
38 | | - "JSON_CONTENTS = {'a': True, 'b': 'two', 'c': 3, 'd': [7, 0.5598136790149003, 6]}\n", |
| 41 | + "JSON_CONTENTS = {'a': True, 'b': 'two', 'c': 3, 'd': [7, 0.55, 6]}\n", |
39 | 42 | "\n", |
40 | 43 | "test_dir = Path(mkdtemp())\n", |
41 | 44 | "json_file = test_dir / \"test.json\"\n", |
|
70 | 73 | "assert result.output.out == JSON_CONTENTS" |
71 | 74 | ] |
72 | 75 | }, |
| 76 | + { |
| 77 | + "cell_type": "markdown", |
| 78 | + "metadata": {}, |
| 79 | + "source": [ |
| 80 | + "The `Result` object contains\n", |
| 81 | + "\n", |
| 82 | + "* `output`: the outputs of the task (if there is only one output it is called `out` by default)\n", |
| 83 | + "* `runtime`: information about the peak memory and CPU usage\n", |
| 84 | + "* `errored`: the error status of the task\n", |
| 85 | + "* `task`: the task object that generated the results\n", |
| 86 | + "* `output_dir`: the output directory the results are stored in" |
| 87 | + ] |
| 88 | + }, |
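The fields listed above can be sketched with a plain-Python stand-in (an illustrative mock-up only, not Pydra's real `Result` class):

```python
from dataclasses import dataclass
from pathlib import Path
from types import SimpleNamespace
from typing import Any

# Illustrative stand-in for the fields listed above (not Pydra's real class)
@dataclass
class Result:
    output: Any       # task outputs (a single output is named `out` by default)
    runtime: Any      # peak memory and CPU usage information
    errored: bool     # error status of the task
    task: Any         # the task object that generated the results
    output_dir: Path  # directory the results are stored in

result = Result(
    output=SimpleNamespace(out=42),
    runtime=None,
    errored=False,
    task=None,
    output_dir=Path("/tmp/cache/some-hash"),
)
if not result.errored:
    print(result.output.out)  # → 42
```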
73 | 89 | { |
74 | 90 | "cell_type": "markdown", |
75 | 91 | "metadata": {}, |
|
143 | 159 | "# as the number of NIfTI files\n", |
144 | 160 | "VOXEL_SIZES = [0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 1.0, 1.0, 1.0, 1.25]\n", |
145 | 161 | "\n", |
146 | | - "mrgrid_varying_sizes = MrGrid().split(\n", |
| 162 | + "mrgrid_varying_vox_sizes = MrGrid().split(\n", |
147 | 163 | " (\"input\", \"voxel\"),\n", |
148 | 164 | " input=nifti_dir.iterdir(),\n", |
149 | 165 | " voxel=VOXEL_SIZES\n", |
150 | 166 | ")\n", |
151 | 167 | "\n", |
152 | 168 | "# Run the task to resample all NIfTI files with different voxel sizes\n", |
153 | | - "result = mrgrid()" |
| 169 | + "result = mrgrid_varying_vox_sizes(cache_dir=test_dir / \"cache\")" |
154 | 170 | ] |
155 | 171 | }, |
156 | 172 | { |
|
159 | 175 | "source": [ |
160 | 176 | "## Cache directories\n", |
161 | 177 | "\n", |
162 | | - "When a task runs, a hash is generated by the combination of all the inputs to the task and the task to be run." |
| 178 | + "When a task runs, a unique hash is generated by the combination of all the inputs to the\n", |
| 179 | + "task and the operation to be performed. This hash is used to name the output directory for\n", |
| 180 | + "the task within the specified cache directory. Therefore, if the same cache\n", |
| 181 | + "directory is used between runs, and the same task is subsequently executed with the\n", |
| 182 | + "same inputs, the location of its output directory will also be the same, and the\n", |
| 183 | + "outputs generated by the previous run are reused." |
| 184 | + ] |
| 185 | + }, |
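The hash-to-directory mapping described above can be sketched in a few lines (a conceptual sketch only — Pydra's real hashing covers more, e.g. file contents, and `output_dir_for` is a hypothetical helper):

```python
import hashlib
import json
from pathlib import Path
from tempfile import mkdtemp

def output_dir_for(cache_dir: Path, task_name: str, inputs: dict) -> Path:
    # Hash the task together with its inputs to name the output directory
    key = hashlib.sha256(
        json.dumps([task_name, inputs], sort_keys=True).encode()
    ).hexdigest()
    return cache_dir / key

cache = Path(mkdtemp())
d1 = output_dir_for(cache, "mrgrid", {"voxel": 0.5})
d2 = output_dir_for(cache, "mrgrid", {"voxel": 0.5})
d3 = output_dir_for(cache, "mrgrid", {"voxel": 0.75})
assert d1 == d2  # same inputs -> same directory, so outputs can be reused
assert d1 != d3  # different inputs -> different directory
```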
| 186 | + { |
| 187 | + "cell_type": "code", |
| 188 | + "execution_count": null, |
| 189 | + "metadata": {}, |
| 190 | + "outputs": [], |
| 191 | + "source": [ |
| 192 | + "mrgrid_varying_vox_sizes2 = MrGrid().split(\n", |
| 193 | + " (\"input\", \"voxel\"),\n", |
| 194 | + " input=nifti_dir.iterdir(),\n", |
| 195 | + " voxel=VOXEL_SIZES\n", |
| 196 | + ")\n", |
| 197 | + "\n", |
| 198 | + "# Result from previous run is reused as the task and inputs are identical\n", |
| 199 | + "result1 = mrgrid_varying_vox_sizes2(cache_dir=test_dir / \"cache\")\n", |
| 200 | + "\n", |
| 201 | + "# Check that the output directory is the same for both runs\n", |
| 202 | + "assert result1.output_dir == result.output_dir\n", |
| 203 | + "\n", |
| 204 | + "# Change the voxel size used to resample one of the NIfTI files\n", |
| 205 | + "mrgrid_varying_vox_sizes2.inputs.voxel[2] = [0.25]\n", |
| 206 | + "\n", |
| 207 | + "# The previous result cannot be reused as one of the inputs has changed\n", |
| 208 | + "result2 = mrgrid_varying_vox_sizes2(cache_dir=test_dir / \"cache\")\n", |
| 209 | + "\n", |
| 210 | + "# The output directory will be different as the inputs are now different\n", |
| 211 | + "assert result2.output_dir != result.output_dir" |
| 212 | + ] |
| 213 | + }, |
| 214 | + { |
| 215 | + "cell_type": "markdown", |
| 216 | + "metadata": {}, |
| 217 | + "source": [ |
| 218 | + "Note that for file objects, the contents of the files are used to calculate the hash,\n", |
| 219 | + "not their paths. Therefore, when inputting large files there might be some additional\n", |
| 220 | + "overhead on the first run (the file hashes themselves are cached by path and mtime so\n", |
| 221 | + "shouldn't need to be recalculated unless they are modified). However, this makes the\n", |
| 222 | + "hashes invariant to file-system movement. For example, changing the name of one of the\n", |
| 223 | + "files in the nifti directory won't invalidate the hash." |
| 224 | + ] |
| 225 | + }, |
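The content-based (rather than path-based) hashing described above can be demonstrated with a short sketch (conceptual only — `content_hash` is a hypothetical helper, not Pydra's internal function):

```python
import hashlib
from pathlib import Path
from tempfile import mkdtemp

def content_hash(path: Path) -> str:
    # Hash the bytes of the file, not its path
    return hashlib.sha256(path.read_bytes()).hexdigest()

tmp = Path(mkdtemp())
scan = tmp / "scan.nii.gz"
scan.write_bytes(b"imaging data")

before = content_hash(scan)
renamed = scan.rename(tmp / "first.nii.gz")

# Renaming the file changes its path but not its contents,
# so the hash (and hence the cache key) is unaffected
assert content_hash(renamed) == before
```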
| 226 | + { |
| 227 | + "cell_type": "code", |
| 228 | + "execution_count": null, |
| 229 | + "metadata": {}, |
| 230 | + "outputs": [], |
| 231 | + "source": [ |
| 232 | + "# Rename a NIfTI file within the test directory\n", |
| 233 | + "first_file = next(nifti_dir.iterdir())\n", |
| 234 | + "first_file.rename(first_file.with_name(\"first.nii.gz\"))\n", |
| 235 | + "\n", |
| 236 | + "mrgrid_varying_vox_sizes3 = MrGrid().split(\n", |
| 237 | + " (\"input\", \"voxel\"),\n", |
| 238 | + " input=nifti_dir.iterdir(),\n", |
| 239 | + " voxel=VOXEL_SIZES\n", |
| 240 | + ")\n", |
| 241 | + "\n", |
| 242 | + "# Result from previous run is reused as the task and inputs are identical\n", |
| 243 | + "result3 = mrgrid_varying_vox_sizes3(cache_dir=test_dir / \"cache\")\n", |
| 244 | + "\n", |
| 245 | + "# Check that the output directory is the same for both runs\n", |
| 246 | + "assert result3.output_dir == result.output_dir" |
| 247 | + ] |
| 248 | + }, |
| 249 | + { |
| 250 | + "cell_type": "markdown", |
| 251 | + "metadata": {}, |
| 252 | + "source": [ |
| 253 | + "## Debugging\n", |
| 254 | + "\n", |
| 255 | + "Work in progress..." |
163 | 256 | ] |
164 | 257 | }, |
165 | 258 | { |
|