|
6 | 6 | "source": [ |
7 | 7 | "# Getting started\n", |
8 | 8 | "\n", |
9 | | - "A *Task* is the basic runnable component in Pydra, and can execute either a Python function,\n", |
10 | | - "shell command or workflows consisting of combinations of all three types." |
| 9 | + "## Running your first task\n", |
| 10 | + "\n", |
| 11 | + "The basic runnable component of Pydra is a *task*. Tasks are conceptually similar to\n", |
| 12 | + "functions, in that they take inputs, process them and then return results. However,\n", |
| 13 | + "unlike functions, tasks are parameterised before they are executed in a separate step.\n", |
| 14 | + "This enables parameterised tasks to be linked together into workflows that are checked for\n", |
| 15 | + "errors before they are executed, and modular execution workers and environments to be\n", |
| 16 | + "specified independently of the task being performed.\n", |
| 17 | + "\n", |
| 18 | + "Pre-defined task definitions are installed under the `pydra.tasks.*` namespace by separate\n", |
| 19 | + "task packages (e.g. `pydra-fsl`, `pydra-ants`, ...). They are run by\n", |
| 20 | + "\n", |
| 21 | + "* importing the class from the `pydra.tasks.*` package it is in\n", |
| 22 | + "* instantiating the class with the parameters of the task\n", |
| 23 | + "* \"calling\" the resulting object to execute it as you would a function (i.e. `my_task(...)`)\n", |
| 24 | + "\n", |
| 25 | + "To demonstrate with a toy example of loading a JSON file with the `pydra.tasks.common.LoadJson` task, we first create an example JSON file" |
11 | 26 | ] |
12 | 27 | }, |
13 | 28 | { |
14 | 29 | "cell_type": "code", |
15 | | - "execution_count": 5, |
| 30 | + "execution_count": 6, |
16 | 31 | "metadata": {}, |
17 | | - "outputs": [ |
18 | | - { |
19 | | - "name": "stdout", |
20 | | - "output_type": "stream", |
21 | | - "text": [ |
22 | | - "Sample JSON file created at '0UAqFzWsDK4FrUMp48Y3tT3Q.json' with contents: {\"a\": true, \"b\": \"two\", \"c\": 3, \"d\": [7, 0.5598136790149003, 6]}\n", |
23 | | - "Loaded contents: {'a': True, 'b': 'two', 'c': 3, 'd': [7, 0.5598136790149003, 6]}\n" |
24 | | - ] |
25 | | - } |
26 | | - ], |
| 32 | + "outputs": [], |
27 | 33 | "source": [ |
28 | | - "from fileformats.application import Json\n", |
29 | | - "from pydra.tasks.common import LoadJson\n", |
| 34 | + "from pathlib import Path\n", |
| 35 | + "from tempfile import mkdtemp\n", |
| 36 | + "import json\n", |
30 | 37 | "\n", |
31 | | - "# Create a sample JSON file to test\n", |
32 | | - "json_file = Json.sample()\n", |
| 38 | + "JSON_CONTENTS = {'a': True, 'b': 'two', 'c': 3, 'd': [7, 0.5598136790149003, 6]}\n", |
33 | 39 | "\n", |
34 | | - "# Print the path of the sample JSON file and its contents for reference\n", |
35 | | - "print(f\"Sample JSON file created at {json_file.name!r} with contents: {json_file.read_text()}\")\n", |
| 40 | + "test_dir = Path(mkdtemp())\n", |
| 41 | + "json_file = test_dir / \"test.json\"\n", |
| 42 | + "with open(json_file, \"w\") as f:\n", |
| 43 | + " json.dump(JSON_CONTENTS, f)" |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "metadata": {}, |
| 49 | + "source": [ |
| 50 | + "Now we can load the JSON contents back from the file using the `LoadJson` task definition\n", |
| 51 | + "class" |
| 52 | + ] |
| 53 | + }, |
| 54 | + { |
| 55 | + "cell_type": "code", |
| 56 | + "execution_count": 7, |
| 57 | + "metadata": {}, |
| 58 | + "outputs": [], |
| 59 | + "source": [ |
| 60 | + "# Import the task definition\n", |
| 61 | + "from pydra.tasks.common import LoadJson\n", |
36 | 62 | "\n", |
37 | | - "# Parameterise the task specification to load the JSON file\n", |
| 63 | + "# Instantiate the task definition, providing the JSON file we want to load\n", |
38 | 64 | "load_json = LoadJson(file=json_file)\n", |
39 | 65 | "\n", |
40 | 66 | "# Run the task to load the JSON file\n", |
41 | 67 | "result = load_json()\n", |
42 | 68 | "\n", |
43 | | - "# Print the output interface of the of the task (LoadJson.Outputs)\n", |
44 | | - "print(f\"Loaded contents: {result.output.out}\")" |
| 69 | + "# Access the loaded JSON output contents and check they match original\n", |
| 70 | + "assert result.output.out == JSON_CONTENTS" |
| 71 | + ] |
| 72 | + }, |
| 73 | + { |
| 74 | + "cell_type": "markdown", |
| 75 | + "metadata": {}, |
| 76 | + "source": [ |
| 77 | + "## Iterating over inputs\n", |
| 78 | + "\n", |
| 79 | + "It is straightforward to apply the same operation over a set of inputs using the `split()`\n", |
| 80 | + "method. For example, if we wanted to re-grid all the NIfTI images stored in a directory,\n", |
| 81 | + "such as the sample ones generated by the code below" |
| 82 | + ] |
| 83 | + }, |
| 84 | + { |
| 85 | + "cell_type": "code", |
| 86 | + "execution_count": null, |
| 87 | + "metadata": {}, |
| 88 | + "outputs": [], |
| 89 | + "source": [ |
| 90 | + "from fileformats.medimage import Nifti\n", |
| 91 | + "\n", |
| 92 | + "nifti_dir = test_dir / \"nifti\"\n", |
| 93 | + "nifti_dir.mkdir()\n", |
| 94 | + "\n", |
| 95 | + "for i in range(10):\n", |
| 96 | + " Nifti.sample(nifti_dir, seed=i)" |
| 97 | + ] |
| 98 | + }, |
| 99 | + { |
| 100 | + "cell_type": "markdown", |
| 101 | + "metadata": {}, |
| 102 | + "source": [ |
| 103 | + "Then we can resample them by importing the `MrGrid` shell-command task from the `pydra-mrtrix3`\n", |
| 104 | + "package and splitting over the list of files in the directory" |
| 105 | + ] |
| 106 | + }, |
| 107 | + { |
| 108 | + "cell_type": "code", |
| 109 | + "execution_count": null, |
| 110 | + "metadata": {}, |
| 111 | + "outputs": [], |
| 112 | + "source": [ |
| 113 | + "from pydra.tasks.mrtrix3 import MrGrid\n", |
| 114 | + "\n", |
| 115 | + "# Instantiate the task definition, \"splitting\" over all NIfTI files in the test directory\n", |
| 116 | + "mrgrid = MrGrid(voxel=0.5).split(input=nifti_dir.iterdir())\n", |
| 117 | + "\n", |
| 118 | + "# Run the task to resample all NIfTI files\n", |
| 119 | + "result = mrgrid()\n", |
| 120 | + "\n", |
| 121 | + "# Print the locations of the output files\n", |
| 122 | + "print(\"\\n\".join(str(p) for p in result.output.output))" |
| 123 | + ] |
| 124 | + }, |
| 125 | + { |
| 126 | + "cell_type": "markdown", |
| 127 | + "metadata": {}, |
| 128 | + "source": [ |
| 129 | + "It is also possible to iterate over inputs in pairs. If, for example, you wanted to use\n", |
| 130 | + "a different voxel size for each image, both the list of images and the list of voxel sizes\n", |
| 131 | + "are passed to the `split()` method, and their pairing is specified by a tuple \"splitter\"\n", |
| 132 | + "(see [Splitting and combining](../explanation/splitting-combining.html) for more details\n", |
| 133 | + "on splitters)" |
| 134 | + ] |
| 135 | + }, |
| 136 | + { |
| 137 | + "cell_type": "code", |
| 138 | + "execution_count": null, |
| 139 | + "metadata": {}, |
| 140 | + "outputs": [], |
| 141 | + "source": [ |
| 142 | + "# Define a list of voxel sizes to resample the NIfTI files to, must be the same length\n", |
| 143 | + "# as the number of NIfTI files\n", |
| 144 | + "VOXEL_SIZES = [0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 1.0, 1.0, 1.0, 1.25]\n", |
| 145 | + "\n", |
| 146 | + "mrgrid_varying_sizes = MrGrid().split(\n", |
| 147 | + " (\"input\", \"voxel\"),\n", |
| 148 | + " input=nifti_dir.iterdir(),\n", |
| 149 | + " voxel=VOXEL_SIZES\n", |
| 150 | + ")\n", |
| 151 | + "\n", |
| 152 | + "# Run the task to resample all NIfTI files with different voxel sizes\n", |
| 153 | + "result = mrgrid_varying_sizes()" |
| 154 | + ] |
| 155 | + }, |
| 156 | + { |
| 157 | + "cell_type": "markdown", |
| 158 | + "metadata": {}, |
| 159 | + "source": [ |
| 160 | + "## Cache directories\n", |
| 161 | + "\n", |
| 162 | + "When a task runs, a unique hash is generated from the combination of the task to be run and all of its inputs." |
45 | 163 | ] |
46 | 164 | }, |
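The idea behind input hashing can be sketched in plain Python. This is an illustrative sketch only, not Pydra's actual implementation (Pydra's hashing is internal and more sophisticated, e.g. hashing file contents rather than just paths); the `task_cache_key` helper and its serialisation scheme are hypothetical:

```python
import hashlib
import json

def task_cache_key(task_name: str, inputs: dict) -> str:
    """Derive a deterministic cache key from a task and its inputs.

    Illustrative only: the same task with the same inputs always maps
    to the same hash, so previously computed results can be reused from
    the cache directory instead of being recomputed.
    """
    # Serialise inputs deterministically (sorted keys) before hashing
    payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

key1 = task_cache_key("LoadJson", {"file": "/data/test.json"})
key2 = task_cache_key("LoadJson", {"file": "/data/test.json"})
key3 = task_cache_key("LoadJson", {"file": "/data/other.json"})

assert key1 == key2  # identical task + inputs -> identical hash
assert key1 != key3  # different inputs -> different hash
```

Because the key depends only on the task and its inputs, re-running an unchanged task can be satisfied from the cache rather than by re-executing it.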
47 | 165 | { |
|