Tutorial updates:

mpielies · mpielies · commit db0798708540 · 2025-03-21T20:58:29.000+01:00
- Pin the eir-dl version to use (0.1.42).
- Explain how to tailor the tutorial to your own data.
diff --git a/.github/workflows/test_tutorial.yaml b/.github/workflows/test_tutorial.yaml
@@ -16,7 +16,7 @@ jobs:
 
       - name: Install Claster requirements
         run: |
-          pip install numpy pandas matplotlib jupyter ipykernel eir-dl
+          pip install numpy pandas matplotlib jupyter ipykernel eir-dl==0.1.42
       - name: Run papermill for cmd-line execution of notebooks
         run: |
           pip install papermill
diff --git a/scripts/0_Tutorial.ipynb b/scripts/0_Tutorial.ipynb
@@ -21,7 +21,7 @@
     "\n",
     "Hi! \n",
     "\n",
-    "Welcome to this small tutorial on how to build and run CLASTER using the EIR framework. Please clone this repository (CLASTER) on your computer to start.\n",
+    "Welcome to this small tutorial on how to build and run CLASTER using the EIR framework. Please clone this repository (CLASTER) on your computer to start. Otherwise, please make sure that your folder structure, including inputs, targets and scripts, matches the one in the repository.\n",
     "\n",
     "### About CLASTER\n",
     "\n",
@@ -104,24 +104,37 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 1. Inputs and outputs\n",
-    "The obtention of inputs and outputs from publicly available sources is detailed in notebook ```1_Data_obtention.ipynb```. In this tutorial we will however provide you with the already created inputs and targets for all samples in the test set.\n",
+    "## 1. Inputs and Targets\n",
+    "How the inputs and targets were obtained from publicly available sources is detailed in the notebook ```1_Data_obtention.ipynb```. In this tutorial, however, we provide the precomputed inputs and targets for all samples in the test set.\n",
     "\n",
     "Input samples and their matching targets are named after the ENSEMBL ID code for the protein coding gene located at the center of the region of interest. We kept the orientation of the genes, and hence the EU-seq signal can go both towards the right or towards the left. \n",
     "\n",
+    "\n",
     "**Landscape branch:**\n",
     "\n",
     "1Mbp regions centered at the TSS of protein coding genes in chromosome 4. \n",
     "\n",
     "- Input arrays can be found at the folders ```inputs/landscape_arrays/test/``` and ```inputs/microC_rotated/test/```. \n",
     "- The matching target profiles are given in a tabular format and can be found in ```targets/test_targets.csv```.\n",
     "\n",
-    "> *Note: As a standard data augmentation procedure, samples were provided in their natural orientation (SampleID_forward.npy) and flipped. (SampleID_forward.npy)*"
+    "> *Note: As a standard data augmentation procedure, samples were provided in their natural orientation (SampleID_forward.npy) and flipped (SampleID_forward.npy).*\n",
+    "\n",
+    ">**How do I extend this to my dataset?**\n",
+    ">\n",
+    ">*Inputs:* \n",
+    ">\n",
+    ">Store all input samples in a single folder, e.g. ```/inputs/```. Each sample is a NumPy array named {SAMPLE_ID}.npy with shape (#tracks, sequence length). In our case, #tracks = 4 (ATAC, H3K4me3, H3K27ac, H3K27me3) and sequence length = 10001 bins of 100 bp each.\n",
+    ">\n",
+    ">*Targets:*\n",
+    ">\n",
+    ">Targets are provided as a table, where:\n",
+    ">- The first column is ID, followed by one column per output: name_of_output_1, name_of_output_2, etc. In our case we called them ID, -200_ctrl, -199_ctrl, etc.\n",
+    ">- Each row corresponds to one sample ID (without the .npy extension) and holds that sample's target values. In our case these were 1 kbp read averages for 401 output nodes."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 23,
    "metadata": {
     "tags": [
      "hide-input"
@@ -930,9 +943,68 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 22,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Epoch 5/120] | loss-average: 0.2258                                            \n",
+      "Epoch [6/120]: [1/36]   3%|▋                      , loss-average=0.225 [00:00<?]Traceback (most recent call last):\n",
+      "  File \"/projects/rasmussen/data/enhancer_logic_project/claster_env/bin/eirtrain\", line 8, in <module>\n",
+      "    sys.exit(main())\n",
+      "             ^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train.py\", line 77, in main\n",
+      "    run_experiment(experiment=default_experiment)\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train.py\", line 353, in run_experiment\n",
+      "    train(experiment=experiment)\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train.py\", line 375, in train\n",
+      "    trainer.run(data=exp.train_loader, max_epochs=gc.n_epochs)\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 898, in run\n",
+      "    return self._internal_run()\n",
+      "           ^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 941, in _internal_run\n",
+      "    return next(self._internal_run_generator)\n",
+      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 999, in _internal_run_as_gen\n",
+      "    self._handle_exception(e)\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 644, in _handle_exception\n",
+      "    raise e\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 965, in _internal_run_as_gen\n",
+      "    epoch_time_taken += yield from self._run_once_on_dataset_as_gen()\n",
+      "                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 1074, in _run_once_on_dataset_as_gen\n",
+      "    self.state.output = self._process_function(self, self.state.batch)\n",
+      "                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train.py\", line 415, in step\n",
+      "    state = call_hooks_stage_iterable(\n",
+      "            ^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/utils.py\", line 130, in call_hooks_stage_iterable\n",
+      "    _, state = state_registered_hook_call(\n",
+      "               ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/utils.py\", line 146, in state_registered_hook_call\n",
+      "    state_updates = hook_func(state=state, *args, **kwargs)\n",
+      "                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/metrics.py\", line 536, in hook_add_uncertainty_loss\n",
+      "    cur_uncertainty_losses = cur_module(losses_dict=cur_loss_dict)\n",
+      "                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/torch/nn/modules/module.py\", line 1511, in _wrapped_call_impl\n",
+      "^C\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/torch/nn/modules/module.py\", line 1520, in _call_impl\n",
+      "    return forward_call(*args, **kwargs)\n",
+      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/metrics.py\", line 590, in forward\n",
+      "    loss_value_uncertain = self._calc_uncertainty_loss(\n",
+      "                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "  File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/metrics.py\", line 582, in _calc_uncertainty_loss\n",
+      "    loss = scalar * torch.sum(precision * loss_value + log_var)\n",
+      "                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+      "KeyboardInterrupt\n",
+      "\u001b[0m"
+     ]
+    }
+   ],
    "source": [
     "! eirtrain \\\n",
     "--global_configs ../configurations/conf_microc_rotated_pure_conv_tutorial/globals.yaml \\\n",