
Commit db07987

Tutorial updates:
- Specific eir version to use
- How to tailor to own data.
1 parent 9a88170 commit db07987


2 files changed: +80 -8 lines changed


.github/workflows/test_tutorial.yaml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ jobs:

       - name: Install Claster requirements
         run: |
-          pip install numpy pandas matplotlib jupyter ipykernel eir-dl
+          pip install numpy pandas matplotlib jupyter ipykernel eir-dl==0.1.42
       - name: Run papermill for cmd-line execution of notebooks
         run: |
           pip install papermill
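
The change above pins the tutorial's CI environment to a specific EIR release (eir-dl==0.1.42). As a quick local check (not part of the commit), the snippet below verifies that the same pinned version is installed in your own environment; it assumes the package was installed with pip under the distribution name eir-dl.

```python
# Hedged sanity check: confirm the locally installed eir-dl matches the version pinned in CI.
from importlib.metadata import PackageNotFoundError, version

try:
    installed = version("eir-dl")  # distribution name as used in `pip install eir-dl`
except PackageNotFoundError:
    installed = None

print("eir-dl installed version:", installed)
print("matches the tutorial pin (0.1.42):", installed == "0.1.42")
```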

scripts/0_Tutorial.ipynb

Lines changed: 79 additions & 7 deletions
@@ -21,7 +21,7 @@
     "\n",
     "Hi! \n",
     "\n",
-    "Welcome to this small tutorial on how to build and run CLASTER using the EIR framework. Please clone this repository (CLASTER) on your computer to start.\n",
+    "Welcome to this small tutorial on how to build and run CLASTER using the EIR framework. Please clone this repository (CLASTER) on your computer to start. Otherwise, please make sure that the folder structure including inputs, targets and scripts is the same as in the repository.\n",
     "\n",
     "### About CLASTER\n",
     "\n",
@@ -104,24 +104,37 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-    "## 1. Inputs and outputs\n",
-    "The obtention of inputs and outputs from publicly available sources is detailed in notebook ```1_Data_obtention.ipynb```. In this tutorial we will however provide you with the already created inputs and targets for all samples in the test set.\n",
+    "## 1. Inputs and Targets\n",
+    "The obtention of inputs and targets from publicly available sources is detailed in notebook ```1_Data_obtention.ipynb```. In this tutorial we will however provide you with the already created inputs and targets for all samples in the test set.\n",
     "\n",
     "Input samples and their matching targets are named after the ENSEMBL ID code for the protein coding gene located at the center of the region of interest. We kept the orientation of the genes, and hence the EU-seq signal can go both towards the right or towards the left. \n",
     "\n",
+    "\n",
     "**Landscape branch:**\n",
     "\n",
     "1Mbp regions centered at the TSS of protein coding genes in chromosome 4. \n",
     "\n",
     "- Input arrays can be found at the folders ```inputs/landscape_arrays/test/``` and ```inputs/microC_rotated/test/```. \n",
     "- The matching target profiles are given in a tabular format and can be found in ```targets/test_targets.csv```.\n",
     "\n",
-    "> *Note: As a standard data augmentation procedure, samples were provided in their natural orientation (SampleID_forward.npy) and flipped. (SampleID_forward.npy)*"
+    "> *Note: As a standard data augmentation procedure, samples were provided in their natural orientation (SampleID_forward.npy) and flipped. (SampleID_forward.npy)*\n",
+    "\n",
+    ">**How do I extend this to my dataset?**\n",
+    ">\n",
+    ">*Inputs:* \n",
+    ">\n",
+    ">We need to store all input samples in a folder, e.g. ```/inputs/```. Each sample will be a numpy array with the name {SAMPLE_ID}.npy, of shape (#tracks, sequence length). In our case, #tracks = 4 (ATAC, H3K4me3, H3K27ac, H3K27me3) and sequence length = 10001 (bins of 100bp).\n",
+    ">\n",
+    ">*Targets:*\n",
+    ">\n",
+    ">Targets are provided as a table, where:\n",
+    ">- Columns are ID + name_of_output_1, name_of_ouput_2, etc. In our case we called them ID, -200_ctrl, -199_ctrl, etc.\n",
+    ">- Rows correspond to the sample ID (without .npy) and the table is filled with target values. In our case these were 1kbp read averages for 401 output nodes."
     ]
    },
    {
     "cell_type": "code",
-    "execution_count": null,
+    "execution_count": 23,
     "metadata": {
      "tags": [
       "hide-input"
@@ -930,9 +943,68 @@
    },
    {
     "cell_type": "code",
-    "execution_count": null,
+    "execution_count": 22,
     "metadata": {},
-    "outputs": [],
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "[Epoch 5/120] | loss-average: 0.2258 \n",
+       "Epoch [6/120]: [1/36] 3%|▋ , loss-average=0.225 [00:00<?]Traceback (most recent call last):\n",
+       " File \"/projects/rasmussen/data/enhancer_logic_project/claster_env/bin/eirtrain\", line 8, in <module>\n",
+       " sys.exit(main())\n",
+       " ^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train.py\", line 77, in main\n",
+       " run_experiment(experiment=default_experiment)\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train.py\", line 353, in run_experiment\n",
+       " train(experiment=experiment)\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train.py\", line 375, in train\n",
+       " trainer.run(data=exp.train_loader, max_epochs=gc.n_epochs)\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 898, in run\n",
+       " return self._internal_run()\n",
+       " ^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 941, in _internal_run\n",
+       " return next(self._internal_run_generator)\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 999, in _internal_run_as_gen\n",
+       " self._handle_exception(e)\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 644, in _handle_exception\n",
+       " raise e\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 965, in _internal_run_as_gen\n",
+       " epoch_time_taken += yield from self._run_once_on_dataset_as_gen()\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/ignite/engine/engine.py\", line 1074, in _run_once_on_dataset_as_gen\n",
+       " self.state.output = self._process_function(self, self.state.batch)\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train.py\", line 415, in step\n",
+       " state = call_hooks_stage_iterable(\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/utils.py\", line 130, in call_hooks_stage_iterable\n",
+       " _, state = state_registered_hook_call(\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/utils.py\", line 146, in state_registered_hook_call\n",
+       " state_updates = hook_func(state=state, *args, **kwargs)\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/metrics.py\", line 536, in hook_add_uncertainty_loss\n",
+       " cur_uncertainty_losses = cur_module(losses_dict=cur_loss_dict)\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/torch/nn/modules/module.py\", line 1511, in _wrapped_call_impl\n",
+       "^C\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/torch/nn/modules/module.py\", line 1520, in _call_impl\n",
+       " return forward_call(*args, **kwargs)\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/metrics.py\", line 590, in forward\n",
+       " loss_value_uncertain = self._calc_uncertainty_loss(\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       " File \"/maps/projects/rasmussen/data/enhancer_logic_project/claster_env/lib/python3.11/site-packages/eir/train_utils/metrics.py\", line 582, in _calc_uncertainty_loss\n",
+       " loss = scalar * torch.sum(precision * loss_value + log_var)\n",
+       " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+       "KeyboardInterrupt\n",
+       "\u001b[0m"
+      ]
+     }
+    ],
     "source": [
      "! eirtrain \\\n",
      "--global_configs ../configurations/conf_microc_rotated_pure_conv_tutorial/globals.yaml \\\n",
