
Commit 928f330

Merge pull request #130 from computational-cell-analytics/revision2
Update vesicle inference
2 parents 408b807 + 1f3ad5f commit 928f330

File tree

7 files changed: +356 −33 lines changed


doc/start_page.md

Lines changed: 49 additions & 6 deletions
@@ -147,10 +147,12 @@ For more options supported by the IMOD exports, please run `synapse_net.export_t
 
 > Note: to use these commands you have to install IMOD.
 
+SynapseNet also provides two CLI commands for training models, one for supervised network training (see [Supervised Training](#supervised-training) for details) and one for domain adaptation (see [Domain Adaptation](#domain-adaptation) for details).
+
 
 ## Python Library
 
-Using the `synapse_net` python library offers the most flexibility for using the SynapseNet functionality.
+Using the `synapse_net` python library offers the most flexibility for using SynapseNet's functionality.
 You can find an example analysis pipeline implemented with SynapseNet [here](https://github.com/computational-cell-analytics/synapse-net/blob/main/examples/analysis_pipeline.py).
 
 We offer different functionality for segmenting and analyzing synapses in electron microscopy:
@@ -161,21 +163,62 @@ We offer different functionality for segmenting and analyzing synapses in electr
 
 Please refer to the module documentation below for a full overview of our library's functionality.
 
+### Supervised Training
+
+SynapseNet provides functionality for training a UNet for segmentation tasks using supervised learning.
+In this case, you have to provide data **and** (manual) annotations for the structure(s) you want to segment.
+This functionality is implemented in `synapse_net.training.supervised_training`. You can find an example script that shows how to use it [here](https://github.com/computational-cell-analytics/synapse-net/blob/main/examples/network_training.py).
+
+We also provide a command line function to run supervised training: `synapse_net.run_supervised_training`. Run
+```bash
+synapse_net.run_supervised_training -h
+```
+for more information and instructions on how to use it.
+
 ### Domain Adaptation
 
-We provide functionality for domain adaptation. It implements a special form of neural network training that can improve segmentation for data from a different condition (e.g. different sample preparation, electron microscopy technique or different specimen), **without requiring additional annotated structures**.
+SynapseNet provides functionality for (unsupervised) domain adaptation.
+This functionality is implemented through a student-teacher training approach that can improve segmentation for data from a different condition (for example different sample preparation, imaging technique, or different specimen), **without requiring additional annotated structures**.
 Domain adaptation is implemented in `synapse_net.training.domain_adaptation`. You can find an example script that shows how to use it [here](https://github.com/computational-cell-analytics/synapse-net/blob/main/examples/domain_adaptation.py).
 
-> Note: Domain adaptation only works if the initial model you adapt already finds some of the structures in the data from a new condition. If it does not work you will have to train a network on annotated data.
+We also provide a command line function to run domain adaptation: `synapse_net.run_domain_adaptation`. Run
+```bash
+synapse_net.run_domain_adaptation -h
+```
+for more information and instructions on how to use it.
 
-### Network Training
+> Note: Domain adaptation only works if the initial model already finds some of the structures in the data from a new condition. If it does not work you will have to train a network on annotated data.
 
-We also provide functionality for 'regular' neural network training. In this case, you have to provide data **and** manual annotations for the structure(s) you want to segment.
-This functionality is implemented in `synapse_net.training.supervised_training`. You can find an example script that shows how to use it [here](https://github.com/computational-cell-analytics/synapse-net/blob/main/examples/network_training.py).
 
 ## Segmentation for the CryoET Data Portal
 
 We have published segmentation results for tomograms of synapses stored in the [CryoET Data Portal](https://cryoetdataportal.czscience.com/). So far we have made the following depositions:
 - [CZCDP-10330](https://cryoetdataportal.czscience.com/depositions/10330): Contains synaptic vesicle segmentations for over 50 tomograms of synaptosomes. The segmentations were made with a model domain adapted to the synaptosome tomograms.
 
 The scripts for the submissions can be found in [scripts/cryo/cryo-et-portal](https://github.com/computational-cell-analytics/synapse-net/tree/main/scripts/cryo/cryo-et-portal).
+
+
+## Community Data Submission
+
+We are looking to extend and improve the SynapseNet models by training on more annotated data from electron tomography or (volume) electron microscopy.
+For this, we plan to collect data from community submissions.
+
+If you are using SynapseNet for a task where it does not perform well, or if you would like to use it for a new segmentation task not offered by it, and you have annotations for your data, then you can submit this data to us, so that we can use it to train our next version of improved models.
+To do this, please create an [issue on github](https://github.com/computational-cell-analytics/synapse-net/issues) and:
+- Use a title "Data submission: ..." ("..." should be a title for your data, e.g. "smooth ER in electron tomography")
+- Briefly describe your data and add an image that shows the microscopy data and the segmentation masks you have.
+- Make sure to describe:
+  - The imaging modality and the structure(s) that you have segmented.
+  - How many images and annotations you have / can submit and how you have created the annotations.
+    - You should submit at least 5 images or crops and 20 annotated objects. If you are unsure whether you have enough data, please go ahead and create the issue / post and we can discuss the details.
+  - Which data format your images and annotations are stored in. We recommend using either `tif`, `mrc`, or `ome.zarr` files.
+- Please indicate that you are willing to share the data for training purposes (see also the next paragraph).
+
+Once you have created the post / issue, we will check if your data is suitable for submission or discuss with you how it could be extended to be suitable. Then:
+- We will share an agreement for data sharing. You can find **a draft** [here](https://docs.google.com/document/d/1vf5Efp5EJcS1ivuWM4f3pO5kBqEZfJcXucXL5ot0eqg/edit?usp=sharing).
+- You will be able to choose how you want to submit / publish your data.
+  - Share it under a CC0 license. In this case, we will use the data for re-training and also make it publicly available as soon as the next model versions become available.
+  - Share it for training with the option to publish it later. For example, if your data is unpublished and you only want to publish it once the respective publication is available. In this case, we will use the data for re-training, but not make it freely available yet. We will check with you periodically to see if your data can now be published.
+  - Share it for training only. In this case, we will re-train the model on it, but not make it publicly available.
+- We encourage you to choose the first option (making the data available under CC0).
+- We will then send you a link to upload your data, after you have agreed to these terms.

scripts/cooper/revision/az_prediction.py

Lines changed: 4 additions & 5 deletions
@@ -24,7 +24,7 @@ def run_prediction(model, name, split_folder, version, split_names, in_path):
 
     for fname in tqdm(file_names):
         if in_path:
-            input_path=os.path.join(in_path, name, fname)
+            input_path = os.path.join(in_path, name, fname)
         else:
             input_path = os.path.join(INPUT_ROOT, name, fname)
         print(f"segmenting {input_path}")
@@ -50,15 +50,14 @@ def run_prediction(model, name, split_folder, version, split_names, in_path):
            print(f"{output_key_seg} already saved")
        else:
            f.create_dataset(output_key_seg, data=seg, compression="lzf")
-
 
 
 def get_model(version):
    assert version in (3, 4, 5, 6, 7)
    split_folder = get_split_folder(version)
    if version == 3:
        model_path = os.path.join(split_folder, "checkpoints", "3D-AZ-model-TEM_STEM_ChemFix_wichmann-v3")
-    elif version ==6:
+    elif version == 6:
        model_path = "/mnt/ceph-hdd/cold/nim00007/models/AZ/v6/"
    elif version == 7:
        model_path = "/mnt/lustre-emmy-hdd/usr/u12095/synapse_net/models/ConstantinAZ/checkpoints/v7/"
@@ -79,15 +78,15 @@ def main():
    args = parser.parse_args()
 
    if args.model_path:
-        model = load_model(model_path)
+        model = load_model(args.model_path)
    else:
        model = get_model(args.version)
 
    split_folder = get_split_folder(args.version)
 
    for name in args.names:
        run_prediction(model, name, split_folder, args.version, args.splits, args.input)
-
+
    print("Finished segmenting!")
 
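The fix in the last hunk replaces a bare `model_path`, which is undefined in `main`'s scope, with `args.model_path`. A minimal sketch of the pattern (the `--model_path` flag and the sample value here are illustrative, not the script's full parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", default=None)

# Parsed options live on the namespace object, so they must be
# accessed as args.model_path; a bare model_path raises NameError.
args = parser.parse_args(["--model_path", "checkpoints/v7"])
print(args.model_path)  # prints "checkpoints/v7"
```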

scripts/cooper/revision/common.py

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ def get_split_folder(version):
     if version == 3:
         split_folder = "splits"
     elif version == 6:
-        split_folder= "/mnt/ceph-hdd/cold/nim00007/new_AZ_train_data/splits"
+        split_folder = "/mnt/ceph-hdd/cold/nim00007/new_AZ_train_data/splits"
     else:
         split_folder = "models_az_thin"
     return split_folder

setup.py

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@
             "synapse_net.run_segmentation = synapse_net.tools.cli:segmentation_cli",
             "synapse_net.export_to_imod_points = synapse_net.tools.cli:imod_point_cli",
             "synapse_net.export_to_imod_objects = synapse_net.tools.cli:imod_object_cli",
+            "synapse_net.run_supervised_training = synapse_net.training.supervised_training:main",
+            "synapse_net.run_domain_adaptation = synapse_net.training.domain_adaptation:main",
         ],
         "napari.manifest": [
             "synapse_net = synapse_net:napari.yaml",
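For context, the two added lines use setuptools' `console_scripts` entry points, which map an installed command name to a `module.path:callable` target. A sketch of the surrounding setup.py structure, with the unrelated fields elided (config fragment, not a complete build script):

```python
from setuptools import setup

setup(
    # ... name, version, packages, etc. elided ...
    entry_points={
        "console_scripts": [
            # "command name = dotted.module.path:callable";
            # pip generates a wrapper script that calls main().
            "synapse_net.run_supervised_training = synapse_net.training.supervised_training:main",
            "synapse_net.run_domain_adaptation = synapse_net.training.domain_adaptation:main",
        ],
    },
)
```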

synapse_net/inference/vesicles.py

Lines changed: 19 additions & 16 deletions
@@ -1,4 +1,5 @@
 import time
+import warnings
 from typing import Dict, List, Optional, Tuple, Union
 
 import elf.parallel as parallel
@@ -8,6 +9,7 @@
 
 from synapse_net.inference.util import apply_size_filter, get_prediction, _Scaler
 from synapse_net.inference.postprocessing.vesicles import filter_border_objects, filter_border_vesicles
+from skimage.segmentation import relabel_sequential
 
 
 def distance_based_vesicle_segmentation(
@@ -148,6 +150,10 @@ def segment_vesicles(
         return_predictions: Whether to return the predictions (foreground, boundaries) alongside the segmentation.
         scale: The scale factor to use for rescaling the input volume before prediction.
         exclude_boundary: Whether to exclude vesicles that touch the upper / lower border in z.
+        exclude_boundary_vesicles: Whether to exclude vesicles on the boundary that have less than their full diameter
+            inside of the volume. This is an alternative to post-processing with `exclude_boundary` that filters
+            out fewer vesicles at the boundary and is better suited for volumes with little context in z.
+            If `exclude_boundary` is also set to True, then this option will have no effect.
         mask: An optional mask that is used to restrict the segmentation.
 
     Returns:
@@ -181,26 +187,23 @@ def segment_vesicles(
         foreground, boundaries, verbose=verbose, min_size=min_size, **kwargs
     )
 
-    if exclude_boundary:
+    if exclude_boundary and exclude_boundary_vesicles:
+        warnings.warn(
+            "You have set both 'exclude_boundary' and 'exclude_boundary_vesicles' to True. "
+            "The 'exclude_boundary_vesicles' option will have no effect."
+        )
         seg = filter_border_objects(seg)
-    if exclude_boundary_vesicles:
-        seg_ids = filter_border_vesicles(seg)
-        # Step 1: Zero out everything not in seg_ids
-        seg[~np.isin(seg, seg_ids)] = 0
-
-        # Step 2: Relabel remaining IDs to be consecutive starting from 1
-        unique_ids = np.unique(seg)
-        unique_ids = unique_ids[unique_ids != 0]  # Exclude background (0)
 
-        label_map = {old_label: new_label for new_label, old_label in enumerate(unique_ids, start=1)}
+    elif exclude_boundary:
+        seg = filter_border_objects(seg)
 
-        # Apply relabeling using a temp array (to avoid large ints in-place)
-        new_seg = np.zeros_like(seg, dtype=np.int32)
-        for old_label, new_label in label_map.items():
-            new_seg[seg == old_label] = new_label
+    elif exclude_boundary_vesicles:
+        # Filter the vesicles that are at the z-border with less than their full diameter.
+        seg_ids = filter_border_vesicles(seg)
 
-        # Final step: replace original seg with relabelled and casted version
-        seg = new_seg
+        # Remove everything not in seg_ids and relabel the remaining IDs consecutively.
+        seg[~np.isin(seg, seg_ids)] = 0
+        seg = relabel_sequential(seg)[0]
 
     seg = scaler.rescale_output(seg, is_segmentation=True)
 
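The new post-processing keeps only the IDs returned by `filter_border_vesicles` and relabels the survivors consecutively via `skimage.segmentation.relabel_sequential`. The same keep-and-relabel step can be sketched with plain NumPy (the toy `seg` array and `keep_ids` below are illustrative, not SynapseNet data):

```python
import numpy as np

def keep_and_relabel(seg, keep_ids):
    """Zero out all objects not in keep_ids, then relabel the
    surviving IDs consecutively from 1 (background stays 0)."""
    seg = seg.copy()
    seg[~np.isin(seg, keep_ids)] = 0
    unique_ids = np.unique(seg)  # sorted, includes background 0
    # Build a lookup table mapping each surviving ID to its rank.
    lut = np.zeros(unique_ids.max() + 1, dtype=seg.dtype)
    lut[unique_ids] = np.arange(len(unique_ids))
    return lut[seg]

seg = np.array([[0, 2, 2],
                [5, 5, 9]])
print(keep_and_relabel(seg, keep_ids=[2, 9]))
# [[0 1 1]
#  [0 0 2]]
```

Vectorizing the relabeling through a lookup table is also why the diff replaces the old per-label `for` loop: one fancy-indexing pass covers all objects at once.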
