
Commit 1a6dc5e

Start preparing release for new models (#1061)
* Start preparing release for new models
* Add dataset and bioimageio docs and bump model version
* Bump diplomatic-bug and noisy-ox checksums
* Bump idealistic-rat and humorous-crab checksums
* Bump faithful-chicken and greedy-whale checksums
* Bump diplomatic-bug model version
* Bump more models
* Bump all models and add model download in info cli
* Update model download function
* Revert vit_t models to fix CI
* Update doc/bioimageio/em_organelles_v4.md

Co-authored-by: Constantin Pape <[email protected]>
1 parent 87ce846 commit 1a6dc5e

8 files changed (+222 −53 lines)

doc/bioimageio/em_organelles_v4.md

Lines changed: 19 additions & 0 deletions

@@ -0,0 +1,19 @@
+# Segment Anything for Electron Microscopy
+
+This is a [Segment Anything](https://segment-anything.com/) model that was specialized for segmenting mitochondria and nuclei in electron microscopy with [micro_sam](https://github.com/computational-cell-analytics/micro-sam).
+This model uses a %s vision transformer as image encoder.
+
+Segment Anything is a model for interactive and automatic instance segmentation.
+We improve it for electron microscopy by finetuning on a large and diverse microscopy dataset.
+It should perform well for segmenting mitochondria and nuclei in electron microscopy. It can also work well for other organelles, but was not explicitly trained for this purpose. You may get better results for other organelles (e.g. ER or Golgi) with the default Segment Anything models.
+
+See [the dataset overview](https://github.com/computational-cell-analytics/micro-sam/blob/master/doc/datasets/em_organelles_v%i.md) for further information on the training data and the [micro_sam documentation](https://computational-cell-analytics.github.io/micro-sam/micro_sam.html) for details on how to use the model for interactive and automatic segmentation.
+
+NOTE: The model's automatic instance segmentation quality has improved, as the latest version updates the segmentation decoder architecture by replacing transposed convolutions with upsampling.
+
+
+## Validation
+
+The easiest way to validate the model is to visually check the segmentation quality for your data.
+If you have annotations you can use for validation, you can also validate quantitatively; see [here for details](https://computational-cell-analytics.github.io/micro-sam/micro_sam.html#9-how-can-i-evaluate-a-model-i-have-finetuned).
+Please note that the required segmentation quality always depends on the analysis task you want to solve.
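The quantitative validation the new doc refers to is done with the micro_sam evaluation tooling; a minimal stand-alone sketch of the idea in plain Python (an illustrative IoU score, not the micro_sam API; `iou` is a hypothetical helper):

```python
def iou(pred, gt):
    """Intersection over union of two binary masks, given as flat 0/1 lists."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0

# Toy 1D masks standing in for flattened segmentation masks.
prediction   = [0, 1, 1, 1, 0, 0]
ground_truth = [0, 1, 1, 0, 0, 0]
print(iou(prediction, ground_truth))  # 2 overlapping / 3 in union ≈ 0.667
```

Scores like this only become meaningful relative to the analysis task, which is exactly the doc's closing caveat.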

doc/bioimageio/lm_v4.md

Lines changed: 19 additions & 0 deletions

@@ -0,0 +1,19 @@
+# Segment Anything for Light Microscopy
+
+This is a [Segment Anything](https://segment-anything.com/) model that was specialized for light microscopy with [micro_sam](https://github.com/computational-cell-analytics/micro-sam).
+This model uses a %s vision transformer as image encoder.
+
+Segment Anything is a model for interactive and automatic instance segmentation.
+We improve it for light microscopy by finetuning on a large and diverse microscopy dataset.
+It should perform well for cell and nucleus segmentation in fluorescence, label-free and other light microscopy datasets.
+
+See [the dataset overview](https://github.com/computational-cell-analytics/micro-sam/blob/master/doc/datasets/lm_v%i.md) for further information on the training data and the [micro_sam documentation](https://computational-cell-analytics.github.io/micro-sam/micro_sam.html) for details on how to use the model for interactive and automatic segmentation.
+
+NOTE: The model's automatic instance segmentation quality has improved, as the latest version updates the segmentation decoder architecture by replacing transposed convolutions with upsampling.
+
+
+## Validation
+
+The easiest way to validate the model is to visually check the segmentation quality for your data.
+If you have annotations you can use for validation, you can also validate quantitatively; see [here for details](https://computational-cell-analytics.github.io/micro-sam/micro_sam.html#9-how-can-i-evaluate-a-model-i-have-finetuned).
+Please note that the required segmentation quality always depends on the analysis task you want to solve.
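The `%s` and `%i` placeholders in these doc templates are presumably filled in at export time (the diff below touches `create_doc` in `scripts/model_export/export_models.py`). Schematically, with a shortened stand-in template and illustrative values:

```python
# Shortened stand-in for the real doc template above; values are illustrative.
template = (
    "This model uses a %s vision transformer as image encoder.\n"
    "See the dataset overview in doc/datasets/lm_v%i.md."
)

encoder_size = "base"  # hypothetical: derived from the model type, e.g. vit_b
version = 4            # the generalist model generation

doc = template % (encoder_size, version)
print(doc)
```

`%i` is a standard printf-style integer conversion in Python, so a single `%` formatting pass fills both placeholders.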

doc/datasets/em_organelles_v4.md

Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
+# Electron Microscopy Datasets
+
+The `EM Organelle v4` model was trained on three different electron microscopy datasets with segmentation annotations for mitochondria and nuclei:
+
+1. [MitoEM](https://mitoem.grand-challenge.org/): containing segmentation annotations for mitochondria in volume EM of human and rat cortex.
+2. [MitoLab](https://www.ebi.ac.uk/empiar/EMPIAR-11037/): containing segmentation annotations for mitochondria in different EM modalities.
+3. [Platynereis (Nuclei)](https://zenodo.org/records/3675220): containing segmentation annotations for nuclei in a blockface EM volume of *P. dumerilii*.

doc/datasets/lm_v4.md

Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
+# Light Microscopy Datasets
+
+The `LM Generalist v4` model was trained on 14 different light microscopy datasets with segmentation annotations for cells and nuclei:
+
+1. [LIVECell](https://sartorius-research.github.io/LIVECell/): containing cell segmentation annotations for phase-contrast microscopy.
+2. [DeepBacs](https://github.com/HenriquesLab/DeepBacs): containing segmentation annotations for bacterial cells in different label-free microscopy modalities.
+3. [TissueNet](https://datasets.deepcell.org/): containing cell segmentation annotations in tissues imaged with fluorescence light microscopy.
+4. [PlantSeg (Root)](https://osf.io/2rszy/): containing cell segmentation annotations in plant roots imaged with fluorescence lightsheet microscopy.
+5. [NeurIPS CellSeg](https://neurips22-cellseg.grand-challenge.org/): containing cell segmentation annotations in phase-contrast, brightfield, DIC and fluorescence microscopy.
+6. [CTC (Cell Tracking Challenge)](https://celltrackingchallenge.net/2d-datasets/): containing cell segmentation annotations in different label-free and fluorescence microscopy settings. We make use of the following CTC datasets: `BF-C2DL-HSC`, `BF-C2DL-MuSC`, `DIC-C2DH-HeLa`, `Fluo-C2DL-Huh7`, `Fluo-C2DL-MSC`, `Fluo-N2DH-SIM+`, `PhC-C2DH-U373`, `PhC-C2DL-PSC`.
+7. [DSB Nucleus Segmentation](https://www.kaggle.com/c/data-science-bowl-2018): containing nucleus segmentation annotations in fluorescence microscopy. We make use of [this subset](https://github.com/stardist/stardist/releases/download/0.1.0/dsb2018.zip) of the data.
+8. [EmbedSeg](https://github.com/juglab/EmbedSeg): containing cell and nucleus annotations in fluorescence microscopy.
+9. [YeaZ](https://www.epfl.ch/labs/lpbs/data-and-software): containing segmentation annotations for yeast cells in phase-contrast and brightfield microscopy.
+10. [CVZ Fluo](https://www.synapse.org/Synapse:syn27624812/): containing cell and nucleus annotations in fluorescence microscopy.
+11. [DynamicNuclearNet](https://datasets.deepcell.org/): containing nucleus annotations in fluorescence microscopy.
+12. [CellPose](https://www.cellpose.org/): containing cell annotations in fluorescence microscopy.
+13. [OmniPose](https://osf.io/xmury/): containing segmentation annotations for bacterial cells in phase-contrast and fluorescence microscopy, and worms in brightfield microscopy.
+14. [OrgaSegment](https://zenodo.org/records/10278229): containing segmentation annotations for organoids in brightfield microscopy.

micro_sam/util.py

Lines changed: 89 additions & 36 deletions

@@ -105,12 +105,12 @@ def models():
         "vit_t": "xxh128:8eadbc88aeb9d8c7e0b4b60c3db48bd0",
         # The current version of our models in the modelzoo.
         # LM generalist models:
-        "vit_l_lm": "xxh128:fc32ea6f7fcc7eb02737d1304f81f5f2",
-        "vit_b_lm": "xxh128:8fd5806be3c3ba213e19a709d6d1495f",
+        "vit_l_lm": "xxh128:017f20677997d628426dec80a8018f9d",
+        "vit_b_lm": "xxh128:fe9252a29f3f4ea53c15a06de471e186",
         "vit_t_lm": "xxh128:72ec5074774761a6e5c05a08942f981e",
         # EM models:
-        "vit_l_em_organelles": "xxh128:096c9695966803ca6fde24f4c1e3c3fb",
-        "vit_b_em_organelles": "xxh128:f6f6593aeecd0e15a07bdac86360b6cc",
+        "vit_l_em_organelles": "xxh128:810b084b6e51acdbf760a993d8619f2d",
+        "vit_b_em_organelles": "xxh128:f3bf2ed83d691456bae2c3f9a05fb438",
         "vit_t_em_organelles": "xxh128:253474720c497cce605e57c9b1d18fd9",
         # Histopathology models:
         "vit_b_histopathology": "xxh128:ffd1a2cd84570458b257bd95fdd8f974",
@@ -122,12 +122,12 @@ def models():
     # Additional decoders for instance segmentation.
     decoder_registry = {
         # LM generalist models:
-        "vit_l_lm_decoder": "xxh128:779b5a50ecc6d46d495753fba8717f2f",
-        "vit_b_lm_decoder": "xxh128:9f580a96984b3085389ced5d9a4ae75d",
+        "vit_l_lm_decoder": "xxh128:2faeafa03819dfe03e7c46a44aaac64a",
+        "vit_b_lm_decoder": "xxh128:708b15ac620e235f90bb38612c4929ba",
         "vit_t_lm_decoder": "xxh128:3e914a5f397b0312cdd36813031f8823",
         # EM models:
-        "vit_l_em_organelles_decoder": "xxh128:d60fd96bd6060856f6430f29e42568fb",
-        "vit_b_em_organelles_decoder": "xxh128:b2d4dcffb99f76d83497d39ee500088f",
+        "vit_l_em_organelles_decoder": "xxh128:334877640bfdaaabce533e3252a17294",
+        "vit_b_em_organelles_decoder": "xxh128:bb6398956a6b0132c26b631c14f95ce2",
         "vit_t_em_organelles_decoder": "xxh128:8f897c7bb93174a4d1638827c4dd6f44",
         # Histopathology models:
         "vit_b_histopathology_decoder": "xxh128:6a66194dcb6e36199cbee2214ecf7213",
@@ -141,11 +141,11 @@ def models():
         "vit_h": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
         "vit_b": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth",
         "vit_t": "https://owncloud.gwdg.de/index.php/s/TuDzuwVDHd1ZDnQ/download",
-        "vit_l_lm": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/idealistic-rat/1.1/files/vit_l.pt",
-        "vit_b_lm": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/diplomatic-bug/1.1/files/vit_b.pt",
+        "vit_l_lm": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/idealistic-rat/1.2/files/vit_l.pt",
+        "vit_b_lm": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/diplomatic-bug/1.2/files/vit_b.pt",
         "vit_t_lm": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/faithful-chicken/1.1/files/vit_t.pt",
-        "vit_l_em_organelles": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/humorous-crab/1/files/vit_l.pt",  # noqa
-        "vit_b_em_organelles": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/noisy-ox/1/files/vit_b.pt",
+        "vit_l_em_organelles": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/humorous-crab/1.2/files/vit_l.pt",  # noqa
+        "vit_b_em_organelles": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/noisy-ox/1.2/files/vit_b.pt",  # noqa
         "vit_t_em_organelles": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/greedy-whale/1/files/vit_t.pt",  # noqa
         "vit_b_histopathology": "https://owncloud.gwdg.de/index.php/s/sBB4H8CTmIoBZsQ/download",
         "vit_l_histopathology": "https://owncloud.gwdg.de/index.php/s/IZgnn1cpBq2PHod/download",
@@ -154,11 +154,11 @@ def models():
     }

     decoder_urls = {
-        "vit_l_lm_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/idealistic-rat/1.1/files/vit_l_decoder.pt",  # noqa
-        "vit_b_lm_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/diplomatic-bug/1.1/files/vit_b_decoder.pt",  # noqa
+        "vit_l_lm_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/idealistic-rat/1.2/files/vit_l_decoder.pt",  # noqa
+        "vit_b_lm_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/diplomatic-bug/1.2/files/vit_b_decoder.pt",  # noqa
         "vit_t_lm_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/faithful-chicken/1.1/files/vit_t_decoder.pt",  # noqa
-        "vit_l_em_organelles_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/humorous-crab/1/files/vit_l_decoder.pt",  # noqa
-        "vit_b_em_organelles_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/noisy-ox/1/files/vit_b_decoder.pt",  # noqa
+        "vit_l_em_organelles_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/humorous-crab/1.2/files/vit_l_decoder.pt",  # noqa
+        "vit_b_em_organelles_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/noisy-ox/1.2/files/vit_b_decoder.pt",  # noqa
         "vit_t_em_organelles_decoder": "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/greedy-whale/1/files/vit_t_decoder.pt",  # noqa
         "vit_b_histopathology_decoder": "https://owncloud.gwdg.de/index.php/s/KO9AWqynI7SFOBj/download",
         "vit_l_histopathology_decoder": "https://owncloud.gwdg.de/index.php/s/oIs6VSmkOp7XrKF/download",
@@ -283,6 +283,31 @@ def _load_checkpoint(checkpoint_path):
     return state, model_state


+def _download_sam_model(model_type, progress_bar_factory=None):
+    model_registry = models()
+
+    progress_bar = True
+    # Check if we have to download the model.
+    # If we do, and we have a progress bar factory, then we overwrite the progress bar.
+    if not os.path.exists(os.path.join(get_cache_directory(), model_type)) and progress_bar_factory is not None:
+        progress_bar = progress_bar_factory(model_type)
+
+    checkpoint_path = model_registry.fetch(model_type, progressbar=progress_bar)
+    if not isinstance(progress_bar, bool):  # Close the progress bar when the task finishes.
+        progress_bar.close()
+
+    model_hash = model_registry.registry[model_type]
+
+    # If we have a custom model, then we may also have a decoder checkpoint.
+    # Download it here, so that we can add it to the state.
+    decoder_name = f"{model_type}_decoder"
+    decoder_path = model_registry.fetch(
+        decoder_name, progressbar=True
+    ) if decoder_name in model_registry.registry else None
+
+    return checkpoint_path, model_hash, decoder_path
+
+
 def get_sam_model(
     model_type: str = _DEFAULT_MODEL,
     device: Optional[Union[str, torch.device]] = None,
@@ -345,26 +370,7 @@
     # URL from the model_type. If the model_type is invalid pooch will raise an error.
     _provided_checkpoint_path = checkpoint_path is not None
     if checkpoint_path is None:
-        model_registry = models()
-
-        progress_bar = True
-        # Check if we have to download the model.
-        # If we do and have a progress bar factory, then we over-write the progress bar.
-        if not os.path.exists(os.path.join(get_cache_directory(), model_type)) and progress_bar_factory is not None:
-            progress_bar = progress_bar_factory(model_type)
-
-        checkpoint_path = model_registry.fetch(model_type, progressbar=progress_bar)
-        if not isinstance(progress_bar, bool):  # Close the progress bar when the task finishes.
-            progress_bar.close()
-
-        model_hash = model_registry.registry[model_type]
-
-        # If we have a custom model then we may also have a decoder checkpoint.
-        # Download it here, so that we can add it to the state.
-        decoder_name = f"{model_type}_decoder"
-        decoder_path = model_registry.fetch(
-            decoder_name, progressbar=True
-        ) if decoder_name in model_registry.registry else None
+        checkpoint_path, model_hash, decoder_path = _download_sam_model(model_type, progress_bar_factory)

     # checkpoint_path has been passed, we use it instead of downloading a model.
     else:
@@ -1259,13 +1265,25 @@ def micro_sam_info():
     """Display μSAM information using a rich console."""
     import psutil
     import platform
+    import argparse
+    from rich import progress
     from rich.panel import Panel
     from rich.table import Table
     from rich.console import Console

     import torch
     import micro_sam

+    parser = argparse.ArgumentParser(description="μSAM Information Booth")
+    parser.add_argument(
+        "--download", nargs="+", metavar=("WHAT", "KIND"),
+        help="Downloads the pretrained SAM models. "
+             "'--download models' -> downloads all pretrained models; "
+             "'--download models vit_b_lm vit_b_em_organelles' -> downloads the listed models; "
+             "'--download model/models vit_b_lm' -> downloads a single specified model."
+    )
+    args = parser.parse_args()
+
     # Open up a new console.
     console = Console()

@@ -1339,3 +1357,38 @@ def micro_sam_info():
             title="Device Information"
         )
     )
+
+    # The section that allows downloading models.
+    # NOTE: In the future, this can be extended to download sample data.
+    if args.download:
+        download_provided_args = [t.lower() for t in args.download]
+        mode, *model_types = download_provided_args
+
+        if mode not in {"models", "model"}:
+            console.print(f"[red]Unknown option for --download: {mode}[/]")
+            return
+
+        if mode in ["model", "models"] and not model_types:  # If the user did not specify models, download all of them.
+            download_list = available_models
+        else:
+            download_list = model_types
+            incorrect_models = [m for m in download_list if m not in available_models]
+            if incorrect_models:
+                console.print(Panel("[red]Unknown model(s):[/] " + ", ".join(incorrect_models), title="Download Error"))
+                return
+
+        with progress.Progress(
+            progress.SpinnerColumn(),
+            progress.TextColumn("[progress.description]{task.description}"),
+            progress.BarColumn(bar_width=None),
+            "[progress.percentage]{task.percentage:>3.0f}%",
+            progress.TimeRemainingColumn(),
+            console=console,
+        ) as prog:
+            task = prog.add_task("[green]Downloading μSAM models…", total=len(download_list))
+            for model_type in download_list:
+                prog.update(task, description=f"Downloading [cyan]{model_type}[/]…")
+                _download_sam_model(model_type=model_type)
+                prog.advance(task)
+
+        console.print(Panel("[bold green] Downloads complete![/]", title="Finished"))
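The `--download` parsing added to `micro_sam_info` above can be exercised in isolation. A minimal sketch of the same control flow, where `AVAILABLE_MODELS` is a stand-in for the real registry and `parse_download` is a hypothetical helper:

```python
import argparse

# Stand-in for the real model registry keys.
AVAILABLE_MODELS = ["vit_b_lm", "vit_l_lm", "vit_b_em_organelles"]


def parse_download(argv):
    """Return the list of models to download for the given CLI arguments."""
    parser = argparse.ArgumentParser(description="μSAM Information Booth")
    parser.add_argument("--download", nargs="+", metavar=("WHAT", "KIND"))
    args = parser.parse_args(argv)
    if not args.download:
        return []

    # First token selects the mode, the rest are model names.
    mode, *model_types = [t.lower() for t in args.download]
    if mode not in {"model", "models"}:
        raise ValueError(f"Unknown option for --download: {mode}")

    if not model_types:  # No models listed: download everything.
        return AVAILABLE_MODELS

    unknown = [m for m in model_types if m not in AVAILABLE_MODELS]
    if unknown:
        raise ValueError("Unknown model(s): " + ", ".join(unknown))
    return model_types


print(parse_download(["--download", "models"]))              # all registered models
print(parse_download(["--download", "model", "vit_b_lm"]))   # just the listed model
```

The tuple `metavar=("WHAT", "KIND")` mirrors the diff: with `nargs="+"`, argparse renders the usage line as `--download WHAT [KIND ...]`.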

scripts/model_export/export_models.py

Lines changed: 14 additions & 8 deletions

@@ -22,7 +22,7 @@

 INPUT_FOLDER = "/media/anwai/ANWAI/models/micro_sam"
 OUTPUT_FOLDER = "./exported_models"
-BIOIMAGEIO_VERSION = 1.1  # version marked for v3 (LM) Generalist Models
+BIOIMAGEIO_VERSION = 1.2  # version marked for v4 LM and EM-Organelles Generalist Models


 def create_doc(model_type, modality, version):
@@ -131,13 +131,18 @@ def export_model(model_path, model_type, modality, version, email):
     print("Decoder:")
     print(f"{model_name}_decoder", f"xxh128:{decoder_checksum}")

+    breakpoint()

-def export_all_models(email, version):
-    models = glob(os.path.join(INPUT_FOLDER, f"v{version}/**/vit*"), recursive=True)
+
+def export_all_models(email, version, model_type):
+    if model_type is None:
+        model_type = "vit*"
+
+    models = glob(os.path.join(INPUT_FOLDER, f"v{version}/**/{model_type}"), recursive=True)
     for path in models:
-        modality, _, model_type = path.split("/")[-3:]  # current expected structure: v3/lm/generalist/vit_b/best.pt
-        # print(model_path, modality, model_type)
+        modality, _, model_type = path.split("/")[-3:]  # current expected structure: v4/lm/generalist/vit_b/best.pt
         model_path = os.path.join(path, "best.pt")
+        print(model_path, modality, model_type)
         assert os.path.exists(model_path), model_path
         export_model(model_path, model_type, modality, version=version, email=email)

@@ -146,16 +151,17 @@ def export_all_models(email, version):
 def export_vit_t_lm(email):
     model_type = "vit_t"
     model_path = os.path.join(INPUT_FOLDER, "lm", "generalist", model_type, "best.pt")
-    export_model(model_path, model_type, "lm", version=3, email=email)
+    export_model(model_path, model_type, "lm", version=4, email=email)


 def main():
     parser = argparse.ArgumentParser()
     parser.add_argument("-e", "--email", required=True)
-    parser.add_argument("-v", "--version", default=3, type=int)
+    parser.add_argument("-v", "--version", default=4, type=int)
+    parser.add_argument("-m", "--model_type", type=str, default=None)
     args = parser.parse_args()

-    export_all_models(args.email, args.version)
+    export_all_models(args.email, args.version, args.model_type)


 if __name__ == "__main__":
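The directory layout `export_all_models` relies on (`v<version>/<modality>/generalist/<model_type>/best.pt`) can be checked on its own; `parse_export_path` is a hypothetical helper mirroring the `path.split("/")[-3:]` logic above, on a made-up path:

```python
import os


def parse_export_path(path):
    """Recover modality and model type from a checkpoint folder path.

    Expected structure (as in the export script): .../v4/<modality>/generalist/<model_type>
    """
    modality, _, model_type = path.split("/")[-3:]
    model_path = os.path.join(path, "best.pt")
    return modality, model_path, model_type


modality, model_path, model_type = parse_export_path("v4/lm/generalist/vit_b")
print(modality, model_type)  # the middle component ("generalist") is discarded
print(model_path)
```

Globbing for `vit*` yields exactly such directories, so the three trailing path components are always modality, the literal `generalist`, and the model type.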

scripts/model_export/models.py

Lines changed: 1 addition & 0 deletions

@@ -5,6 +5,7 @@
 import requests
 import yaml

+
 ADDJECTIVE_URL = "https://raw.githubusercontent.com/bioimage-io/collection-bioimage-io/main/adjectives.txt"
 ANIMAL_URL = "https://raw.githubusercontent.com/bioimage-io/collection-bioimage-io/main/animals.yaml"
 COLLECTION_URL = "https://raw.githubusercontent.com/bioimage-io/collection-bioimage-io/gh-pages/collection.json"
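Most of this commit bumps `xxh128:` registry checksums. As an aside, a sketch of how such a registry entry can be computed for a released file; this uses the stdlib `sha256` as a stand-in, since the real entries use the xxh128 algorithm (computed via the third-party `xxhash` package, as used by pooch):

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256"):
    """Stream a file and return '<algorithm>:<hexdigest>' in the registry's style.

    NOTE: the registries in this commit use xxh128; sha256 stands in here
    only because it ships with the standard library.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return f"{algorithm}:{h.hexdigest()}"


# Demo on a throwaway file standing in for a model checkpoint.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    tmp_path = f.name
print(file_checksum(tmp_path))
os.remove(tmp_path)
```

Whenever a model file is re-exported, its checksum changes, which is why every URL bump in `micro_sam/util.py` is paired with a registry checksum bump.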
