- [Multimodal Foundation Models](#multimodal-foundation-models)
  - [Contrastive](#contrastive)
  - [Generative](#generative)
  - [Conversational](#conversational)
|:-----------------:|:------------|
| <img src="images/img53.png" width="900"> |<li> Title: <a href="https://arxiv.org/pdf/2310.18689">Foundational models in medical imaging: A comprehensive survey and future vision</a></li> <li>Publication: arXiv 2023</li> <li>Summary: This survey provides an in-depth review of recent advancements in foundational models for medical imaging. It categorizes these models into four main groups, distinguishing between those prompted by text and those guided by visual cues. Each category presents unique strengths and capabilities, which are further explored through exemplary works and comprehensive methodological descriptions. Furthermore, this survey evaluates the advantages and limitations inherent to each model type, highlighting their areas of excellence while identifying aspects requiring improvement.</li> <li>Repo: <a href="https://github.com/xmindflow/Awesome-Foundation-Models-in-Medical-Imaging">https://github.com/xmindflow/Awesome-Foundation-Models-in-Medical-Imaging</a></li>|
### Visual Foundation Models
#### Interactive
>In the interactive segmentation paradigm, the foundation model segments the target by following user-given prompts such as a point, a bounding box (BBox), doodles, or free-text descriptions.
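The prompt-driven workflow above can be illustrated with a deliberately simple stand-in. Real interactive models such as SAM encode the image and the prompt with neural networks; the NumPy-only sketch below (all names illustrative) instead mimics a point prompt by growing a region outward from the clicked pixel:

```python
import numpy as np
from collections import deque

def point_prompt_segment(image, seed, tol=0.1):
    """Toy point-prompted segmentation via region growing.

    NOT how SAM-style models work internally -- just an illustration of the
    interface: a user click (`seed`) in, a binary mask out. Pixels whose
    intensity is within `tol` of the clicked pixel join the region.
    """
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    ref = image[seed]
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if mask[y, x] or abs(image[y, x] - ref) > tol:
            continue
        mask[y, x] = True
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                queue.append((ny, nx))
    return mask

# Synthetic "scan": a bright 8x8 square lesion on a dark background.
img = np.zeros((32, 32))
img[8:16, 8:16] = 1.0
mask = point_prompt_segment(img, seed=(10, 10), tol=0.1)
print(mask.sum())  # 64 -- exactly the bright square
```

A box or doodle prompt would play the same role: it constrains *where* the model should look, while the model decides *what* the object boundary is.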
| <img src="images/img65.png" width="900"> |<li> Title: <a href="https://arxiv.org/abs/2407.14153">ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation</a></li> <li>Publication: arXiv 2024</li> <li>Summary: Proposes ESP-MedSAM, an efficient self-prompting SAM for universal domain-generalized medical image segmentation. A Multi-Modal Decoupled Knowledge Distillation (MMDKD) strategy is first designed to construct a lightweight semi-parameter-sharing image encoder that produces discriminative visual features for diverse modalities. It then introduces the Self-Patch Prompt Generator (SPPG) to automatically generate high-quality dense prompt embeddings for guiding segmentation decoding. Finally, it designs the Query-Decoupled Modality Decoder (QDMD), which uses a one-to-one strategy to provide an independent decoding channel for every modality.</li> <li>Code: <a href="https://github.com/xq141839/ESP-MedSAM">https://github.com/xq141839/ESP-MedSAM</a></li>|
| <img src="images/img50.png" width="900"> |<li> Title: <a href="https://openaccess.thecvf.com/content/ICCV2023/html/Butoi_UniverSeg_Universal_Medical_Image_Segmentation_ICCV_2023_paper.html">UniverSeg: Universal Medical Image Segmentation</a></li> <li>Publication: ICCV 2023</li> <li>Summary: Presents UniverSeg, a universal segmentation method that solves unseen medical segmentation tasks without additional training. Given a query image and an example set of image-label pairs defining a new segmentation task, UniverSeg employs a new CrossBlock mechanism to produce accurate segmentation maps. In addition, 53 open-access medical segmentation datasets with over 22,000 scans were collected to train UniverSeg on a diverse set of anatomies and imaging modalities.</li> <li>Code: <a href="https://universeg.csail.mit.edu">https://universeg.csail.mit.edu</a></li>|
### Multimodal Foundation Models
#### Contrastive
>Contrastive textually prompted models are increasingly recognized as foundational models for medical imaging. They learn representations that capture the semantics and relationships between medical images and their corresponding textual prompts. By leveraging contrastive learning objectives, these models bring similar image-text pairs closer in the feature space while pushing dissimilar pairs apart.
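As a sketch of that objective, here is a minimal NumPy version of the symmetric InfoNCE loss commonly used by CLIP-style image-text models (the function name and temperature value are illustrative, not taken from any specific paper in this list):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of image/text embedding pairs.

    Matching pairs sit on the diagonal of the similarity matrix and are
    pulled together; every other pairing in the batch is pushed apart.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (batch, batch)
    labels = np.arange(len(logits))               # i-th image matches i-th text

    def xent(l):
        # Cross-entropy of the diagonal entries, numerically stabilized.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings the diagonal dominates and the loss approaches zero; mismatching the batch (e.g. shifting the text embeddings by one) drives it up, which is exactly the "pull matching pairs together, push others apart" behavior described above.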
#### Generative
>Generative models represent another category within textually prompted models for medical imaging. These models are designed to generate realistic medical images based on textual prompts or descriptions. They utilize techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs) to learn the underlying distribution of medical images, enabling the creation of new samples that align with the provided prompts.
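The VAE objective mentioned above fits in a few lines. This is a generic negative-ELBO sketch in NumPy (names and shapes are illustrative, not the training code of any particular medical model):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps.

    Writing the sample this way lets gradients flow through mu and sigma
    when the same computation is run in an autodiff framework.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO = reconstruction error + KL(N(mu, var) || N(0, I))."""
    recon = ((x - x_recon) ** 2).sum(axis=1).mean()   # pixel-wise MSE term
    kl = 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var).sum(axis=1).mean()
    return recon + kl
```

Text conditioning enters by feeding a prompt embedding to the decoder alongside `z`; the loss itself keeps the same two-term shape.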
#### Conversational
>Conversational textually prompted models are designed to enable interactive dialogues between medical professionals and the model by fine-tuning foundational models on specific instruction sets. These models enhance communication and collaboration, allowing medical experts to ask questions, provide instructions, and seek explanations related to medical images.
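At the data level, instruction tuning comes down to rendering image-question-answer triples into a chat template. The USER/ASSISTANT template below is a hypothetical LLaVA-style example, not the exact format of any model in this list:

```python
def format_instruction_example(image_token, question, answer):
    """Render one visual-instruction-tuning sample as a training string.

    NOTE: this template is illustrative; every conversational model defines
    its own special tokens and turn separators.
    """
    return f"USER: {image_token}\n{question}\nASSISTANT: {answer}"

sample = format_instruction_example(
    "<image>",
    "Is there evidence of pleural effusion in this chest X-ray?",
    "No pleural effusion is seen; the costophrenic angles are sharp.",
)
```

Fine-tuning on many such rendered samples is what turns a base vision-language model into one that can hold the kind of clinical dialogue described above.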