Ziwei-Niu/Generalized_MedIA

Awesome Domain Generalization and Foundation Model in Medical Image Analysis

🔥 This is a repository for organizing papers, code, and other resources related to Domain Generalization and Foundation Models in Medical Image Analysis (DG&FM in MedIA).

💗 Medical Image Analysis (MedIA) plays a critical role in computer-aided diagnosis systems, enabling accurate diagnosis and assessment of various diseases. Over the last decade, deep learning (DL) has demonstrated great success in automating various MedIA tasks such as disease diagnosis, lesion segmentation, and prognosis prediction. Despite this success, in many real-world healthcare scenarios, differences in image acquisition, such as device manufacturer, scanning protocol, image sequence, and modality, introduce domain shifts, causing a significant decline in performance when a well-trained model is deployed to clinical sites with different data distributions. Enhancing the generalization ability of DL models in MedIA is therefore crucial in both clinical and academic settings. Domain generalization (DG), as an effective method to improve the generalization performance of task-specific models, can mitigate the performance degradation caused by domain shifts in medical images, such as cross-center, cross-sequence, and cross-modality variations. Recently, with the explosive growth of data and advances in computational resources, Foundation Models (FMs) have addressed the domain shift issue in a more direct manner: by training on vast amounts of diverse data, they prevent domain shift at the source, and they can handle a wide variety of tasks, including entirely new tasks never encountered during training. Compared to task-specific DG models, FMs thus offer greater task diversity and flexibility. Nonetheless, challenges such as medical data privacy concerns, data-sharing restrictions, the need for manual annotation by medical experts, and deployment demands persist. We therefore maintain that both DG and FM have their own merits and continue to hold significant research value.

🎯 We hope that this repository can provide assistance to researchers and practitioners in medical image analysis, domain generalization and foundation models.

🚀 New Updates:

  • 06/01/2025 : We have modified the presentation format to a table, which makes it easier for readers to review.
  • 25/12/2024 : We have added a Universal Segmentation Foundational Model branch.
  • 08/02/2024 : We released this repo for organizing papers, code, and other resources related to domain generalization for medical image analysis.

Table of Contents

Papers on Domain Generalization (ongoing)

Data Manipulation Level

Data Augmentation

Augmentation is widely employed in vision tasks to mitigate overfitting and improve generalization capacity, including operations like flipping, cropping, color jittering, noise addition, and others. For domain generalization in medical image analysis, augmentation methods can be broadly categorized as randomization-based, adversarial-based, and normalization-based.

Normalization-based

Normalization-based methods aim to normalize the raw intensity values or statistics to reduce the impact of variations in image intensity across different domains. These methods are usually employed for specific tasks, such as pathological images.
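As a concrete illustration, simple intensity standardization already removes much of the scanner-specific variation; a minimal numpy sketch (the two "scanner" calibrations below are invented for illustration):

```python
import numpy as np

def zscore_normalize(img):
    """Standardize intensities to zero mean and unit variance."""
    return (img - img.mean()) / (img.std() + 1e-8)

# Two hypothetical "scanners" imaging the same anatomy with different
# intensity calibrations (the numbers are illustrative only).
rng = np.random.default_rng(0)
anatomy = rng.random((64, 64))
scan_a = 1000.0 * anatomy + 200.0
scan_b = 3.5 * anatomy - 1.0

# After z-score normalization the two scans become nearly identical,
# i.e. the scanner-specific intensity shift is removed.
gap = np.abs(zscore_normalize(scan_a) - zscore_normalize(scan_b)).max()
print(gap)
```

Stain normalization in pathology follows the same spirit but operates per color channel in a stain-specific color space.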

  • Title: Generative models for color normalization in digital pathology and dermatology: Advancing the learning paradigm
  • Publication: Expert Systems with Applications 2024
  • Summary: Formulate the color normalization task as an image-to-image translation problem, ensuring a pixel-to-pixel correspondence between the original and normalized images.
  • Title: Improved Domain Generalization for Cell Detection in Histopathology Images via Test-Time Stain Augmentation
  • Publication: MICCAI 2022
  • Summary: Propose a test-time stain normalization method for cell detection in histopathology images, which transforms the test images by mixing their stain color with that of the source domain, so that the mixed images may better resemble the source images or their stain-transformed versions used for training.
  • Title: Tackling Mitosis Domain Generalization in Histopathology Images with Color Normalization
  • Publication: MICCAI Challenge 2022
  • Summary: Employ a color normalization method in their architecture for mitosis detection in histopathology images.
  • Title: Improve Unseen Domain Generalization via Enhanced Local Color Transformation
  • Publication: MICCAI 2020
  • Summary: Propose Enhanced Domain Transformation (EDT) for diabetic retinopathy classification, which aims to project the images into a color space that aligns the distribution of source data and unseen target data.
    Randomization-based

    The goal of randomization-based methods is to generate novel input data by applying random transformations in image space, frequency space, and feature space.
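One popular image-space randomization, used by several of the entries below, remaps intensities through a random monotonic Bezier curve; a minimal numpy sketch (the control points are arbitrary illustrations):

```python
import numpy as np

def bezier_intensity_transform(img, p1, p2, n=1000):
    """Remap intensities in [0, 1] through a cubic Bezier curve with
    endpoints (0, 0) and (1, 1); p1 and p2 are the inner control points.
    Increasing control x-coordinates keep the mapping monotonic."""
    t = np.linspace(0.0, 1.0, n)
    x = 3 * (1 - t) ** 2 * t * p1[0] + 3 * (1 - t) * t ** 2 * p2[0] + t ** 3
    y = 3 * (1 - t) ** 2 * t * p1[1] + 3 * (1 - t) * t ** 2 * p2[1] + t ** 3
    return np.interp(img, x, y)  # look up each pixel on the curve

rng = np.random.default_rng(0)
img = rng.random((32, 32))
# Illustrative control points producing a nonlinear "style" change.
augmented = bezier_intensity_transform(img, p1=(0.3, 0.8), p2=(0.6, 0.2))
```

Sampling different control points yields different synthetic "styles" of the same anatomy while leaving structures in place.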

    Image-space

  • Title: Rethinking Data Augmentation for Single-Source Domain Generalization in Medical Image Segmentation
  • Publication: AAAI 2023
  • Summary: Rethink the data augmentation strategy for DG in medical image segmentation and propose a location-scale augmentation strategy, which performs constrained Bezier transformations on both global and local (i.e., class-level) regions to enrich the informativeness and diversity of the augmented samples.
  • Code: https://github.com/Kaiseem/SLAug
  • Title: Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization
  • Publication: CVPR 2022
  • Summary: Employ Bezier Curves to augment single source domain into different styles and split them into source-similar domain and source-dissimilar domain.
  • Code: https://github.com/zzzqzhou/Dual-Normalization
  • Title: Generalizing Deep Learning for Medical Image Segmentation to Unseen Domains via Deep Stacked Transformation
  • Publication: IEEE TMI 2020
  • Summary: Propose a deep stacked transformation approach by applying extensive random typical transformations on a single source domain to simulate the domain shift.
    Frequency-space

  • Title: Frequency-Mixed Single-Source Domain Generalization for Medical Image Segmentation
  • Publication: MICCAI 2023
  • Summary: Present FMAug that extends the domain margin by mixing patches from diverse frequency views.
  • Code: https://github.com/liamheng/Non-IID_Medical_Image_Segmentation
  • Title: Fourier-based augmentation with applications to domain generalization
  • Publication: Pattern Recognition 2023
  • Summary: Propose a Fourier-based data augmentation strategy called AmpMix, which linearly interpolates the amplitudes of two images while keeping their phases unchanged to simulate domain shift. Additionally, consistency training between different augmentation views is incorporated to learn invariant representations.
  • Code: https://github.com/MediaBrain-SJTU/FACT
  • Title: Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration
  • Publication: ECCV 2022
  • Summary: Present a continuous frequency space interpolation mechanism for cross-site fundus and prostate segmentation, which exchanges the amplitude spectrum (style) to generate new samples while keeping the phase spectrum (semantics).
  • Code: https://github.com/zzzqzhou/RAM-DSIR
  • Title: Domain Generalization in Restoration of Cataract Fundus Images Via High-Frequency Components
  • Publication: ISBI 2022
  • Summary: Cataract-like fundus images are randomly synthesized from an identical clear image by adding cataract-like blur. Then, high-frequency components are extracted from the cataract-like images to reduce the domain shift and achieve domain alignment.
  • Code: https://github.com/HeverLaw/Restoration-of-Cataract-Images-via-Domain-Generalization
  • Title: FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
  • Publication: CVPR 2021
  • Summary: Propose a continuous frequency space interpolation mechanism for federated medical domain generalization, which exchanges amplitude spectrum across clients to transmit the distribution information, while keeping the phase spectrum with core semantics locally for privacy protection.
  • Code: https://github.com/liuquande/FedDG-ELCFS
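The amplitude-phase recombination shared by the Fourier-based entries above can be sketched as follows (real methods typically mix only low-frequency amplitudes inside a small window; this sketch mixes all frequencies for brevity):

```python
import numpy as np

def amplitude_mix(src, ref, lam=0.5):
    """Interpolate the Fourier amplitude (style) of two images while
    keeping the phase (semantic layout) of the source image."""
    f_src, f_ref = np.fft.fft2(src), np.fft.fft2(ref)
    amp = (1.0 - lam) * np.abs(f_src) + lam * np.abs(f_ref)
    phase = np.angle(f_src)
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

rng = np.random.default_rng(0)
src, ref = rng.random((64, 64)), rng.random((64, 64))
mixed = amplitude_mix(src, ref, lam=0.5)  # src content, blended style
```

With `lam=0` the source image is recovered exactly, which makes the style/content separation easy to verify.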
    Feature-space

  • Title: Improving the Generalizability of Convolutional Neural Network-Based Segmentation on CMR Images
  • Publication: Frontiers in Cardiovascular Medicine 2020
  • Summary: Propose a simple yet effective way for improving the network generalization ability by carefully designing data normalization and augmentation strategies.
    Adversarial-based

    Adversarial-based data augmentation methods are driven by adversarial training, aiming to maximize the diversity of data while simultaneously constraining its reliability.

  • Title: AADG: Automatic Augmentation for Domain Generalization on Retinal Image Segmentation
  • Publication: TMI 2022
  • Summary: Introduce a novel proxy task maximizing the diversity among multiple augmented novel domains as measured by the Sinkhorn distance in a unit sphere space to achieve automated augmentation. Adversarial training and deep reinforcement learning are employed to efficiently search the objectives.
  • Code: https://github.com/CRazorback/AADG
  • Title: Adversarial Consistency for Single Domain Generalization in Medical Image Segmentation
  • Publication: MICCAI 2022
  • Summary: Synthesize the new domains via learning an adversarial domain synthesizer (ADS), and propose to keep the underlying semantic information between the source image and the synthetic image via a mutual information regularizer.
  • Title: MaxStyle: Adversarial Style Composition for Robust Medical Image Segmentation
  • Publication: MICCAI 2022
  • Summary: Propose a data augmentation framework called MaxStyle, which augments data with improved image style diversity and hardness, by expanding the style space with noise and searching for the worst-case style composition of latent features via adversarial training.
  • Code: https://github.com/cherise215/MaxStyle
  • Title: Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation
  • Publication: Arxiv 2023
  • Summary: Propose Adversarial Intensity Attack (AdverIN), which introduces an adversarial attack on the data intensity distribution, leveraging adversarial training to generate training data with an infinite number of styles and increase data diversity while preserving essential content information.
  • Title: TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
  • Publication: CVPR 2023
  • Summary: Propose a method that combines knowledge distillation with adversarial-based data augmentation for cross-site medical image segmentation tasks.
  • Code: https://github.com/devavratTomar/TeSLA
    Data Generation

    Data generation is devoted to utilizing generative models, such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models, to generate fictional and novel samples. As the source domain data become more complex, diverse, and informative, the generalization ability can be increased.

  • Title: GH-DDM: the generalized hybrid denoising diffusion model for medical image generation
  • Publication: Multimedia Systems 2023
  • Summary: Introduce a generalized hybrid denoising diffusion model to enhance generalization ability by generating new cross-domain medical images, which leverages the strong abilities of transformers into diffusion models to model long-range interactions and spatial relationships between anatomical structures.
  • Title: Test-Time Image-to-Image Translation Ensembling Improves Out-of-Distribution Generalization in Histopathology
  • Publication: MICCAI 2022
  • Summary: Utilize the multi-domain image-to-image translation model StarGANv2 to project histopathology test images from unseen domains to the source domains, classify the projected images, and ensemble their predictions.
  • Code: https://gitlab.com/vitadx/articles/test-time-i2i-translation-ensembling
  • Title: Domain Generalization for Retinal Vessel Segmentation with Vector Field Transformer
  • Publication: PMLR 2022
  • Summary: Apply an auto-encoder to generate different styles of enhanced vessel maps for augmentation, and use the Hessian matrix of an image for segmentation, since vector fields better capture the morphological features and suffer less from covariate shift.
  • Code: https://github.com/MedICL-VU/Vector-Field-Transformer
  • Title: CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions
  • Publication: ECCV Workshop 2022
  • Summary: Use a Star Generative Adversarial Network (StarGAN) to transform skin types (style), and enforce the feature representation to be invariant across different skin types.
  • Code: https://github.com/arezou-pakzad/CIRCLe
  • Title: Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
  • Publication: CVPR 2021
  • Summary: Propose a fully generative approach to semantic segmentation based on StyleGAN2 that models the joint image-label distribution and synthesizes both images and their semantic segmentation masks.
  • Code: https://github.com/nv-tlabs/semanticGAN_code
  • Title: Generative Adversarial Domain Generalization via Cross-Task Feature Attention Learning for Prostate Segmentation
  • Publication: ICONIP 2021
  • Summary: Propose a new Generative Adversarial Domain Generalization (GADG) network, which can achieve the domain generalization through the generative adversarial learning on multi-site prostate MRI images. Additionally, to make the prostate segmentation network learned from the source domains still have good performance in the target domain, a Cross-Task Attention Module (CTAM) is designed to transfer the main domain generalized features from the generation branch to the segmentation branch.
  • Title: Learning Domain-Agnostic Visual Representation for Computational Pathology Using Medically-Irrelevant Style Transfer Augmentation
  • Publication: TMI 2021
  • Summary: Propose a style transfer-based augmentation (STRAP) method for a tumor classification task, which applies style transfer from non-medical images to histopathology images.
  • Code: https://github.com/rikiyay/style-transfer-for-digital-pathology
  • Title: Multimodal Self-supervised Learning for Medical Image Analysis
  • Publication: IPMI 2021
  • Summary: Propose a novel approach leveraging self-supervised learning through multimodal jigsaw puzzles for cross-modal medical image synthesis tasks. Additionally, to increase the quantity of multimodal data, they design a cross-modal generation step to create synthetic images from one modality to another using the CycleGAN-based translation model.
  • Title: Random Style Transfer Based Domain Generalization Networks Integrating Shape and Spatial Information
  • Publication: STACOM 2020
  • Summary: Propose novel random style transfer based domain generalization networks incorporating spatial and shape information based on GANs.
    Feature Level Generalization

    Invariant Feature Representation

    For medical image analysis, a well-generalized model focuses more on task-related semantic features while disregarding task-unrelated style features. In this regard, three types of methods have been extensively investigated: feature normalization, explicit feature alignment, and domain adversarial learning.

    Feature normalization

    These methods aim to enhance the generalization ability of models by centering, scaling, decorrelating, standardizing, and whitening extracted feature distributions. This process helps accelerate the convergence of algorithms and prevents features with larger scales from overpowering those with smaller ones. Common techniques include traditional scaling methods such as min-max and z-score normalization, as well as deep learning methods such as batch, layer, and instance normalization.
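For example, instance normalization standardizes each feature map over its spatial dimensions, removing much of the instance-specific "style"; a minimal numpy sketch (shapes and offsets are illustrative):

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Instance normalization: standardize each (sample, channel) feature
    map over its spatial dimensions of an (N, C, H, W) array."""
    mu = feat.mean(axis=(2, 3), keepdims=True)
    var = feat.var(axis=(2, 3), keepdims=True)
    return (feat - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
feat = 5.0 * rng.random((2, 3, 16, 16)) + 10.0  # features with a style offset
normed = instance_norm(feat)  # per-instance statistics removed
```

Learnable affine parameters (as in adaptive instance normalization) would multiply and shift the normalized output; they are omitted here for brevity.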

  • Title: SAN-Net: Learning generalization to unseen sites for stroke lesion segmentation with self-adaptive normalization
  • Publication: CBM 2023
  • Summary: Devise a masked adaptive instance normalization to minimize inter-site discrepancies for cross-site stroke lesion segmentation, which standardizes input images from different sites into a domain-unrelated style by dynamically learning affine parameters.
  • Code: https://github.com/wyyu0831/SAN
  • Title: SS-Norm: Spectral-spatial normalization for single-domain generalization with application to retinal vessel segmentation
  • Publication: IET IP 2023
  • Summary: Decompose the feature into multiple frequency components by performing discrete cosine transform and analyze the semantic contribution degree of each component. Then reweight the frequency components of features and therefore normalize the distribution in the spectral domain.
  • Title: Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization
  • Publication: CVPR 2022
  • Summary: Design a dual-normalization module to estimate domain distribution information. During the test stage, the model selects the nearest feature statistics according to style embeddings in the dual-normalization module to normalize target-domain features for generalization.
  • Code: https://github.com/zzzqzhou/Dual-Normalization
    Explicit feature alignment

    Explicit feature alignment methods attempt to remove domain shifts by reducing the discrepancies in feature distributions across multiple source domains, thereby facilitating the learning of domain-invariant feature representations.
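When alignment is performed against a predefined prior, as in the KL-based entry below, the penalty has a closed form for diagonal Gaussian encoders; a minimal sketch (the standard-normal prior is one common choice):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over
    latent dimensions; usable as an alignment penalty on encoder outputs."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

# A perfectly aligned encoder output incurs zero penalty; a shifted one does not.
aligned = kl_to_standard_normal(np.zeros(8), np.zeros(8))
shifted = kl_to_standard_normal(np.full(8, 2.0), np.zeros(8))
print(aligned, shifted)
```

Applying the same penalty to features from every source domain pulls all of their latent distributions toward one shared prior.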

  • Title: Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization
  • Publication: NeurIPS 2020
  • Summary: Adopt Kullback-Leibler (KL) divergence to align the distributions of latent features extracted from multiple source domains with a predefined prior distribution.
  • Code: https://github.com/wyf0912/LDDG
  • Title: Measuring Domain Shift for Deep Learning in Histopathology
  • Publication: JBHI 2020
  • Summary: Propose an unsupervised measure of domain shift, termed representation shift, which compares the distributions of a trained network's internal feature representations between source and target data and correlates with the drop in model performance under domain shift.
    Domain adversarial learning

    Domain-adversarial training methods are widely used to learn domain-invariant representations by introducing a domain discriminator that plays an adversarial game with the feature extractor.
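The mechanism can be illustrated with a toy linear feature extractor and a logistic domain discriminator trained with a gradient-reversal update (all data, dimensions, and learning rates below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs from two domains that differ by a constant "style" offset.
x0 = rng.normal(0.0, 1.0, (100, 8))        # domain 0
x1 = rng.normal(0.0, 1.0, (100, 8)) + 2.0  # domain 1
X = np.vstack([x0, x1])
d = np.concatenate([np.zeros(100), np.ones(100)])  # domain labels

W = rng.normal(0.0, 0.1, (8, 4))  # linear "feature extractor"
w = rng.normal(0.0, 0.1, 4)       # logistic domain discriminator

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for _ in range(300):
    Z = X @ W
    p = sigmoid(Z @ w)
    g_logit = (p - d) / len(d)     # d(cross-entropy)/d(logit)
    w -= lr * Z.T @ g_logit        # discriminator DEscends its loss
    g_Z = np.outer(g_logit, w)     # gradient reaching the features
    W -= lr * X.T @ (-g_Z)         # gradient reversal: extractor Ascends it

# After training, the features should no longer reveal the domain well.
acc = np.mean((sigmoid(X @ W @ w) > 0.5) == (d == 1))
print(acc)
```

In deep networks the same effect is obtained with a gradient reversal layer that is the identity on the forward pass and negates gradients on the backward pass.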

  • Title: Adversarially-Regularized Mixed Effects Deep Learning (ARMED) Models Improve Interpretability, Performance, and Generalization on Clustered (non-iid) Data
  • Publication: IEEE TPAMI 2023
  • Summary: Propose a general-purpose framework for Adversarially-Regularized Mixed Effects Deep learning (ARMED). ARMED employs an adversarial classifier to regularize the model to learn cluster-invariant fixed effects (domain-invariant features). The classifier attempts to predict cluster membership from the learned features, while the feature extractor is penalized for enabling this prediction.
  • Title: Localized adversarial domain generalization
  • Publication: CVPR 2022
  • Summary: Propose localized adversarial domain generalization, which performs adversarial feature alignment locally in the representation space rather than enforcing a single global alignment across entire source domains.
  • Code: https://github.com/zwvews/LADG
    Feature disentanglement

    Feature disentanglement methods aim to decompose the features of input samples into domain-invariant (task-related) and domain-specific (task-unrelated) components, i.e., $\mathbf{z} = [\mathbf{z}_{\text{invariant}}, \mathbf{z}_{\text{specific}}] \in \mathcal{Z}$. The objective of robust generalization models is to concentrate exclusively on the task-related components $\mathbf{z}_{\text{invariant}}$ while disregarding the task-unrelated ones $\mathbf{z}_{\text{specific}}$. The mainstream methods of feature disentanglement mainly include multi-component learning and generative modeling.
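A common surrogate for enforcing such a split is to penalize statistical dependence between the two latent parts; a minimal sketch using a cross-covariance penalty (one simple choice among many, not a specific paper's loss):

```python
import numpy as np

def cross_covariance_penalty(z_inv, z_spec):
    """Squared Frobenius norm of the cross-covariance between the two latent
    parts; driving it to zero pushes them to carry non-overlapping information."""
    zi = z_inv - z_inv.mean(axis=0)
    zs = z_spec - z_spec.mean(axis=0)
    cov = zi.T @ zs / (len(zi) - 1)
    return float(np.sum(cov ** 2))

rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, (2000, 8))
z_inv, z_spec = z[:, :4], z[:, 4:]   # independent halves

entangled = cross_covariance_penalty(z_inv, z_inv)    # identical parts: large
disentangled = cross_covariance_penalty(z_inv, z_spec)  # independent: small
```

Mutual-information estimators (e.g. MINE, used by MI-SegNet below) play the same role but also capture nonlinear dependence.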

    Multi-component learning

    Multi-component learning achieves feature disentanglement by designing different components to separately extract domain-invariant features and domain-specific features, thereby achieving feature decoupling.

  • Title: MI-SegNet: Mutual Information-Based US Segmentation for Unseen Domain Generalization
  • Publication: MICCAI 2023
  • Summary: Propose MI-SegNet for ultrasound image segmentation. MI-SegNet employs two encoders that separately extract anatomical and domain features from images, and Mutual Information Neural Estimation (MINE) approximation is used to minimize the mutual information between these features.
  • Title: Towards principled disentanglement for domain generalization
  • Publication: CVPR 2022
  • Summary: Introduce disentanglement-constrained domain generalization (DDG) for cross-center tumor detection, which simultaneously learns a semantic encoder and a variation encoder for feature disentanglement, and further constrains the learned representations to be invariant to inter-class variation.
  • Title: Contrastive Domain Disentanglement for Generalizable Medical Image Segmentation
  • Publication: Arxiv 2022
  • Summary: Propose Contrastive Domain Disentanglement and Style Augmentation (CDDSA) for image segmentation in fundus and MR images. This method introduces a disentanglement network to decompose medical images into an anatomical representation and a modality representation, and designs a style contrastive loss to ensure that style representations from the same domain are similar while those from different domains diverge significantly.
    Generative Learning

    Generative models are also effective techniques for traditional feature disentanglement, such as InfoGAN and $\beta$-VAE. For domain generalization, generative learning based disentanglement methods attempt to elucidate the sample generation mechanisms from the perspectives of domain, sample, and label, thereby achieving feature decomposition.

  • Title: Learning domain-agnostic representation for disease diagnosis
  • Publication: ICLR 2023
  • Summary: Leverage structural causal modeling to explicitly model disease-related features and center effects. Guided by this, they propose a novel Domain Agnostic Representation Model (DarMo) based on a variational auto-encoder, designing domain-agnostic and domain-aware encoders to respectively capture disease-related features and varied center effects by incorporating a domain-aware batch normalization layer.
  • Title: DiMix: Disentangle-and-Mix Based Domain Generalizable Medical Image Segmentation
  • Publication: MICCAI 2023
  • Summary: Combine vision transformer architectures with style-based generators for cross-site medical segmentation. The model learns domain-invariant representations by swapping domain-specific features, facilitating the disentanglement of content and style.
  • Title: DIVA: Domain Invariant Variational Autoencoders
  • Publication: PMLR 2022
  • Summary: Propose the Domain-Invariant Variational Autoencoder (DIVA) for malaria cell image classification, which disentangles the features into domain information, category information, and other information, learned within the VAE framework.
  • Code: https://github.com/AMLab-Amsterdam/DIVA
  • Title: Variational Disentanglement for Domain Generalization
  • Publication: TMLR 2022
  • Summary: Propose a Variational Disentanglement Network (VDN) to classify breast cancer metastases. VDN disentangles domain-invariant and domain-specific features by estimating the information gain and maximizing the posterior probability.
    Model Training Level

    Learning Strategy

    Learning strategies have gained significant attention in tackling domain generalization challenges across various fields. They leverage generic learning paradigms to improve model generalization performance and can be mainly divided into three categories: ensemble learning, meta-learning, and self-supervised learning.

    Ensemble Learning

    Ensemble learning is a machine learning technique where multiple models are trained to solve the same problem. For domain generalization, different models can capture domain-specific patterns and representations, so their combination could lead to more robust predictions.
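The simplest instantiation averages the per-class probabilities of the individual models; a minimal numpy sketch (all probabilities below are made up):

```python
import numpy as np

def ensemble_predict(prob_maps):
    """Average the per-class probabilities of several models, then argmax."""
    return np.mean(prob_maps, axis=0).argmax(axis=-1)

# Class probabilities from three hypothetical models for four pixels, 3 classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]])
p2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.3, 0.3, 0.4]])
p3 = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]])

labels = ensemble_predict([p1, p2, p3])
print(labels)  # [0 1 2 2]
```

Weighted averaging or majority voting are drop-in alternatives; the methods below differ mainly in how the member models are specialized.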

  • Title: Mixture of calibrated networks for domain generalization in brain tumor segmentation
  • Publication: KBS 2023
  • Summary: Design the mixture of calibrated networks (MCN) for cross-domain brain tumor segmentation, which combines the predictions from multiple models, and each model has unique calibration characteristics to generate diverse and fine-grained segmentation map.
  • Title: DeepLesionBrain: Towards a broader deep-learning generalization for multiple sclerosis lesion segmentation
  • Publication: MedIA 2021
  • Summary: Use a large group of compact 3D CNNs spatially distributed over the brain regions and associate a distinct network with each region of the brain, thereby producing consensus-based segmentation robust to domain shift.
  • Title: MS-Net: Multi-Site Network for Improving Prostate Segmentation With Heterogeneous MRI Data
  • Publication: IEEE TMI 2020
  • Summary: Propose multi-site network (MS-Net) for cross-site prostate segmentation, which consists of a universal network and multiple domain-specific auxiliary branches. The universal network is trained with the supervision of ground truth and transferred multi-site knowledge from auxiliary branches to help explore the general representation.
  • Code: https://github.com/liuquande/MS-Net
    Meta Learning

    Meta-learning, also known as learning to learn, focuses on designing algorithms that can generalize knowledge across diverse tasks. In medical domain generalization, it helps address the challenge of expensive data collection and annotation: the source domain(s) are divided into meta-train and meta-test sets to simulate domain shift.
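An episodic update can be sketched with a first-order MAML-style step on a toy linear regression task (the two simulated "domains" and the learning rates are illustrative):

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean squared error for a linear model X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def episodic_meta_step(w, meta_train, meta_test, inner_lr=0.05, outer_lr=0.05):
    """One first-order MAML-style episode: adapt on the meta-train split,
    then update the initialization using the loss on the meta-test split."""
    Xtr, ytr = meta_train
    Xte, yte = meta_test
    w_adapted = w - inner_lr * mse_grad(w, Xtr, ytr)   # inner step (meta-train)
    return w - outer_lr * mse_grad(w_adapted, Xte, yte)  # outer step (meta-test)

# Two simulated "domains" sharing the same task but with different input scales.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
Xa = rng.normal(0.0, 1.0, (50, 3)); ya = Xa @ w_true
Xb = rng.normal(0.0, 2.0, (50, 3)); yb = Xb @ w_true

w = np.zeros(3)
for _ in range(200):
    w = episodic_meta_step(w, meta_train=(Xa, ya), meta_test=(Xb, yb))
loss = float(np.mean((Xb @ w - yb) ** 2))
```

Because the outer update is evaluated on the held-out meta-test split, the learned initialization is rewarded for adapting well to a shifted domain, not just for fitting the meta-train domain.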

  • Title: FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
  • Publication: CVPR 2021
  • Summary: Introduce episodic meta-learning for federated medical image segmentation. During the training process of local models, the raw input serves as the meta-train data, while its counterparts generated from frequency space are used as the meta-test data, helping in learning generalizable model parameters.
  • Code: https://github.com/liuquande/FedDG-ELCFS
  • Title: Shape-Aware Meta-learning for Generalizing Prostate MRI Segmentation to Unseen Domains
  • Publication: MICCAI 2020
  • Summary: Propose a shape-aware meta-learning (SAML) scheme for the prostate MRI segmentation, rooted in gradient-based meta-learning. It explicitly simulates domain shift during training by dividing virtual meta-train and meta-test sets.
  • Code: https://github.com/liuquande/SAML
    Self-supervised Learning

    Self-supervised learning is a machine learning method where a model learns general representations from input data without explicit supervision. These representations enhance the model's generalization capability, enabling it to mitigate domain-specific biases. This approach is particularly valuable in scenarios where labeled data is scarce or costly to obtain and annotate, such as in medical imaging.
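A typical self-supervised objective is the InfoNCE contrastive loss between two augmented views of the same images; a minimal numpy sketch (the embeddings are random stand-ins for encoder outputs):

```python
import numpy as np

def info_nce_loss(za, zb, temperature=0.1):
    """InfoNCE: row i of za should be most similar to row i of zb among
    all rows, using cosine similarity scaled by a temperature."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # -log p(correct pair)

rng = np.random.default_rng(0)
za = rng.normal(0.0, 1.0, (32, 16))
zb_pos = za + 0.05 * rng.normal(0.0, 1.0, (32, 16))  # aligned augmented views
zb_neg = rng.normal(0.0, 1.0, (32, 16))              # unrelated embeddings
```

An encoder trained to minimize this loss across augmentations (including the style augmentations above) learns representations that ignore the augmented-away nuisance factors.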

  • Title: Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging
  • Publication: Nature Biomedical Engineering 2023
  • Summary: Propose robust and efficient medical imaging with self-supervision (REMEDIS) for technology, demographic and behavioral domain shifts, which combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization.
  • Title: Frequency-Mixed Single-Source Domain Generalization for Medical Image Segmentation
  • Publication: MICCAI 2023
  • Summary: Leverage a frequency-based augmentation technique to extend the single-source domain discrepancy, and construct self-supervision within single-domain augmentation to learn robust context-aware representations for fundus vessel segmentation.
  • Code: https://github.com/liamheng/Non-IID_Medical_Image_Segmentation
    Optimization Strategy

    Optimization strategies play a crucial role in minimizing overfitting to specific domains, which is achieved by adjusting hyperparameters, selecting appropriate loss functions, regularization techniques, and optimization algorithms.
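Group distributionally robust optimization, mentioned in the entry below, replaces the plain average loss with a (soft) worst-group objective; a minimal sketch (the softmax weighting is one smooth variant, not a specific paper's formulation):

```python
import numpy as np

def group_dro_loss(sample_losses, group_ids, eta=1.0):
    """Smooth worst-group objective: per-group mean losses are reweighted
    by a softmax, so the hardest group dominates the total."""
    groups = np.unique(group_ids)
    group_losses = np.array([sample_losses[group_ids == g].mean() for g in groups])
    weights = np.exp(eta * group_losses)
    weights /= weights.sum()
    return float(np.sum(weights * group_losses))

losses = np.array([0.1, 0.1, 1.0, 1.0])  # two easy samples, two hard ones
groups = np.array([0, 0, 1, 1])          # e.g. two acquisition sites

plain = float(losses.mean())             # treats both sites equally
robust = group_dro_loss(losses, groups)  # upweights the harder site
```

Minimizing the robust objective keeps the model from sacrificing the worst-performing site (group) to improve the average.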

  • Title: Model-Based Domain Generalization
  • Publication: NeurIPS 2021
  • Summary: Present a model-based domain generalization framework that rigorously reformulates the domain generalization problem as a semi-infinite constrained optimization problem, and employ group distributionally robust optimization (GDRO) for the skin lesion classification model. This optimization involves more aggressive regularization, implemented through a hyperparameter favoring the fitting of smaller groups, and early-stopping techniques to enhance generalization performance.
  • Code: https://github.com/arobey1/mbdg
  • Title: DOMINO++: Domain-Aware Loss Regularization for Deep Learning Generalizability
  • Publication: MICCAI 2023
  • Summary: Introduce an adaptable regularization framework to calibrate intracranial MRI segmentation models based on expert-guided and data-guided knowledge. The strengths of this regularization lie in its ability to take advantage of the benefits of both the semantic confusability derived from domain knowledge and data distribution.
    Model Test Level

    Test-time Adaptation
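Test-time adaptation updates the model at inference time using only the unlabeled test data. One lightweight variant, sketched below under the assumption of stored source-domain feature statistics (a generic illustration, not the specific methods listed here), re-estimates normalization statistics from the incoming test batch:

```python
import numpy as np

def adapt_normalization(feat_test, src_mu, src_var, momentum=0.9, eps=1e-5):
    """Blend stored source-domain feature statistics with statistics of the
    incoming test batch, then normalize the batch with the blended values."""
    mu = momentum * src_mu + (1 - momentum) * feat_test.mean(axis=0)
    var = momentum * src_var + (1 - momentum) * feat_test.var(axis=0)
    return (feat_test - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
src_mu, src_var = np.zeros(4), np.ones(4)   # statistics saved at training time
feat_test = rng.normal(3.0, 1.0, (256, 4))  # shifted test-domain features

adapted = adapt_normalization(feat_test, src_mu, src_var)
frozen = (feat_test - src_mu) / np.sqrt(src_var + 1e-5)  # no adaptation
```

The adapted features sit closer to the distribution the downstream layers were trained on; the entries below go further and also update model parameters per test sample.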

  • Title: UniAda: Domain Unifying and Adapting Network for Generalizable Medical Image Segmentation
  • Setting: MDG
  • Publication: IEEE TMI 2024
  • Summary: Propose a domain Unifying and Adapting network (UniAda) with DFU and UTTA module for generalizable medical image segmentation, a novel "unifying while training, adapting while testing" paradigm that can learn a domain-aware base model during training and dynamically adapt it to unseen target domains during testing. The DFU module unifies multi-source domains into a global inter-source domain through a novel feature statistics update mechanism, capable of sampling new features for previously unseen domains, thus enhancing the training of a domain-aware base model. The UTTA module leverages an uncertainty map to guide the adaptation of the trained model for each testing sample, considering the possibility that the specific target domain may fall outside the global inter-source domain.
  • Code: https://github.com/ZhouZhang233/UniAda
  • Title: Single-Domain Generalization in Medical Image Segmentation via Test-Time Adaptation from Shape Dictionary
  • Setting: SDG
  • Publication: AAAI 2022
  • Summary: Present an SDG approach that extracts and integrates semantic shape prior information of segmentation that is invariant across domains and can be well captured even from single-domain data, to facilitate segmentation under distribution shifts. Besides, a test-time adaptation strategy with dual-consistency regularization is devised to promote dynamic incorporation of these shape priors for each unseen domain, improving model generalizability.
  • Papers on Universal Foundation Model

    Medical image segmentation tasks encompass diverse imaging modalities, such as magnetic resonance imaging (MRI), X-ray, computed tomography (CT), and microscopy; various biomedical domains, including the abdomen, chest, brain, retina, and individual cells; and multiple label types within a region, such as heart valves or chambers. Traditional task-specific models are designed to train and test on a single, specific dataset. In contrast, universal foundation models aim to learn a single, generalizable medical image segmentation model capable of performing well across a wide range of tasks, including those significantly different from those encountered during training, without requiring retraining.

    Survey

  • Title: Foundational models in medical imaging: A comprehensive survey and future vision
  • Publication: arXiv 2023
  • Summary: This survey provides an in-depth review of recent advancements in foundational models for medical imaging. It categorizes these models into four main groups, distinguishing between those prompted by text and those guided by visual cues. Each category presents unique strengths and capabilities, which are further explored through exemplary works and comprehensive methodological descriptions. Furthermore, this survey evaluates the advantages and limitations inherent to each model type, highlighting their areas of excellence while identifying aspects requiring improvement.
  • Repo: https://github.com/xmindflow/Awesome-Foundation-Models-in-Medical-Imaging
  • Visual Foundation Models

    Interactive

    In the interactive segmentation paradigm, the foundation model segments the target following user-given prompts, such as a point, a bounding box (BBox), scribbles, or a free-text description.
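    As a concrete data-structure sketch, the prompt types listed above might be bundled like this (a hypothetical illustration; none of these names come from any specific model's API):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SegmentationPrompt:
    # (x, y, label) clicks; label 1 = foreground, 0 = background
    points: List[Tuple[int, int, int]] = field(default_factory=list)
    # (x0, y0, x1, y1) bounding boxes
    boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)
    # freehand polylines of (x, y) pixels
    scribbles: List[List[Tuple[int, int]]] = field(default_factory=list)
    # free-text description, e.g. "left kidney"
    text: Optional[str] = None

def prompt_kinds(p: SegmentationPrompt) -> List[str]:
    """Report which prompt modalities the user supplied."""
    kinds = []
    if p.points: kinds.append("point")
    if p.boxes: kinds.append("bbox")
    if p.scribbles: kinds.append("scribble")
    if p.text: kinds.append("text")
    return kinds

p = SegmentationPrompt(points=[(120, 64, 1)], boxes=[(100, 40, 160, 90)])
print(prompt_kinds(p))  # ['point', 'bbox']
```

    The models below differ mainly in which of these modalities they accept and in how the prompt is encoded and fused with image features.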

  • Title: Segment anything in medical images
  • Publication: Nature Communications 2024
  • Summary: Present MedSAM, a foundation model designed to enable universal medical image segmentation. The model is developed on a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types.
  • Code: https://github.com/bowang-lab/MedSAM
  • Title: ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
  • Publication: ECCV 2024
  • Summary: Present ScribblePrompt, a flexible neural-network-based interactive segmentation tool for biomedical imaging that enables human annotators to segment previously unseen structures using scribbles, clicks, and bounding boxes. ScribblePrompt's success rests on a set of careful design decisions: a training strategy that incorporates a highly diverse set of images and tasks, novel algorithms for simulating user interactions and labels, and a network that enables fast inference.
  • Code: https://scribbleprompt.csail.mit.edu
  • Title: Clustering Propagation for Universal Medical Image Segmentation
  • Publication: CVPR 2024
  • Summary: Introduce S2VNet, a universal framework that leverages slice-to-volume propagation to unify automatic and interactive segmentation within a single model and one training session. S2VNet exploits the slice-wise structure of volumetric data by initializing cluster centers from the clustering results of the previous slice, so that knowledge acquired from prior slices assists in segmenting the current one, efficiently bridging communication between remote slices using mere 2D networks. Moreover, the framework readily accommodates interactive segmentation with no architectural change, simply by initializing centroids from user inputs.
  • Code: https://github.com/dyh127/S2VNet
  • Title: TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM
  • Publication: MICCAI 2024
  • Summary: Propose a framework that customizes SAM for text-prompted Diabetic Retinopathy lesion segmentation, termed TP-DRSeg, which exploits language cues to inject medical prior knowledge into the vision-only segmentation network, thereby combining the advantages of different foundation models and enhancing the credibility of segmentation. To unleash the potential of vision-language models in the recognition of medical concepts, it utilizes an explicit prior encoder that transfers implicit medical concepts into explicit prior knowledge, providing explainable clues to excavate low-level features associated with lesions. Furthermore, a prior-aligned injector is designed to inject explicit priors into the segmentation process, which facilitates knowledge sharing across multi-modality features and allows the framework to be trained in a parameter-efficient fashion.
  • Code: https://github.com/wxliii/TP-DRSeg
  • Title: MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation
  • Publication: MICCAI 2024
  • Summary: Propose MedCLIP-SAM that combines CLIP and SAM models to generate segmentation of clinical scans using text prompts in both zero-shot and weakly supervised settings. To achieve this, it employs a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss to fine-tune the BiomedCLIP model and the recent gScoreCAM to generate prompts to obtain segmentation masks from SAM in a zero-shot setting. Additionally, it explores the use of zero-shot segmentation labels in a weakly supervised paradigm to improve the segmentation quality further.
  • Code: https://github.com/HealthX-Lab/MedCLIP-SAM
  • Title: DB-SAM: Delving into High Quality Universal Medical Image Segmentation
  • Publication: MICCAI 2024
  • Summary: Propose a dual-branch adapted SAM framework, which contains two branches in parallel: a ViT branch and a convolution branch. The ViT branch incorporates a learnable channel attention block after each frozen attention block, which captures domain-specific local features. On the other hand, the convolution branch employs a lightweight convolutional block to extract domain-specific shallow features from the input medical image. To perform cross-branch feature fusion, a bilateral cross-attention block and a ViT convolution fusion block are designed to dynamically combine diverse information from the two branches for the mask decoder.
  • Code: https://github.com/AlfredQin/DB-SAM
  • Few-shot/One-shot

    In the few-shot/one-shot setting, a pre-trained foundation model needs one or a few labeled samples as "support examples" to grasp a new, specific task.
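    One common mechanism behind such support-example conditioning is masked average pooling into a class prototype; the sketch below illustrates this general idea (an assumed, simplified mechanism in NumPy, not any cited paper's exact method):

```python
import numpy as np

def one_shot_segment(support_feat, support_mask, query_feat, threshold=0.7):
    """Prototype-based one-shot segmentation sketch.

    support_feat, query_feat: (H, W, C) per-pixel feature maps.
    support_mask: (H, W) binary mask of the target in the support image.
    """
    # Masked average pooling: one prototype vector for the target structure
    fg = support_feat[support_mask.astype(bool)]            # (N_fg, C)
    prototype = fg.mean(axis=0)
    prototype /= np.linalg.norm(prototype)

    # Cosine similarity between every query pixel and the prototype
    q = query_feat / np.linalg.norm(query_feat, axis=-1, keepdims=True)
    similarity = q @ prototype                              # (H, W)
    return (similarity > threshold).astype(np.uint8)

# Toy example: target pixels point along one feature axis, background along another
H, W, C = 4, 4, 8
feat = np.zeros((H, W, C)); feat[..., 1] = 1.0
feat[1:3, 1:3] = 0.0; feat[1:3, 1:3, 0] = 1.0
mask = np.zeros((H, W)); mask[1:3, 1:3] = 1
print(one_shot_segment(feat, mask, feat).sum())  # 4
```

    In real models the feature maps come from a learned encoder, and the methods below replace the fixed threshold with learned decoding, but the support-mask-to-prototype step captures how a single labeled example defines a new task.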

  • Title: One-Prompt to Segment All Medical Images
  • Publication: CVPR 2024
  • Summary: Introduce a new paradigm toward universal medical image segmentation, termed One-Prompt Segmentation, which combines the strengths of one-shot and interactive methods. At inference, with just one prompted sample, it can adeptly handle an unseen task in a single forward pass.
  • Code: https://github.com/KidsWithTokens/one-prompt
  • Title: MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model
  • Publication: CVPR 2024
  • Summary: Introduce MedUniSeg, a prompt-driven universal segmentation model designed for 2D and 3D multi-task segmentation across diverse modalities and domains. MedUniSeg employs multiple modal-specific prompts alongside a universal task prompt to accurately characterize the modalities and tasks. To generate the related priors, a modal map (MMap) and fusion-and-selection (FUSE) modules are designed, which transform modal and task prompts into corresponding priors. These modal and task priors are systematically introduced at the start and end of the encoding process.
  • Code: https://github.com/yeerwen/UniSeg
  • Title: ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation
  • Publication: arXiv 2024
  • Summary: Propose ESP-MedSAM, an efficient self-prompting SAM for universal domain-generalized medical image segmentation. A Multi-Modal Decoupled Knowledge Distillation (MMDKD) strategy is first designed to construct a lightweight semi-parameter-sharing image encoder that produces discriminative visual features for diverse modalities. It further introduces a Self-Patch Prompt Generator (SPPG) to automatically generate high-quality dense prompt embeddings for guiding segmentation decoding. Finally, it designs a Query-Decoupled Modality Decoder (QDMD) that leverages a one-to-one strategy to provide an independent decoding channel for every modality.
  • Code: https://github.com/xq141839/ESP-MedSAM
  • Title: UniverSeg: Universal Medical Image Segmentation
  • Publication: ICCV 2023
  • Summary: Present UniverSeg, a universal segmentation method for solving unseen medical segmentation tasks without additional training. Given a query image and an example set of image-label pairs that define a new segmentation task, UniverSeg employs a new CrossBlock mechanism to produce accurate segmentation maps without additional training. Moreover, 53 open-access medical segmentation datasets with over 22,000 scans were collected to train UniverSeg on a diverse set of anatomies and imaging modalities.
  • Code: https://universeg.csail.mit.edu
  • Multimodal Foundation Models

    Contrastive

    Contrastive textually prompted models are increasingly recognized as foundational models for medical imaging. They learn representations that capture the semantics and relationships between medical images and their corresponding textual prompts. By leveraging contrastive learning objectives, these models bring similar image-text pairs closer in the feature space while pushing dissimilar pairs apart.
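    The contrastive objective described above can be written down concretely; below is a minimal NumPy sketch of a CLIP-style symmetric InfoNCE loss (our own function names, illustrating the general technique rather than any one model's implementation):

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Matched pairs (row i of each matrix) are pulled together; every other
    pairing in the batch acts as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(logits))             # image i matches text i

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
aligned = clip_style_loss(emb, emb)        # correctly paired: low loss
shuffled = clip_style_loss(emb, emb[::-1]) # mismatched pairs: high loss
print(aligned < shuffled)  # True
```

    Models such as BiomedCLIP (fine-tuned in MedCLIP-SAM below) are trained with objectives of this family, with the embeddings produced by learned image and text encoders.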

  • Title: MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise
  • Publication: MICCAI 2024
  • Summary: Propose MM-Retinal, a multi-modal dataset encompassing high-quality image-text pairs collected from professional fundus diagram books. Moreover, enabled by MM-Retinal, present a novel Knowledge-enhanced foundational pretraining model that incorporates Fundus Image-Text expertise, designed with image-similarity-guided text revision and a mixed training strategy to infuse expert knowledge.
  • Code: https://github.com/lxirich/MM-Retinal
  • Title: Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
  • Publication: Nature Communications 2023
  • Summary: Propose Knowledge-enhanced Auto Diagnosis (KAD), an approach that leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports.
  • Generative

    Generative models represent another category within textually prompted models for medical imaging. These models are designed to generate realistic medical images based on textual prompts or descriptions. They utilize techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs) to learn the underlying distribution of medical images, enabling the creation of new samples that align with the provided prompts.
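    For the VAE family mentioned above, the two ingredients that make sampling trainable are the reparameterization trick and the KL regularizer toward the prior; a minimal NumPy sketch (our own notation, independent of any specific model):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Reparameterization trick: sample z ~ N(mu, sigma^2) as a deterministic
    function of (mu, log_var) plus external noise, so gradients can flow
    through the sampling step during training."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Per-sample KL(q(z|x) || N(0, I)) term of the VAE objective."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

mu = np.zeros((2, 4)); log_var = np.zeros((2, 4))  # q already equals the prior
print(kl_to_standard_normal(mu, log_var))          # [0. 0.]
z = reparameterize(mu, log_var)                    # draws from N(0, I)
```

    Conditioning on a text prompt amounts to making mu and log_var (or, in GANs and diffusion models, the generator) functions of the encoded prompt.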

  • Title: Med-Flamingo: a Multimodal Medical Few-shot Learner
  • Publication: arXiv 2023
  • Summary: Propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, it is pre-trained on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities.
  • Repo: https://github.com/snap-stanford/med-flamingo
  • Conversational

    Conversational textually prompted models are designed to enable interactive dialogues between medical professionals and the model by fine-tuning foundational models on specific instruction sets. These models enhance communication and collaboration, allowing medical experts to ask questions, provide instructions, and seek explanations related to medical images.

  • Title: Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE
  • Publication: NeurIPS 2024
  • Summary: Propose Uni-Med, a novel medical generalist foundation model consisting of a universal visual feature extraction module, a connector mixture-of-experts (CMoE) module, and a large language model (LLM). Benefiting from the proposed CMoE, which leverages a well-designed router with a mixture of projection experts at the connector, Uni-Med provides an efficient solution to the tug-of-war problem. It is capable of performing six different medical tasks, including question answering, visual question answering, report generation, referring expression comprehension, referring expression generation, and image classification.
  • Code: https://github.com/MSIIP/Uni-Med
  • Datasets

    We list widely used benchmark datasets for domain generalization, covering classification and segmentation tasks.

    Dataset | Task | #Domain | #Class | Description
    ------- | ---- | ------- | ------ | -----------
    Fundus OC/OD | Segmentation | 4 | 2 | Retinal fundus RGB images from three public datasets: REFUGE, Drishti-GS, and RIM-ONE-r
    Prostate MRI | Segmentation | 6 | 1 | T2-weighted MRI data collected from three public datasets: NCI-ISBI13, I2CVB, and PROMISE12
    Abdominal CT & MRI | Segmentation | 2 | 4 | 30 computed tomography (CT) volumes and 20 T2 spectral presaturation with inversion recovery (SPIR) MRI volumes
    Cardiac | Segmentation | 2 | 3 | 45 volumes of balanced steady-state free precession (bSSFP) MRI and late gadolinium enhanced (LGE) MRI
    BraTS | Segmentation | 4 | 1 | Multi-contrast MR scans from glioma patients in four different contrasts: T1, T1ce, T2, and FLAIR
    M&Ms | Segmentation | 4 | 3 | Multi-centre, multi-vendor, and multi-disease cardiac image segmentation dataset containing 320 subjects
    SCGM | Segmentation | 4 | 1 | Single-channel spinal cord gray matter MRI from four different centers
    Camelyon17 | Detection & Classification | 5 | 2 | Whole-slide images (WSI) of hematoxylin and eosin (H&E) stained lymph node sections from 100 patients
    Chest X-rays | Classification | 3 | 2 | Chest X-rays for detecting whether a patient has pneumonia, from three datasets: NIH, CheXpert, and RSNA

    Libraries

    We list libraries for domain generalization.

    Other Resources

    • A collection of domain generalization papers organized by amber0309.
    • A collection of domain generalization papers organized by jindongwang.
    • A collection of papers on domain generalization, domain adaptation, causality, robustness, prompt, optimization, generative model, etc, organized by yfzhang114.
    • A collection of awesome things about domain generalization organized by junkunyuan.

    Contact

    • If you would like to add/update the latest publications / datasets / libraries, please directly add them to this README.md.
    • If you would like to correct mistakes/provide advice, please contact us by email (nzw@zju.edu.cn).
    • You are welcome to update anything helpful.

    Acknowledgements

    Contributors

    Ziwei-Niu
    zerone-fg
