Ziwei-Niu/Generalized_MedIA

Awesome Domain Generalization and Foundation Model in Medical Image Analysis

🔥 This is a repository for organizing papers, code, and other resources related to Domain Generalization and Foundation Models in Medical Image Analysis (DG&FM in MedIA).

💗 Medical Image Analysis (MedIA) plays a critical role in computer-aided diagnosis systems, enabling accurate diagnosis and assessment of various diseases. Over the last decade, deep learning (DL) has demonstrated great success in automating various MedIA tasks such as disease diagnosis, lesion segmentation, and prognosis prediction. Despite this success, in many real-world healthcare scenarios, differences in image acquisition, such as device manufacturer, scanning protocol, image sequence, and modality, introduce domain shifts, causing a significant decline in performance when a well-trained model is deployed to clinical sites with different data distributions. Enhancing the generalization ability of DL models in MedIA is therefore crucial in both clinical and academic settings. Domain generalization (DG), as an effective method to improve the generalization performance of task-specific models, can mitigate the performance degradation caused by domain shifts in medical images, such as cross-center, cross-sequence, and cross-modality variations. Recently, with the explosive growth of data and advances in computational resources, Foundation Models (FMs) have addressed the domain shift issue in a more direct manner: by training on vast amounts of diverse data, they prevent domain shift at the source, and they can handle a wide variety of tasks, including entirely new tasks never encountered during training. Compared to task-specific DG models, FMs thus offer greater task diversity and flexibility. Nonetheless, challenges such as medical data privacy concerns, data-sharing restrictions, the need for manual annotation by medical experts, and deployment demands persist. We therefore maintain that both DG and FM have their own merits and continue to hold significant research value.

🎯 We hope that this repository can provide assistance to researchers and practitioners in medical image analysis, domain generalization and foundation models.

🚀 New Updates:

  • 06/01/2025 : We have modified the presentation format to a table, which makes it easier for readers to review.
  • 25/12/2024 : We have added a Universal Segmentation Foundational Model branch.
  • 08/02/2024 : We released this repo for organizing papers, code, and other resources related to domain generalization for medical image analysis.

Table of Contents

Papers on Domain Generalization (ongoing)

Data Manipulation Level

Data Augmentation

Augmentation is widely employed in vision tasks to mitigate overfitting and improve generalization capacity, including operations like flipping, cropping, color jittering, noise addition, and others. For domain generalization in medical image analysis, augmentation methods can be broadly categorized as randomization-based, adversarial-based, and normalization-based.

Normalization-based

Normalization-based methods aim to normalize the raw intensity values or statistics to reduce the impact of variations in image intensity across different domains. These methods are usually employed for specific tasks, such as pathological images.
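As a concrete illustration, simple intensity standardization already removes much of the scanner-specific variation; a minimal numpy sketch (the two "scanner" calibrations below are invented for illustration):

```python
import numpy as np

def zscore_normalize(img):
    """Standardize intensities to zero mean and unit variance."""
    return (img - img.mean()) / (img.std() + 1e-8)

# Two hypothetical "scanners" imaging the same anatomy with different
# intensity calibrations (the numbers are illustrative only).
rng = np.random.default_rng(0)
anatomy = rng.random((64, 64))
scan_a = 1000.0 * anatomy + 200.0
scan_b = 3.5 * anatomy - 1.0

# After z-score normalization the two scans become nearly identical,
# i.e. the scanner-specific intensity shift is removed.
gap = np.abs(zscore_normalize(scan_a) - zscore_normalize(scan_b)).max()
print(gap)
```

Stain normalization in pathology follows the same spirit but operates per color channel in a stain-specific color space.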

  • Title: Generative models for color normalization in digital pathology and dermatology: Advancing the learning paradigm
  • Publication: Expert Systems with Applications 2024
  • Summary: Formulate the color normalization task as an image-to-image translation problem, ensuring a pixel-to-pixel correspondence between the original and normalized images.
  • Title: Improved Domain Generalization for Cell Detection in Histopathology Images via Test-Time Stain Augmentation
  • Publication: MICCAI 2022
  • Summary: Propose a test-time stain normalization method for cell detection in histopathology images, which transforms the test images by mixing their stain color with that of the source domain, so that the mixed images may better resemble the source images or their stain-transformed versions used for training.
  • Title: Tackling Mitosis Domain Generalization in Histopathology Images with Color Normalization
  • Publication: MICCAI Challenge 2022
  • Summary: Employ a color normalization method in their architecture for mitosis detection in histopathology images.
  • Title: Improve Unseen Domain Generalization via Enhanced Local Color Transformation
  • Publication: MICCAI 2020
  • Summary: Propose Enhanced Domain Transformation (EDT) for diabetic retinopathy classification, which aims to project the images into a color space that aligns the distribution of source data and unseen target data.
    Randomization-based

    The goal of randomization-based methods is to generate novel input data by applying random transformations in image space, frequency space, and feature space.
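One popular image-space randomization, used by several of the entries below, remaps intensities through a random monotonic Bezier curve; a minimal numpy sketch (the control points are arbitrary illustrations):

```python
import numpy as np

def bezier_intensity_transform(img, p1, p2, n=1000):
    """Remap intensities in [0, 1] through a cubic Bezier curve with
    endpoints (0, 0) and (1, 1); p1 and p2 are the inner control points.
    Increasing control x-coordinates keep the mapping monotonic."""
    t = np.linspace(0.0, 1.0, n)
    x = 3 * (1 - t) ** 2 * t * p1[0] + 3 * (1 - t) * t ** 2 * p2[0] + t ** 3
    y = 3 * (1 - t) ** 2 * t * p1[1] + 3 * (1 - t) * t ** 2 * p2[1] + t ** 3
    return np.interp(img, x, y)  # look up each pixel on the curve

rng = np.random.default_rng(0)
img = rng.random((32, 32))
# Illustrative control points producing a nonlinear "style" change.
augmented = bezier_intensity_transform(img, p1=(0.3, 0.8), p2=(0.6, 0.2))
```

Sampling different control points yields different synthetic "styles" of the same anatomy while leaving structures in place.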

    Image-space

  • Title: Rethinking Data Augmentation for Single-Source Domain Generalization in Medical Image Segmentation
  • Publication: AAAI 2023
  • Summary: Rethink the data augmentation strategy for DG in medical image segmentation and propose a location-scale augmentation strategy, which performs constrained Bezier transformations on both global and local (i.e., class-level) regions to enrich the informativeness and diversity of the augmented samples.
  • Code: https://github.com/Kaiseem/SLAug
  • Title: Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization
  • Publication: CVPR 2022
  • Summary: Employ Bezier Curves to augment single source domain into different styles and split them into source-similar domain and source-dissimilar domain.
  • Code: https://github.com/zzzqzhou/Dual-Normalization
  • Title: Generalizing Deep Learning for Medical Image Segmentation to Unseen Domains via Deep Stacked Transformation
  • Publication: IEEE TMI 2020
  • Summary: Propose a deep stacked transformation approach by applying extensive random typical transformations on a single source domain to simulate the domain shift.
    Frequency-space

  • Title: Frequency-Mixed Single-Source Domain Generalization for Medical Image Segmentation
  • Publication: MICCAI 2023
  • Summary: Present FMAug that extends the domain margin by mixing patches from diverse frequency views.
  • Code: https://github.com/liamheng/Non-IID_Medical_Image_Segmentation
  • Title: Fourier-based augmentation with applications to domain generalization
  • Publication: Pattern Recognition 2023
  • Summary: Propose a Fourier-based data augmentation strategy called AmpMix, which linearly interpolates the amplitudes of two images while keeping their phases unchanged to simulate domain shift. Additionally, consistency training between different augmentation views is incorporated to learn invariant representations.
  • Code: https://github.com/MediaBrain-SJTU/FACT
  • Title: Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration
  • Publication: ECCV 2022
  • Summary: Present a continuous frequency space interpolation mechanism for cross-site fundus and prostate segmentation, which exchanges the amplitude spectrum (style) to generate new samples while keeping the phase spectrum (semantics).
  • Code: https://github.com/zzzqzhou/RAM-DSIR
  • Title: Domain Generalization in Restoration of Cataract Fundus Images Via High-Frequency Components
  • Publication: ISBI 2022
  • Summary: Cataract-like fundus images are randomly synthesized from an identical clear image by adding cataract-like blur. Then, high-frequency components are extracted from the cataract-like images to reduce the domain shift and achieve domain alignment.
  • Code: https://github.com/HeverLaw/Restoration-of-Cataract-Images-via-Domain-Generalization
  • Title: FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
  • Publication: CVPR 2021
  • Summary: Propose a continuous frequency space interpolation mechanism for federated medical domain generalization, which exchanges amplitude spectrum across clients to transmit the distribution information, while keeping the phase spectrum with core semantics locally for privacy protection.
  • Code: https://github.com/liuquande/FedDG-ELCFS
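The amplitude-phase recombination shared by the Fourier-based entries above can be sketched as follows (real methods typically mix only low-frequency amplitudes inside a small window; this sketch mixes all frequencies for brevity):

```python
import numpy as np

def amplitude_mix(src, ref, lam=0.5):
    """Interpolate the Fourier amplitude (style) of two images while
    keeping the phase (semantic layout) of the source image."""
    f_src, f_ref = np.fft.fft2(src), np.fft.fft2(ref)
    amp = (1.0 - lam) * np.abs(f_src) + lam * np.abs(f_ref)
    phase = np.angle(f_src)
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

rng = np.random.default_rng(0)
src, ref = rng.random((64, 64)), rng.random((64, 64))
mixed = amplitude_mix(src, ref, lam=0.5)  # src content, blended style
```

With `lam=0` the source image is recovered exactly, which makes the style/content separation easy to verify.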
    Feature-space

  • Title: Improving the Generalizability of Convolutional Neural Network-Based Segmentation on CMR Images
  • Publication: Frontiers in Cardiovascular Medicine 2020
  • Summary: Propose a simple yet effective way for improving the network generalization ability by carefully designing data normalization and augmentation strategies.
    Adversarial-based

    Adversarial-based data augmentation methods are driven by adversarial training, aiming to maximize the diversity of data while simultaneously constraining its reliability.

  • Title: AADG: Automatic Augmentation for Domain Generalization on Retinal Image Segmentation
  • Publication: TMI 2022
  • Summary: Introduce a novel proxy task maximizing the diversity among multiple augmented novel domains as measured by the Sinkhorn distance in a unit sphere space to achieve automated augmentation. Adversarial training and deep reinforcement learning are employed to efficiently search the objectives.
  • Code: https://github.com/CRazorback/AADG
  • Title: Adversarial Consistency for Single Domain Generalization in Medical Image Segmentation
  • Publication: MICCAI 2022
  • Summary: Synthesize the new domains via learning an adversarial domain synthesizer (ADS), and propose to keep the underlying semantic information between the source image and the synthetic image via a mutual information regularizer.
  • Title: MaxStyle: Adversarial Style Composition for Robust Medical Image Segmentation
  • Publication: MICCAI 2022
  • Summary: Propose a data augmentation framework called MaxStyle, which augments data with improved image style diversity and hardness, by expanding the style space with noise and searching for the worst-case style composition of latent features via adversarial training.
  • Code: https://github.com/cherise215/MaxStyle
  • Title: Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation
  • Publication: Arxiv 2023
  • Summary: Propose Adversarial Intensity Attack (AdverIN), which introduces an adversarial attack on the data intensity distribution, leveraging adversarial training to generate training data with an infinite number of styles and increase data diversity while preserving essential content information.
  • Title: TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
  • Publication: CVPR 2023
  • Summary: Propose a method that combines knowledge distillation with adversarial-based data augmentation for cross-site medical image segmentation tasks.
  • Code: https://github.com/devavratTomar/TeSLA
    Data Generation

    Data generation is devoted to utilizing generative models, such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models, to generate fictional and novel samples. As the source domain data become more complex, diverse, and informative, the generalization ability can be increased.

  • Title: GH-DDM: the generalized hybrid denoising diffusion model for medical image generation
  • Publication: Multimedia Systems 2023
  • Summary: Introduce a generalized hybrid denoising diffusion model to enhance generalization ability by generating new cross-domain medical images, which leverages the strong abilities of transformers into diffusion models to model long-range interactions and spatial relationships between anatomical structures.
  • Title: Test-Time Image-to-Image Translation Ensembling Improves Out-of-Distribution Generalization in Histopathology
  • Publication: MICCAI 2022
  • Summary: Utilize the multi-domain image-to-image translation model StarGANv2 to project histopathology test images from unseen domains to the source domains, classify the projected images, and ensemble their predictions.
  • Code: https://gitlab.com/vitadx/articles/test-time-i2i-translation-ensembling
  • Title: Domain Generalization for Retinal Vessel Segmentation with Vector Field Transformer
  • Publication: PMLR 2022
  • Summary: Apply an auto-encoder to generate different styles of enhanced vessel maps for augmentation, and use the Hessian matrix of an image for segmentation, since vector fields better capture the morphological features and suffer less from covariate shift.
  • Code: https://github.com/MedICL-VU/Vector-Field-Transformer
  • Title: CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions
  • Publication: ECCV Workshop 2022
  • Summary: Use a Star Generative Adversarial Network (StarGAN) to transform skin types (style), and enforce the feature representation to be invariant across different skin types.
  • Code: https://github.com/arezou-pakzad/CIRCLe
  • Title: Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
  • Publication: CVPR 2021
  • Summary: Propose a fully generative approach to semantic segmentation based on StyleGAN2 that models the joint image-label distribution and synthesizes both images and their semantic segmentation masks.
  • Code: https://github.com/nv-tlabs/semanticGAN_code
  • Title: Generative Adversarial Domain Generalization via Cross-Task Feature Attention Learning for Prostate Segmentation
  • Publication: ICONIP 2021
  • Summary: Propose a new Generative Adversarial Domain Generalization (GADG) network, which can achieve the domain generalization through the generative adversarial learning on multi-site prostate MRI images. Additionally, to make the prostate segmentation network learned from the source domains still have good performance in the target domain, a Cross-Task Attention Module (CTAM) is designed to transfer the main domain generalized features from the generation branch to the segmentation branch.
  • Title: Learning Domain-Agnostic Visual Representation for Computational Pathology Using Medically-Irrelevant Style Transfer Augmentation
  • Publication: TMI 2021
  • Summary: Propose a style transfer-based augmentation (STRAP) method for a tumor classification task, which applies style transfer from non-medical images to histopathology images.
  • Code: https://github.com/rikiyay/style-transfer-for-digital-pathology
  • Title: Multimodal Self-supervised Learning for Medical Image Analysis
  • Publication: IPMI 2021
  • Summary: Propose a novel approach leveraging self-supervised learning through multimodal jigsaw puzzles for cross-modal medical image synthesis tasks. Additionally, to increase the quantity of multimodal data, they design a cross-modal generation step to create synthetic images from one modality to another using the CycleGAN-based translation model.
  • Title: Random Style Transfer Based Domain Generalization Networks Integrating Shape and Spatial Information
  • Publication: STACOM 2020
  • Summary: Propose novel random style transfer based domain generalization networks incorporating spatial and shape information based on GANs.
    Feature Level Generalization

    Invariant Feature Representation

    For medical image analysis, a well-generalized model focuses more on task-related semantic features while disregarding task-unrelated style features. In this regard, three types of methods have been extensively investigated: feature normalization, explicit feature alignment, and domain adversarial learning.

    Feature normalization

    These methods aim to enhance the generalization ability of models by centering, scaling, decorrelating, standardizing, and whitening extracted feature distributions. This process helps accelerate the convergence of algorithms and prevents features with larger scales from overpowering those with smaller ones. Common techniques include traditional scaling methods such as min-max and z-score normalization, as well as deep learning methods such as batch, layer, and instance normalization.
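For example, instance normalization standardizes each feature map over its spatial dimensions, removing much of the instance-specific "style"; a minimal numpy sketch (shapes and offsets are illustrative):

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Instance normalization: standardize each (sample, channel) feature
    map over its spatial dimensions of an (N, C, H, W) array."""
    mu = feat.mean(axis=(2, 3), keepdims=True)
    var = feat.var(axis=(2, 3), keepdims=True)
    return (feat - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
feat = 5.0 * rng.random((2, 3, 16, 16)) + 10.0  # features with a style offset
normed = instance_norm(feat)  # per-instance statistics removed
```

Learnable affine parameters (as in adaptive instance normalization) would multiply and shift the normalized output; they are omitted here for brevity.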

  • Title: SAN-Net: Learning generalization to unseen sites for stroke lesion segmentation with self-adaptive normalization
  • Publication: CBM 2023
  • Summary: Devise a masked adaptive instance normalization to minimize inter-site discrepancies for cross-site stroke lesion segmentation, which standardizes input images from different sites into a domain-unrelated style by dynamically learning affine parameters.
  • Code: https://github.com/wyyu0831/SAN
  • Title: SS-Norm: Spectral-spatial normalization for single-domain generalization with application to retinal vessel segmentation
  • Publication: IET IP 2023
  • Summary: Decompose the feature into multiple frequency components by performing discrete cosine transform and analyze the semantic contribution degree of each component. Then reweight the frequency components of features and therefore normalize the distribution in the spectral domain.
  • Title: Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization
  • Publication: CVPR 2022
  • Summary: Design a dual-normalization module to estimate domain distribution information. During the test stage, the model selects the nearest feature statistics according to style embeddings in the dual-normalization module to normalize target-domain features for generalization.
  • Code: https://github.com/zzzqzhou/Dual-Normalization
    Explicit feature alignment

    Explicit feature alignment methods attempt to remove domain shifts by reducing the discrepancies in feature distributions across multiple source domains, thereby facilitating the learning of domain-invariant feature representations.
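When alignment is performed against a predefined prior, as in the KL-based entry below, the penalty has a closed form for diagonal Gaussian encoders; a minimal sketch (the standard-normal prior is one common choice):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over
    latent dimensions; usable as an alignment penalty on encoder outputs."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

# A perfectly aligned encoder output incurs zero penalty; a shifted one does not.
aligned = kl_to_standard_normal(np.zeros(8), np.zeros(8))
shifted = kl_to_standard_normal(np.full(8, 2.0), np.zeros(8))
print(aligned, shifted)
```

Applying the same penalty to features from every source domain pulls all of their latent distributions toward one shared prior.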

  • Title: Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization
  • Publication: NeurIPS 2020
  • Summary: Adopt Kullback-Leibler (KL) divergence to align the distributions of latent features extracted from multiple source domains with a predefined prior distribution.
  • Code: https://github.com/wyf0912/LDDG
  • Title: Measuring Domain Shift for Deep Learning in Histopathology
  • Publication: JBHI 2020
  • Summary: Propose an unsupervised measure of domain shift, termed representation shift, which compares the distributions of a trained network's internal feature representations between source and target data and correlates with the drop in model performance under domain shift.
    Domain adversarial learning

    Domain-adversarial training methods are widely used to learn domain-invariant representations by introducing a domain discriminator that plays an adversarial game with the feature extractor.
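The mechanism can be illustrated with a toy linear feature extractor and a logistic domain discriminator trained with a gradient-reversal update (all data, dimensions, and learning rates below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs from two domains that differ by a constant "style" offset.
x0 = rng.normal(0.0, 1.0, (100, 8))        # domain 0
x1 = rng.normal(0.0, 1.0, (100, 8)) + 2.0  # domain 1
X = np.vstack([x0, x1])
d = np.concatenate([np.zeros(100), np.ones(100)])  # domain labels

W = rng.normal(0.0, 0.1, (8, 4))  # linear "feature extractor"
w = rng.normal(0.0, 0.1, 4)       # logistic domain discriminator

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for _ in range(300):
    Z = X @ W
    p = sigmoid(Z @ w)
    g_logit = (p - d) / len(d)     # d(cross-entropy)/d(logit)
    w -= lr * Z.T @ g_logit        # discriminator DEscends its loss
    g_Z = np.outer(g_logit, w)     # gradient reaching the features
    W -= lr * X.T @ (-g_Z)         # gradient reversal: extractor Ascends it

# After training, the features should no longer reveal the domain well.
acc = np.mean((sigmoid(X @ W @ w) > 0.5) == (d == 1))
print(acc)
```

In deep networks the same effect is obtained with a gradient reversal layer that is the identity on the forward pass and negates gradients on the backward pass.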

  • Title: Adversarially-Regularized Mixed Effects Deep Learning (ARMED) Models Improve Interpretability, Performance, and Generalization on Clustered (non-iid) Data
  • Publication: IEEE TPAMI 2023
  • Summary: Propose a general-purpose framework for Adversarially-Regularized Mixed Effects Deep learning (ARMED). ARMED employs an adversarial classifier to regularize the model to learn cluster-invariant fixed effects (domain-invariant features). The classifier attempts to predict cluster membership from the learned features, while the feature extractor is penalized for enabling this prediction.
  • Title: Localized adversarial domain generalization
  • Publication: CVPR 2022
  • Summary: Propose localized adversarial domain generalization, which performs adversarial feature alignment locally in the representation space rather than enforcing a single global alignment across entire source domains.
  • Code: https://github.com/zwvews/LADG
    Feature disentanglement

    Feature disentanglement methods aim to decompose the features of input samples into domain-invariant (task-related) and domain-specific (task-unrelated) components, i.e., $\mathbf{z} = [\mathbf{z}_{\text{invariant}}, \mathbf{z}_{\text{specific}}] \in \mathcal{Z}$. The objective of robust generalization models is to concentrate exclusively on the task-related components $\mathbf{z}_{\text{invariant}}$ while disregarding the task-unrelated ones $\mathbf{z}_{\text{specific}}$. The mainstream methods of feature disentanglement mainly include multi-component learning and generative modeling.
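A common surrogate for enforcing such a split is to penalize statistical dependence between the two latent parts; a minimal sketch using a cross-covariance penalty (one simple choice among many, not a specific paper's loss):

```python
import numpy as np

def cross_covariance_penalty(z_inv, z_spec):
    """Squared Frobenius norm of the cross-covariance between the two latent
    parts; driving it to zero pushes them to carry non-overlapping information."""
    zi = z_inv - z_inv.mean(axis=0)
    zs = z_spec - z_spec.mean(axis=0)
    cov = zi.T @ zs / (len(zi) - 1)
    return float(np.sum(cov ** 2))

rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, (2000, 8))
z_inv, z_spec = z[:, :4], z[:, 4:]   # independent halves

entangled = cross_covariance_penalty(z_inv, z_inv)    # identical parts: large
disentangled = cross_covariance_penalty(z_inv, z_spec)  # independent: small
```

Mutual-information estimators (e.g. MINE, used by MI-SegNet below) play the same role but also capture nonlinear dependence.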

    Multi-component learning

    Multi-component learning achieves feature disentanglement by designing different components to separately extract domain-invariant features and domain-specific features, thereby achieving feature decoupling.

  • Title: MI-SegNet: Mutual Information-Based US Segmentation for Unseen Domain Generalization
  • Publication: MICCAI 2023
  • Summary: Propose MI-SegNet for ultrasound image segmentation. MI-SegNet employs two encoders that separately extract anatomical and domain features from images, and Mutual Information Neural Estimation (MINE) approximation is used to minimize the mutual information between these features.
  • Title: Towards principled disentanglement for domain generalization
  • Publication: CVPR 2022
  • Summary: Introduce disentanglement-constrained domain generalization (DDG) for cross-center tumor detection, which simultaneously learns a semantic encoder and a variation encoder for feature disentanglement, and further constrains the learned representations to be invariant to inter-class variation.
  • Title: Contrastive Domain Disentanglement for Generalizable Medical Image Segmentation
  • Publication: Arxiv 2022
  • Summary: Propose Contrastive Domain Disentanglement and Style Augmentation (CDDSA) for image segmentation in fundus and MR images. This method introduces a disentanglement network to decompose medical images into an anatomical representation and a modality representation, and designs a style contrastive loss to ensure that style representations from the same domain are similar while those from different domains diverge significantly.
    Generative Learning

    Generative models are also effective techniques for traditional feature disentanglement, such as InfoGAN and $\beta$-VAE. For domain generalization, generative learning based disentanglement methods attempt to elucidate the sample generation mechanisms from the perspectives of domain, sample, and label, thereby achieving feature decomposition.

  • Title: Learning domain-agnostic representation for disease diagnosis
  • Publication: ICLR 2023
  • Summary: Leverage structural causal modeling to explicitly model disease-related features and center effects. Guided by this, they propose a novel Domain Agnostic Representation Model (DarMo) based on a variational auto-encoder, designing domain-agnostic and domain-aware encoders to respectively capture disease-related features and varied center effects by incorporating a domain-aware batch normalization layer.
  • Title: DiMix: Disentangle-and-Mix Based Domain Generalizable Medical Image Segmentation
  • Publication: MICCAI 2023
  • Summary: Combine vision transformer architectures with style-based generators for cross-site medical segmentation. The model learns domain-invariant representations by swapping domain-specific features, facilitating the disentanglement of content and style.
  • Title: DIVA: Domain Invariant Variational Autoencoders
  • Publication: PMLR 2022
  • Summary: Propose the Domain-Invariant Variational Autoencoder (DIVA) for malaria cell image classification, which disentangles the features into domain information, category information, and other information, learned within the VAE framework.
  • Code: https://github.com/AMLab-Amsterdam/DIVA
  • Title: Variational Disentanglement for Domain Generalization
  • Publication: TMLR 2022
  • Summary: Propose a Variational Disentanglement Network (VDN) to classify breast cancer metastases. VDN disentangles domain-invariant and domain-specific features by estimating the information gain and maximizing the posterior probability.
    Model Training Level

    Learning Strategy

    Learning strategies have gained significant attention in tackling domain generalization challenges across various fields. They leverage generic learning paradigms to improve model generalization performance and can be mainly divided into three categories: ensemble learning, meta-learning, and self-supervised learning.

    Ensemble Learning

    Ensemble learning is a machine learning technique where multiple models are trained to solve the same problem. For domain generalization, different models can capture domain-specific patterns and representations, so their combination could lead to more robust predictions.
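The simplest instantiation averages the per-class probabilities of the individual models; a minimal numpy sketch (all probabilities below are made up):

```python
import numpy as np

def ensemble_predict(prob_maps):
    """Average the per-class probabilities of several models, then argmax."""
    return np.mean(prob_maps, axis=0).argmax(axis=-1)

# Class probabilities from three hypothetical models for four pixels, 3 classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]])
p2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.3, 0.3, 0.4]])
p3 = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]])

labels = ensemble_predict([p1, p2, p3])
print(labels)  # [0 1 2 2]
```

Weighted averaging or majority voting are drop-in alternatives; the methods below differ mainly in how the member models are specialized.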

  • Title: Mixture of calibrated networks for domain generalization in brain tumor segmentation
  • Publication: KBS 2023
  • Summary: Design the mixture of calibrated networks (MCN) for cross-domain brain tumor segmentation, which combines the predictions from multiple models, and each model has unique calibration characteristics to generate diverse and fine-grained segmentation map.
  • Title: DeepLesionBrain: Towards a broader deep-learning generalization for multiple sclerosis lesion segmentation
  • Publication: MedIA 2021
  • Summary: Use a large group of compact 3D CNNs spatially distributed over the brain regions and associate a distinct network with each region of the brain, thereby producing consensus-based segmentation robust to domain shift.
  • Title: MS-Net: Multi-Site Network for Improving Prostate Segmentation With Heterogeneous MRI Data
  • Publication: IEEE TMI 2020
  • Summary: Propose multi-site network (MS-Net) for cross-site prostate segmentation, which consists of a universal network and multiple domain-specific auxiliary branches. The universal network is trained with the supervision of ground truth and transferred multi-site knowledge from auxiliary branches to help explore the general representation.
  • Code: https://github.com/liuquande/MS-Net
    Meta Learning

    Meta-learning, also known as learning to learn, focuses on designing algorithms that can generalize knowledge across diverse tasks. In medical domain generalization, it helps address the challenge of expensive data collection and annotation: the source domain(s) are divided into meta-train and meta-test sets to simulate domain shift.
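An episodic update can be sketched with a first-order MAML-style step on a toy linear regression task (the two simulated "domains" and the learning rates are illustrative):

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean squared error for a linear model X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def episodic_meta_step(w, meta_train, meta_test, inner_lr=0.05, outer_lr=0.05):
    """One first-order MAML-style episode: adapt on the meta-train split,
    then update the initialization using the loss on the meta-test split."""
    Xtr, ytr = meta_train
    Xte, yte = meta_test
    w_adapted = w - inner_lr * mse_grad(w, Xtr, ytr)   # inner step (meta-train)
    return w - outer_lr * mse_grad(w_adapted, Xte, yte)  # outer step (meta-test)

# Two simulated "domains" sharing the same task but with different input scales.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
Xa = rng.normal(0.0, 1.0, (50, 3)); ya = Xa @ w_true
Xb = rng.normal(0.0, 2.0, (50, 3)); yb = Xb @ w_true

w = np.zeros(3)
for _ in range(200):
    w = episodic_meta_step(w, meta_train=(Xa, ya), meta_test=(Xb, yb))
loss = float(np.mean((Xb @ w - yb) ** 2))
```

Because the outer update is evaluated on the held-out meta-test split, the learned initialization is rewarded for adapting well to a shifted domain, not just for fitting the meta-train domain.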

  • Title: FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
  • Publication: CVPR 2021
  • Summary: Introduce episodic meta-learning for federated medical image segmentation. During the training process of local models, the raw input serves as the meta-train data, while its counterparts generated from frequency space are used as the meta-test data, helping in learning generalizable model parameters.
  • Code: https://github.com/liuquande/FedDG-ELCFS
  • Title: Shape-Aware Meta-learning for Generalizing Prostate MRI Segmentation to Unseen Domains
  • Publication: MICCAI 2020
  • Summary: Propose a shape-aware meta-learning (SAML) scheme for the prostate MRI segmentation, rooted in gradient-based meta-learning. It explicitly simulates domain shift during training by dividing virtual meta-train and meta-test sets.
  • Code: https://github.com/liuquande/SAML
    Self-supervised Learning

    Self-supervised learning is a machine learning method where a model learns general representations from input data without explicit supervision. These representations enhance the model's generalization capability, enabling it to mitigate domain-specific biases. This approach is particularly valuable in scenarios where labeled data is scarce or costly to obtain and annotate, such as in medical imaging.
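A typical self-supervised objective is the InfoNCE contrastive loss between two augmented views of the same images; a minimal numpy sketch (the embeddings are random stand-ins for encoder outputs):

```python
import numpy as np

def info_nce_loss(za, zb, temperature=0.1):
    """InfoNCE: row i of za should be most similar to row i of zb among
    all rows, using cosine similarity scaled by a temperature."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # -log p(correct pair)

rng = np.random.default_rng(0)
za = rng.normal(0.0, 1.0, (32, 16))
zb_pos = za + 0.05 * rng.normal(0.0, 1.0, (32, 16))  # aligned augmented views
zb_neg = rng.normal(0.0, 1.0, (32, 16))              # unrelated embeddings
```

An encoder trained to minimize this loss across augmentations (including the style augmentations above) learns representations that ignore the augmented-away nuisance factors.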

  • Title: Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging
  • Publication: Nature Biomedical Engineering 2023
  • Summary: Propose robust and efficient medical imaging with self-supervision (REMEDIS) for technology, demographic and behavioral domain shifts, which combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization.
  • Title: Frequency-Mixed Single-Source Domain Generalization for Medical Image Segmentation
  • Publication: MICCAI 2023
  • Summary: Leverage a frequency-based augmentation technique to extend the single-source domain discrepancy, and construct self-supervision within single-domain augmentation to learn robust context-aware representations for fundus vessel segmentation.
  • Code: https://github.com/liamheng/Non-IID_Medical_Image_Segmentation
    Optimization Strategy

    Optimization strategies play a crucial role in minimizing overfitting to specific domains, which is achieved by adjusting hyperparameters, selecting appropriate loss functions, regularization techniques, and optimization algorithms.
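Group distributionally robust optimization, mentioned in the entry below, replaces the plain average loss with a (soft) worst-group objective; a minimal sketch (the softmax weighting is one smooth variant, not a specific paper's formulation):

```python
import numpy as np

def group_dro_loss(sample_losses, group_ids, eta=1.0):
    """Smooth worst-group objective: per-group mean losses are reweighted
    by a softmax, so the hardest group dominates the total."""
    groups = np.unique(group_ids)
    group_losses = np.array([sample_losses[group_ids == g].mean() for g in groups])
    weights = np.exp(eta * group_losses)
    weights /= weights.sum()
    return float(np.sum(weights * group_losses))

losses = np.array([0.1, 0.1, 1.0, 1.0])  # two easy samples, two hard ones
groups = np.array([0, 0, 1, 1])          # e.g. two acquisition sites

plain = float(losses.mean())             # treats both sites equally
robust = group_dro_loss(losses, groups)  # upweights the harder site
```

Minimizing the robust objective keeps the model from sacrificing the worst-performing site (group) to improve the average.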

  • Title: Model-Based Domain Generalization
  • Publication: NeurIPS 2021
  • Summary: Present a model-based domain generalization framework that rigorously reformulates the domain generalization problem as a semi-infinite constrained optimization problem, and employ group distributionally robust optimization (GDRO) for the skin lesion classification model. This optimization involves more aggressive regularization, implemented through a hyperparameter favoring the fitting of smaller groups, and early-stopping techniques to enhance generalization performance.
  • Code: https://github.com/arobey1/mbdg
  • Title: DOMINO++: Domain-Aware Loss Regularization for Deep Learning Generalizability
  • Publication: MICCAI 2023
  • Summary: Introduce an adaptable regularization framework to calibrate intracranial MRI segmentation models based on expert-guided and data-guided knowledge. The strengths of this regularization lie in its ability to take advantage of the benefits of both the semantic confusability derived from domain knowledge and data distribution.
    Model Test Level

    Test-time Adaptation
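Test-time adaptation updates the model at inference time using only the unlabeled test data. One lightweight variant, sketched below under the assumption of stored source-domain feature statistics (a generic illustration, not the specific methods listed here), re-estimates normalization statistics from the incoming test batch:

```python
import numpy as np

def adapt_normalization(feat_test, src_mu, src_var, momentum=0.9, eps=1e-5):
    """Blend stored source-domain feature statistics with statistics of the
    incoming test batch, then normalize the batch with the blended values."""
    mu = momentum * src_mu + (1 - momentum) * feat_test.mean(axis=0)
    var = momentum * src_var + (1 - momentum) * feat_test.var(axis=0)
    return (feat_test - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
src_mu, src_var = np.zeros(4), np.ones(4)   # statistics saved at training time
feat_test = rng.normal(3.0, 1.0, (256, 4))  # shifted test-domain features

adapted = adapt_normalization(feat_test, src_mu, src_var)
frozen = (feat_test - src_mu) / np.sqrt(src_var + 1e-5)  # no adaptation
```

The adapted features sit closer to the distribution the downstream layers were trained on; the entries below go further and also update model parameters per test sample.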

  • Title: UniAda: Domain Unifying and Adapting Network for Generalizable Medical Image Segmentation
  • Setting: MDG
  • Publication: IEEE TMI 2024
  • Summary: Propose a domain Unifying and Adapting network (UniAda) with DFU and UTTA module for generalizable medical image segmentation, a novel "unifying while training, adapting while testing" paradigm that can learn a domain-aware base model during training and dynamically adapt it to unseen target domains during testing. The DFU module unifies multi-source domains into a global inter-source domain through a novel feature statistics update mechanism, capable of sampling new features for previously unseen domains, thus enhancing the training of a domain-aware base model. The UTTA module leverages an uncertainty map to guide the adaptation of the trained model for each testing sample, considering the possibility that the specific target domain may fall outside the global inter-source domain.
  • Code: https://github.com/ZhouZhang233/UniAda
  • Title: Single-Domain Generalization in Medical Image Segmentation via Test-Time Adaptation from Shape Dictionary
  • Setting: SDG
  • Publication: AAAI 2022
  • Summary: Present an SDG approach that extracts and integrates semantic shape prior information of segmentation that is invariant across domains and can be well captured even from single-domain data, to facilitate segmentation under distribution shifts. Besides, a test-time adaptation strategy with dual-consistency regularization is devised to promote dynamic incorporation of these shape priors for each unseen domain, improving model generalizability.
  • Papers on Universal Foundation Model

    Medical image segmentation tasks encompass diverse imaging modalities, such as magnetic resonance imaging (MRI), X-ray, computed tomography (CT), and microscopy; various biomedical domains, including the abdomen, chest, brain, retina, and individual cells; and multiple label types within a region, such as heart valves or chambers. Traditional task-specific models are designed to train and test on a single, specific dataset. In contrast, universal foundation models aim to learn a single, generalizable medical image segmentation model capable of performing well across a wide range of tasks, including those significantly different from those encountered during training, without requiring retraining.

    Survey

  • Title: Foundational models in medical imaging: A comprehensive survey and future vision
  • Publication: arXiv 2023
  • Summary: This survey provides an in-depth review of recent advancements in foundational models for medical imaging. It categorizes these models into four main groups, distinguishing between those prompted by text and those guided by visual cues. Each category presents unique strengths and capabilities, which are further explored through exemplary works and comprehensive methodological descriptions. Furthermore, this survey evaluates the advantages and limitations inherent to each model type, highlighting their areas of excellence while identifying aspects requiring improvement.
  • Repo: https://github.com/xmindflow/Awesome-Foundation-Models-in-Medical-Imaging
  • Visual Foundation Models

    Interactive

    In the interactive segmentation paradigm, the foundation model segments the target following user-given prompts, such as a point, a bounding box (BBox), scribbles, or a free-text description.
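    As a concrete data-structure sketch, the prompt types listed above might be bundled like this (a hypothetical illustration; none of these names come from any specific model's API):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SegmentationPrompt:
    # (x, y, label) clicks; label 1 = foreground, 0 = background
    points: List[Tuple[int, int, int]] = field(default_factory=list)
    # (x0, y0, x1, y1) bounding boxes
    boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)
    # freehand polylines of (x, y) pixels
    scribbles: List[List[Tuple[int, int]]] = field(default_factory=list)
    # free-text description, e.g. "left kidney"
    text: Optional[str] = None

def prompt_kinds(p: SegmentationPrompt) -> List[str]:
    """Report which prompt modalities the user supplied."""
    kinds = []
    if p.points: kinds.append("point")
    if p.boxes: kinds.append("bbox")
    if p.scribbles: kinds.append("scribble")
    if p.text: kinds.append("text")
    return kinds

p = SegmentationPrompt(points=[(120, 64, 1)], boxes=[(100, 40, 160, 90)])
print(prompt_kinds(p))  # ['point', 'bbox']
```

    The models below differ mainly in which of these modalities they accept and in how the prompt is encoded and fused with image features.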

  • Title: Segment anything in medical images
  • Publication: Nature Communications 2024
  • Summary: Present MedSAM, a foundation model designed to enable universal medical image segmentation. The model is developed on a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types.
  • Code: https://github.com/bowang-lab/MedSAM
  • Title: ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
  • Publication: ECCV 2024
  • Summary: Present ScribblePrompt, a flexible neural-network-based interactive segmentation tool for biomedical imaging that enables human annotators to segment previously unseen structures using scribbles, clicks, and bounding boxes. ScribblePrompt's success rests on a set of careful design decisions: a training strategy that incorporates a highly diverse set of images and tasks, novel algorithms for simulating user interactions and labels, and a network that enables fast inference.
  • Code: https://scribbleprompt.csail.mit.edu
  • Title: Clustering Propagation for Universal Medical Image Segmentation
  • Publication: CVPR 2024
  • Summary: Introduce S2VNet, a universal framework that leverages slice-to-volume propagation to unify automatic and interactive segmentation within a single model and one training session. S2VNet exploits the slice-wise structure of volumetric data by initializing cluster centers from the clustering results of the previous slice, so that knowledge acquired from prior slices assists in segmenting the current one, efficiently bridging communication between remote slices using mere 2D networks. Moreover, the framework readily accommodates interactive segmentation with no architectural change, simply by initializing centroids from user inputs.
  • Code: https://github.com/dyh127/S2VNet
  • Title: TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM
  • Publication: MICCAI 2024
  • Summary: Propose a framework that customizes SAM for text-prompted Diabetic Retinopathy lesion segmentation, termed TP-DRSeg, which exploits language cues to inject medical prior knowledge into the vision-only segmentation network, thereby combining the advantages of different foundation models and enhancing the credibility of segmentation. To unleash the potential of vision-language models in the recognition of medical concepts, it utilizes an explicit prior encoder that transfers implicit medical concepts into explicit prior knowledge, providing explainable clues to excavate low-level features associated with lesions. Furthermore, a prior-aligned injector is designed to inject explicit priors into the segmentation process, which facilitates knowledge sharing across multi-modality features and allows the framework to be trained in a parameter-efficient fashion.
  • Code: https://github.com/wxliii/TP-DRSeg
  • Title: MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation
  • Publication: MICCAI 2024
  • Summary: Propose MedCLIP-SAM that combines CLIP and SAM models to generate segmentation of clinical scans using text prompts in both zero-shot and weakly supervised settings. To achieve this, it employs a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss to fine-tune the BiomedCLIP model and the recent gScoreCAM to generate prompts to obtain segmentation masks from SAM in a zero-shot setting. Additionally, it explores the use of zero-shot segmentation labels in a weakly supervised paradigm to improve the segmentation quality further.
  • Code: https://github.com/HealthX-Lab/MedCLIP-SAM
  • Title: DB-SAM: Delving into High Quality Universal Medical Image Segmentation
  • Publication: MICCAI 2024
  • Summary: Propose a dual-branch adapted SAM framework, which contains two branches in parallel: a ViT branch and a convolution branch. The ViT branch incorporates a learnable channel attention block after each frozen attention block, which captures domain-specific local features. On the other hand, the convolution branch employs a lightweight convolutional block to extract domain-specific shallow features from the input medical image. To perform cross-branch feature fusion, a bilateral cross-attention block and a ViT convolution fusion block are designed to dynamically combine diverse information from the two branches for the mask decoder.
  • Code: https://github.com/AlfredQin/DB-SAM
  • Few-shot/One-shot

    In the few-shot/one-shot setting, a pre-trained foundation model needs one or a few labeled samples as "support examples" to grasp a new, specific task.
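    One common mechanism behind such support-example conditioning is masked average pooling into a class prototype; the sketch below illustrates this general idea (an assumed, simplified mechanism in NumPy, not any cited paper's exact method):

```python
import numpy as np

def one_shot_segment(support_feat, support_mask, query_feat, threshold=0.7):
    """Prototype-based one-shot segmentation sketch.

    support_feat, query_feat: (H, W, C) per-pixel feature maps.
    support_mask: (H, W) binary mask of the target in the support image.
    """
    # Masked average pooling: one prototype vector for the target structure
    fg = support_feat[support_mask.astype(bool)]            # (N_fg, C)
    prototype = fg.mean(axis=0)
    prototype /= np.linalg.norm(prototype)

    # Cosine similarity between every query pixel and the prototype
    q = query_feat / np.linalg.norm(query_feat, axis=-1, keepdims=True)
    similarity = q @ prototype                              # (H, W)
    return (similarity > threshold).astype(np.uint8)

# Toy example: target pixels point along one feature axis, background along another
H, W, C = 4, 4, 8
feat = np.zeros((H, W, C)); feat[..., 1] = 1.0
feat[1:3, 1:3] = 0.0; feat[1:3, 1:3, 0] = 1.0
mask = np.zeros((H, W)); mask[1:3, 1:3] = 1
print(one_shot_segment(feat, mask, feat).sum())  # 4
```

    In real models the feature maps come from a learned encoder, and the methods below replace the fixed threshold with learned decoding, but the support-mask-to-prototype step captures how a single labeled example defines a new task.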

  • Title: One-Prompt to Segment All Medical Images
  • Publication: CVPR 2024
  • Summary: Introduce a new paradigm toward universal medical image segmentation, termed One-Prompt Segmentation, which combines the strengths of one-shot and interactive methods. At inference, with just one prompted sample, it can adeptly handle an unseen task in a single forward pass.
  • Code: https://github.com/KidsWithTokens/one-prompt
  • Title: MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model
  • Publication: CVPR 2024
  • Summary: Introduce MedUniSeg, a prompt-driven universal segmentation model designed for 2D and 3D multi-task segmentation across diverse modalities and domains. MedUniSeg employs multiple modal-specific prompts alongside a universal task prompt to accurately characterize the modalities and tasks. To generate the related priors, a modal map (MMap) and fusion-and-selection (FUSE) modules are designed, which transform modal and task prompts into corresponding priors. These modal and task priors are systematically introduced at the start and end of the encoding process.
  • Code: https://github.com/yeerwen/UniSeg
  • Title: ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation
  • Publication: arXiv 2024
  • Summary: Propose ESP-MedSAM, an efficient self-prompting SAM for universal domain-generalized medical image segmentation. A Multi-Modal Decoupled Knowledge Distillation (MMDKD) strategy is first designed to construct a lightweight semi-parameter-sharing image encoder that produces discriminative visual features for diverse modalities. It further introduces a Self-Patch Prompt Generator (SPPG) to automatically generate high-quality dense prompt embeddings for guiding segmentation decoding. Finally, it designs a Query-Decoupled Modality Decoder (QDMD) that leverages a one-to-one strategy to provide an independent decoding channel for every modality.
  • Code: https://github.com/xq141839/ESP-MedSAM
  • Title: UniverSeg: Universal Medical Image Segmentation
  • Publication: ICCV 2023
  • Summary: Present UniverSeg, a universal segmentation method for solving unseen medical segmentation tasks without additional training. Given a query image and an example set of image-label pairs that define a new segmentation task, UniverSeg employs a new CrossBlock mechanism to produce accurate segmentation maps without additional training. Moreover, 53 open-access medical segmentation datasets with over 22,000 scans were collected to train UniverSeg on a diverse set of anatomies and imaging modalities.
  • Code: https://universeg.csail.mit.edu
  • Multimodal Foundation Models

    Contrastive

    Contrastive textually prompted models are increasingly recognized as foundational models for medical imaging. They learn representations that capture the semantics and relationships between medical images and their corresponding textual prompts. By leveraging contrastive learning objectives, these models bring similar image-text pairs closer in the feature space while pushing dissimilar pairs apart.
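    The contrastive objective described above can be written down concretely; below is a minimal NumPy sketch of a CLIP-style symmetric InfoNCE loss (our own function names, illustrating the general technique rather than any one model's implementation):

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Matched pairs (row i of each matrix) are pulled together; every other
    pairing in the batch acts as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(logits))             # image i matches text i

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
aligned = clip_style_loss(emb, emb)        # correctly paired: low loss
shuffled = clip_style_loss(emb, emb[::-1]) # mismatched pairs: high loss
print(aligned < shuffled)  # True
```

    Models such as BiomedCLIP (fine-tuned in MedCLIP-SAM below) are trained with objectives of this family, with the embeddings produced by learned image and text encoders.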

  • Title: MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise
  • Publication: MICCAI 2024
  • Summary: Propose MM-Retinal, a multi-modal dataset encompassing high-quality image-text pairs collected from professional fundus diagram books. Moreover, enabled by MM-Retinal, present a novel Knowledge-enhanced foundational pretraining model that incorporates Fundus Image-Text expertise, designed with image-similarity-guided text revision and a mixed training strategy to infuse expert knowledge.
  • Code: https://github.com/lxirich/MM-Retinal
  • Title: Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
  • Publication: Nature Communications 2023
  • Summary: Propose Knowledge-enhanced Auto Diagnosis (KAD), an approach that leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports.
  • Generative

    Generative models represent another category within textually prompted models for medical imaging. These models are designed to generate realistic medical images based on textual prompts or descriptions. They utilize techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs) to learn the underlying distribution of medical images, enabling the creation of new samples that align with the provided prompts.
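    For the VAE family mentioned above, the two ingredients that make sampling trainable are the reparameterization trick and the KL regularizer toward the prior; a minimal NumPy sketch (our own notation, independent of any specific model):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Reparameterization trick: sample z ~ N(mu, sigma^2) as a deterministic
    function of (mu, log_var) plus external noise, so gradients can flow
    through the sampling step during training."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Per-sample KL(q(z|x) || N(0, I)) term of the VAE objective."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

mu = np.zeros((2, 4)); log_var = np.zeros((2, 4))  # q already equals the prior
print(kl_to_standard_normal(mu, log_var))          # [0. 0.]
z = reparameterize(mu, log_var)                    # draws from N(0, I)
```

    Conditioning on a text prompt amounts to making mu and log_var (or, in GANs and diffusion models, the generator) functions of the encoded prompt.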

  • Title: Med-Flamingo: a Multimodal Medical Few-shot Learner
  • Publication: arXiv 2023
  • Summary: Propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, it is pre-trained on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities.
  • Repo: https://github.com/snap-stanford/med-flamingo
  • Conversational

    Conversational textually prompted models are designed to enable interactive dialogues between medical professionals and the model by fine-tuning foundational models on specific instruction sets. These models enhance communication and collaboration, allowing medical experts to ask questions, provide instructions, and seek explanations related to medical images.

  • Title: Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE
  • Publication: NeurIPS 2024
  • Summary: Propose Uni-Med, a novel medical generalist foundation model consisting of a universal visual feature extraction module, a connector mixture-of-experts (CMoE) module, and a large language model (LLM). Benefiting from the proposed CMoE, which leverages a well-designed router with a mixture of projection experts at the connector, Uni-Med provides an efficient solution to the tug-of-war problem. It is capable of performing six different medical tasks, including question answering, visual question answering, report generation, referring expression comprehension, referring expression generation, and image classification.
  • Code: https://github.com/MSIIP/Uni-Med
  • Datasets

    We list widely used benchmark datasets for domain generalization, covering classification and segmentation tasks.

    Dataset | Task | #Domain | #Class | Description
    ------- | ---- | ------- | ------ | -----------
    Fundus OC/OD | Segmentation | 4 | 2 | Retinal fundus RGB images from three public datasets: REFUGE, Drishti-GS, and RIM-ONE-r
    Prostate MRI | Segmentation | 6 | 1 | T2-weighted MRI data collected from three public datasets: NCI-ISBI13, I2CVB, and PROMISE12
    Abdominal CT & MRI | Segmentation | 2 | 4 | 30 computed tomography (CT) volumes and 20 T2 spectral presaturation with inversion recovery (SPIR) MRI volumes
    Cardiac | Segmentation | 2 | 3 | 45 volumes of balanced steady-state free precession (bSSFP) MRI and late gadolinium enhanced (LGE) MRI
    BraTS | Segmentation | 4 | 1 | Multi-contrast MR scans from glioma patients in four different contrasts: T1, T1ce, T2, and FLAIR
    M&Ms | Segmentation | 4 | 3 | Multi-centre, multi-vendor, and multi-disease cardiac image segmentation dataset containing 320 subjects
    SCGM | Segmentation | 4 | 1 | Single-channel spinal cord gray matter MRI from four different centers
    Camelyon17 | Detection & Classification | 5 | 2 | Whole-slide images (WSI) of hematoxylin and eosin (H&E) stained lymph node sections from 100 patients
    Chest X-rays | Classification | 3 | 2 | Chest X-rays for detecting whether a patient has pneumonia, from three datasets: NIH, CheXpert, and RSNA

    Libraries

    We list libraries for domain generalization.

    Other Resources

    • A collection of domain generalization papers organized by amber0309.
    • A collection of domain generalization papers organized by jindongwang.
    • A collection of papers on domain generalization, domain adaptation, causality, robustness, prompt, optimization, generative model, etc, organized by yfzhang114.
    • A collection of awesome things about domain generalization organized by junkunyuan.

    Contact

    • If you would like to add/update the latest publications / datasets / libraries, please directly add them to this README.md.
    • If you would like to correct mistakes/provide advice, please contact us by email (nzw@zju.edu.cn).
    • You are welcome to update anything helpful.

    Acknowledgements

    Contributors

    Ziwei-Niu
    zerone-fg
