Fabian Gröger* · Shuo Wen* · Huyen Le · Maria Brbić
Multimodal models have demonstrated powerful capabilities in complex tasks requiring multimodal alignment, including zero-shot classification and cross-modal retrieval. However, existing models typically rely on millions of paired multimodal samples, which are prohibitively expensive or infeasible to obtain in many domains.
In this work, we explore the feasibility of building multimodal models with limited amounts of paired data by aligning pretrained unimodal foundation models. We show that high-quality alignment is possible with as few as tens of thousands of paired samples — less than 1% of the data typically used in the field.
- STRUCTURE Regularization: An effective technique that preserves the neighborhood geometry of the unimodal encoders' latent spaces (see the sketch after this list)
- Layer Selection: Demonstration that aligning the last layers is often suboptimal, and that aligning the layers with the highest representational similarity works better
- Strong Results: 51.6% average relative improvement in classification and 91.8% in retrieval tasks across 24 benchmarks
- Broad Applicability: Can be readily incorporated into existing alignment methods
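For a concrete picture of the neighborhood-geometry idea, here is a minimal PyTorch sketch of a structure-preserving regularizer: it penalizes the gap between pairwise cosine similarities computed from the frozen unimodal embeddings and from the aligned embeddings. The function name, the MSE form, and the weighting are illustrative assumptions, not the exact loss from the paper.

```python
import torch
import torch.nn.functional as F

def structure_regularizer(z_frozen: torch.Tensor, z_aligned: torch.Tensor) -> torch.Tensor:
    """Penalize distortion of pairwise similarity structure.

    z_frozen:  (batch, d1) embeddings from the frozen unimodal encoder.
    z_aligned: (batch, d2) embeddings after the trainable alignment layer.
    """
    # Cosine-similarity matrices summarize each space's local geometry.
    s_frozen = F.normalize(z_frozen, dim=-1) @ F.normalize(z_frozen, dim=-1).T
    s_aligned = F.normalize(z_aligned, dim=-1) @ F.normalize(z_aligned, dim=-1).T
    # Keep the aligned space's geometry close to the frozen encoder's.
    return F.mse_loss(s_aligned, s_frozen)

# Illustrative use in a training step, where `alignment_loss` is a CLIP-style
# contrastive loss between aligned image and text embeddings:
#   loss = alignment_loss + lam * (structure_regularizer(img_frozen, img_aligned)
#                                  + structure_regularizer(txt_frozen, txt_aligned))
```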
- Python 3.8+
- PyTorch 2.1.2+
- CUDA 11.8+ (for GPU support)
- Clone the repository:
```bash
git clone https://github.com/mlbio-epfl/STRUCTURE.git
cd STRUCTURE
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Prepare your datasets using the provided scripts. For example, for COCO:

```bash
# COCO dataset will be downloaded automatically
# Place in ./data/coco/
```

For other datasets, see the `src/dataset_preparation/` directory for preparation scripts.
Train an alignment model with limited data:

```bash
python src/train_subsampled_alignment.py --config_path configs/losses_lin/clip_base_best.yaml
```

Train with full alignment:

```bash
python src/train_alignment.py --config_path configs/losses_lin/clip_base_best.yaml
```

Extract features:

```bash
python src/extract_features.py --config_path configs/default.yaml
```

Measure alignment quality:

```bash
python src/measure_alignment.py --config_path configs/metrics/clip_mutual_knn_rice.yaml
```
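The metric config above points to a mutual k-nearest-neighbor measure. As a rough, self-contained illustration (a simplified sketch, not necessarily the repository's implementation), the snippet below scores two paired embedding sets by the fraction of k nearest neighbors they share per sample:

```python
import torch
import torch.nn.functional as F

def mutual_knn_score(x: torch.Tensor, y: torch.Tensor, k: int = 10) -> float:
    """Fraction of shared k-nearest neighbors between paired embedding sets.

    x: (n, d_x) and y: (n, d_y), where row i of x and y come from the same
    underlying sample (e.g., an image and its caption).
    """
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    # Nearest neighbors within each space, excluding the point itself.
    nn_x = (x @ x.T).fill_diagonal_(float("-inf")).topk(k, dim=-1).indices
    nn_y = (y @ y.T).fill_diagonal_(float("-inf")).topk(k, dim=-1).indices
    # For each sample, count how many neighbors the two spaces agree on.
    overlap = torch.tensor(
        [len(set(a.tolist()) & set(b.tolist())) for a, b in zip(nn_x, nn_y)],
        dtype=torch.float32,
    )
    return (overlap / k).mean().item()
```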
The repository uses YAML configuration files located in `configs/`:

- `configs/default.yaml` - Base configuration
- `configs/losses_lin/` - Linear alignment layer configurations
- `configs/losses_mlp/` - MLP alignment layer configurations
- `configs/ablations/` - Ablation study configurations
- `configs/csa/` - CSA configurations
- `configs/metrics/` - Alignment metrics configurations
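For orientation only, a linear-alignment config might look roughly like the hypothetical example below; every field name here is invented for illustration, so consult the actual files under `configs/losses_lin/` for the real schema.

```yaml
# Hypothetical example -- field names are illustrative, not the repository's schema.
encoders:
  image: <pretrained vision encoder>
  text: <pretrained text encoder>
alignment:
  type: linear            # linear vs. MLP alignment layer
  layer: best_similarity  # align layers with highest representational similarity
loss:
  structure_weight: 1.0   # weight of the STRUCTURE regularizer
training:
  batch_size: 256
  lr: 1.0e-4
```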
```
representation-alignment/
├── configs/                    # Configuration files
├── src/
│   ├── alignment/              # Alignment layer implementations
│   ├── trainers/               # Training logic
│   ├── models/                 # Model implementations
│   ├── loss/                   # Loss functions
│   ├── evaluation/             # Evaluation metrics
│   ├── dataset_preparation/    # Dataset preparation scripts
│   └── utils/                  # Utility functions
├── data/                       # Dataset directory (created during setup)
└── requirements.txt            # Python dependencies
```
If you find this work useful, please cite our paper:
@inproceedings{groger2025structure,
title={With Limited Data for Multimodal Alignment, Let the {STRUCTURE} Guide You},
author={Gr{\"o}ger, Fabian and Wen, Shuo and Le, Huyen and Brbic, Maria},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=IkvQqD7hk3}
}