[NeurIPS 2025] VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
This repository is the official implementation of VL-SAE, which helps users to understand the vision-language alignment of VLMs via concepts.
Create a conda virtual environment and activate it:
```bash
conda create -n vlsae python=3.8 -y
conda activate vlsae
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Download the CC3M dataset from cc3m-wds and put it under ./CC3M.
Run the provided scripts to preprocess the dataset:
```bash
bash cc3m_untar.sh
python cc3m_moving.py
python cc3m_meta.py
```
Download LLaVA 1.5 and put it under ./lvlms/pretrained_models.
For OpenCLIP-ViT-B/32, download the pre-trained VL-SAE weights (SAE weights, metadata) and put them under cvlms/demo.
For LLaVA 1.5, download the pre-trained VL-SAE weights (SAE weights, Auxiliary AE weights, metadata) and put them under lvlms/demo.
We present the demo of VL-SAE with OpenCLIP and LLaVA 1.5 in cvlms/demo/demo.ipynb and lvlms/demo/demo.ipynb, respectively.
Moreover, we provide the notebook lvlms/demo/demo_inference.ipynb, which incorporates VL-SAE to modify representations during the inference process of LVLMs.
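As a rough illustration of what the inference demo does, the sketch below maps a hidden representation to concept activations with a VL-SAE-style sparse autoencoder, rescales selected concepts, and decodes the edited representation back. The checkpoint path, the weight keys (W_enc, b_enc, W_dec, b_dec), and the plain ReLU encoder are assumptions for illustration only; the actual loading code and architecture are in the demo notebooks.
```python
# Hedged sketch (not the repository's exact API): interpret a hidden state as
# concept activations, edit selected concepts, and decode the result back.
import torch

# Hypothetical checkpoint layout; the real demo weights may use different keys.
ckpt = torch.load("lvlms/demo/sae_weights.pth", map_location="cpu")
W_enc, b_enc = ckpt["W_enc"], ckpt["b_enc"]  # (d_model, n_concepts), (n_concepts,)
W_dec, b_dec = ckpt["W_dec"], ckpt["b_dec"]  # (n_concepts, d_model), (d_model,)

def encode(h):
    """Map hidden states (..., d_model) to non-negative concept activations."""
    return torch.relu(h @ W_enc + b_enc)

def decode(z):
    """Reconstruct hidden states from concept activations."""
    return z @ W_dec + b_dec

def edit(h, concept_ids, scale=0.0):
    """Suppress (scale=0) or amplify the chosen concepts, then decode back."""
    z = encode(h)
    z[..., concept_ids] = z[..., concept_ids] * scale
    return decode(z)
```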
The pre-trained VL-SAE weights are provided on ModelScope and HuggingFace.
| Base Model | ModelScope | HuggingFace |
|---|---|---|
| OpenCLIP-ViT-B/32 | SAE weights, metadata | SAE weights, metadata |
| OpenCLIP-ViT-B/16 | SAE weights, metadata | SAE weights, metadata |
| OpenCLIP-ViT-L/14 | SAE weights, metadata | SAE weights, metadata |
| OpenCLIP-ViT-H/14 | SAE weights, metadata | SAE weights, metadata |
| LLaVA-1.5-7B | SAE weights, Auxiliary AE weights, metadata | SAE weights, Auxiliary AE weights, metadata |
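For the HuggingFace copies, huggingface_hub can fetch the files programmatically; the repo id and filenames below are placeholders to be replaced with the entries linked in the table:
```python
# Hedged sketch: download pre-trained VL-SAE files from the HuggingFace Hub.
# "<repo_id>" and the filenames are placeholders; use the links in the table above.
from huggingface_hub import hf_hub_download

sae_path = hf_hub_download(repo_id="<repo_id>",
                           filename="openclip_ViT-B-32_VL_SAE_256_8_best.pth")
meta_path = hf_hub_download(repo_id="<repo_id>", filename="metadata.json")
print(sae_path, meta_path)
```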
This repo supports the construction of VL-SAE for LLaVA-1.5 and OpenCLIP.
First, collect the hidden representations of pre-trained models:
```bash
model_type="cvlms"   # for OpenCLIP
# model_type="lvlms" # for LLaVA
cd ./${model_type}/representation_collection
bash get_activations.sh
```
With a single NVIDIA RTX 4090, this step takes approximately 5 hours for OpenCLIP and 4 days for LLaVA.
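get_activations.sh drives the actual collection; as a hedged sketch of the idea (using OpenCLIP's public encode_image/encode_text rather than whichever hidden layer the script extracts), paired image and text representations can be gathered and saved like this:
```python
# Hedged sketch of representation collection with OpenCLIP (not the repo script).
import open_clip
import torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

@torch.no_grad()
def collect(pairs, out_path="activations.pt"):
    """pairs: iterable of (image_path, caption); saves paired embeddings to disk."""
    img_feats, txt_feats = [], []
    for image_path, caption in pairs:
        image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        tokens = tokenizer([caption])
        img_feats.append(model.encode_image(image))
        txt_feats.append(model.encode_text(tokens))
    torch.save({"image": torch.cat(img_feats), "text": torch.cat(txt_feats)}, out_path)

# Example with a hypothetical CC3M sample:
# collect([("./CC3M/images/000000001.jpg", "a dog running on the beach")])
```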
Then, train VL-SAE based on the collected representations:
```bash
cd ../sae_trainer
bash train.sh
```
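train.sh launches the actual training. Purely as an illustration of the mechanism, the toy sketch below runs one update step of a generic TopK sparse autoencoder on collected representations; the real VL-SAE objective, including how vision and language representations are tied to a unified concept set, lives in sae_trainer, and the dimensions and k used here are made up.
```python
# Generic TopK sparse autoencoder training step (illustrative, not the repo's trainer).
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model=512, n_concepts=4096, k=256):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_concepts)
        self.decoder = nn.Linear(n_concepts, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        # Keep only the k largest concept activations per sample.
        topk = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        return self.decoder(z_sparse), z_sparse

sae = TopKSAE()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

# acts = torch.load("activations.pt")["image"]  # collected representations (hypothetical file)
acts = torch.randn(64, 512)                      # stand-in batch for illustration
opt.zero_grad()
recon, _ = sae(acts)
loss = nn.functional.mse_loss(recon, acts)
loss.backward()
opt.step()
```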
Visualize the concepts learned by VL-SAE:
```bash
cd ../eval
python visualize_concept.py --topk 256 --ckpt-path ../sae_trainer/sae_weights/openclip_ViT-B-32_VL_SAE_256_8_best.pth
```
Each concept is represented by a set of images stored in the corresponding folder and sentences in the text_interpretation.txt file.
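The output directory can also be browsed programmatically; a small sketch follows (the exact folder layout is assumed from the description above):
```python
# Hedged sketch: list a visualized concept's example images and text interpretations.
# The folder layout is assumed from the description above; adjust paths as needed.
from pathlib import Path

concept_dir = Path("./concept_images/vlsae_ViT-B-32_256")  # output of visualize_concept.py
for folder in sorted(concept_dir.iterdir()):
    if not folder.is_dir():
        continue
    images = sorted(list(folder.glob("*.jpg")) + list(folder.glob("*.png")))
    text_file = folder / "text_interpretation.txt"
    sentences = text_file.read_text().splitlines() if text_file.exists() else []
    print(f"{folder.name}: {len(images)} images, e.g. {sentences[:2]}")
```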
After the concepts have been visualized, their inter-similarity and intra-similarity scores can be computed using CLIP embeddings:
```bash
python eval.py --target-dir ./concept_images/vlsae_ViT-B-32_256
```
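eval.py produces the reported numbers; the sketch below shows one hedged way to compute such scores from CLIP image embeddings (the script's exact definitions may differ): intra-similarity averages pairwise cosine similarity inside each concept folder, and inter-similarity averages cosine similarity between the centroids of different concepts.
```python
# Hedged sketch of intra-/inter-concept similarity from CLIP image embeddings.
from pathlib import Path
import open_clip
import torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
model.eval()

@torch.no_grad()
def embed_folder(folder):
    paths = sorted(list(folder.glob("*.jpg")) + list(folder.glob("*.png")))
    imgs = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return torch.nn.functional.normalize(model.encode_image(imgs), dim=-1)

def mean_offdiag(sim):
    """Mean of a square similarity matrix excluding the diagonal."""
    n = sim.shape[0]
    return (sim.sum() - sim.diagonal().sum()) / (n * n - n)

target = Path("./concept_images/vlsae_ViT-B-32_256")
per_concept = [embed_folder(f) for f in sorted(target.iterdir()) if f.is_dir()]

# Intra: coherence within each concept (higher = more coherent).
intra = torch.stack([mean_offdiag(e @ e.T) for e in per_concept]).mean()
# Inter: similarity between concept centroids (lower = more distinct).
centroids = torch.nn.functional.normalize(torch.stack([e.mean(0) for e in per_concept]), dim=-1)
inter = mean_offdiag(centroids @ centroids.T)
print(f"intra={intra.item():.3f}, inter={inter.item():.3f}")
```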
Finally, generate a JSON file for the trained SAE, which stores the index of each concept along with its mean activation value and its maximum-activation data (image URLs, texts). This file is designed to support the integration of VL-SAE into the model inference process for interpretability purposes:
```bash
python concept2data.py --topk 256 --ckpt-path ../sae_trainer/sae_weights/openclip_ViT-B-32_VL_SAE_256_8_best.pth
```
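The generated file is plain JSON whose entries follow the description above; the filename and key names in the sketch below are assumptions, so inspect the actual output for the authoritative schema:
```python
# Hedged sketch: inspect the generated concept JSON (filename and keys are assumed).
import json

with open("concept_data.json") as f:  # hypothetical output filename of concept2data.py
    concepts = json.load(f)

# Expected fields per concept, following the description above:
#   index                -> concept index in the SAE dictionary
#   mean_activation      -> mean activation value over the dataset
#   max_activation_data  -> image URLs and texts that activate the concept most strongly
first = concepts[0] if isinstance(concepts, list) else next(iter(concepts.values()))
print(json.dumps(first, indent=2))
```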
Integrate the pre-trained VL-SAE into the inference process of LLaVA 1.5 to eliminate hallucinations.
First, download the validation images & annotations of COCO 2014 and put them under lvlms/VCD/data/coco.
Then, run the provided scripts to evaluate the performance of VL-SAE on different benchmarks.
```bash
cd lvlms/VCD/experiments
# For the POPE benchmark
bash cd_scripts/llava1.5_pope.sh
# For the CHAIR benchmark
bash cd_scripts/llava1.5_chair.sh
```
If you find VL-SAE useful for your research and applications, please cite it using this BibTeX:
```bibtex
@misc{shen2025vlsae,
      title={VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set},
      author={Shufan Shen and Junshu Sun and Qingming Huang and Shuhui Wang},
      year={2025},
      eprint={2510.21323},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.21323},
}
```


