
[NeurIPS 2025] VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set

This repository is the official implementation of VL-SAE, which helps users understand the vision-language alignment of VLMs through a unified set of concepts.

Requirements

Create a conda virtual environment and activate it:

conda create -n vlsae python=3.8 -y
conda activate vlsae

Install dependencies:

pip install -r requirements.txt

Dataset preparation

Download the CC3M dataset from cc3m-wds and put it under ./CC3M

Run the provided scripts to preprocess the dataset:

bash cc3m_untar.sh
python cc3m_moving.py
python cc3m_meta.py
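
If you want to sanity-check the shards before preprocessing, the snippet below iterates a few image-caption pairs with the webdataset package. This is only a sketch: the shard name is illustrative, and it assumes the standard cc3m-wds layout with jpg/txt keys.

# Sanity-check a downloaded CC3M shard (illustrative shard name, assumed jpg/txt keys).
import webdataset as wds

dataset = (
    wds.WebDataset("./CC3M/cc3m-train-0000.tar")  # hypothetical shard path
    .decode("pil")                                # decode images to PIL
    .to_tuple("jpg", "txt")                       # (image, caption) pairs
)

for i, (image, caption) in enumerate(dataset):
    print(image.size, caption[:60])
    if i == 2:
        break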

Quick Start

Download LLaVA 1.5 and put it under ./lvlms/pretrained_models

For OpenCLIP-ViT-B/32, download the pre-trained VL-SAE weights (SAE weights, metadata) and put them under cvlms/demo.

For LLaVA 1.5, download the pre-trained VL-SAE weights (SAE weights, Auxiliary AE weights, metadata) and put them under lvlms/demo.

We present demos of VL-SAE with OpenCLIP and LLaVA 1.5 in cvlms/demo/demo.ipynb and lvlms/demo/demo.ipynb, respectively.

Moreover, we provide the notebook lvlms/demo/demo_inference.ipynb, which incorporates VL-SAE to modify representations during the inference process of LVLMs.
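
For intuition, the sketch below shows the general flow the OpenCLIP demo follows: embed an image and a caption with OpenCLIP, map both embeddings into the shared concept space with the SAE encoder, and compare which concepts fire in both modalities. The SAE class here is a hypothetical stand-in (random weights, assumed top-k interface), not the repository's actual implementation; the demo notebook loads the real weights from cvlms/demo.

# Illustrative flow only; the SAE below is a random-weight placeholder, not the released model.
import torch, open_clip
from PIL import Image

clip_model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

class TopKSAEEncoder(torch.nn.Module):            # hypothetical stand-in for the VL-SAE encoder
    def __init__(self, d_in=512, d_concepts=512 * 8, k=256):
        super().__init__()
        self.enc = torch.nn.Linear(d_in, d_concepts)
        self.k = k
    def forward(self, x):
        acts = torch.relu(self.enc(x))
        vals, idx = torch.topk(acts, self.k, dim=-1)
        return torch.zeros_like(acts).scatter_(-1, idx, vals)   # k active concepts per sample

sae = TopKSAEEncoder()                            # the demo loads trained weights instead

with torch.no_grad():
    img = preprocess(Image.open("example.jpg")).unsqueeze(0)
    txt = tokenizer(["a dog playing in the snow"])
    z_img = sae(clip_model.encode_image(img).float())
    z_txt = sae(clip_model.encode_text(txt).float())

# Concepts activated by both modalities indicate aligned vision-language semantics.
shared = ((z_img > 0) & (z_txt > 0)).sum().item()
print("concepts shared by image and caption:", shared)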

Pre-trained Models

The pre-trained VL-SAE weights are provided on ModelScope and HuggingFace.

| Base Model | ModelScope | HuggingFace |
| --- | --- | --- |
| OpenCLIP-ViT-B/32 | SAE weights, metadata | SAE weights, metadata |
| OpenCLIP-ViT-B/16 | SAE weights, metadata | SAE weights, metadata |
| OpenCLIP-ViT-L/14 | SAE weights, metadata | SAE weights, metadata |
| OpenCLIP-ViT-H/14 | SAE weights, metadata | SAE weights, metadata |
| LLaVA-1.5-7B | SAE weights, Auxiliary AE weights, metadata | SAE weights, Auxiliary AE weights, metadata |
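
The links in the table point to the hub pages; if you prefer fetching a checkpoint programmatically, huggingface_hub can place the files directly into the expected demo directory. The repo_id and filenames below are placeholders to illustrate the call, not the actual hub paths.

# Placeholder repo_id/filenames: substitute the ones linked in the table above.
from huggingface_hub import hf_hub_download

for fname in ["openclip_ViT-B-32_VL_SAE_256_8_best.pth", "metadata.json"]:   # assumed names
    hf_hub_download(
        repo_id="ssfgunner/VL-SAE",   # assumption; use the repo id from the table
        filename=fname,
        local_dir="cvlms/demo",       # where the Quick Start expects the files
    )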

Training

This repo supports the construction of VL-SAE for LLaVA-1.5 and OpenCLIP.

First, collect the hidden representations of pre-trained models:

model_type="cvlms" # for OpenCLIP
# model_type="lvlms" # for LLaVA
cd ./${model_type}/representation_collection
bash get_activations.sh

With a single NVIDIA RTX 4090, this step takes approximately 5 hours for OpenCLIP and 4 days for LLaVA.
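
get_activations.sh drives the repository's own collection code. Conceptually, for OpenCLIP this step amounts to caching paired image and text embeddings over CC3M, roughly as sketched below; the shard name, per-sample processing, and output file layout are assumptions, not the script's actual behavior.

# Conceptual sketch of representation collection for OpenCLIP (not the repo's script).
import torch, open_clip
import webdataset as wds

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval().cuda()

dataset = (
    wds.WebDataset("./CC3M/cc3m-train-0000.tar")   # illustrative shard name
    .decode("pil")
    .to_tuple("jpg", "txt")
)

img_feats, txt_feats = [], []
with torch.no_grad():
    for image, caption in dataset:                 # batch these in practice for speed
        img_feats.append(model.encode_image(preprocess(image).unsqueeze(0).cuda()).cpu())
        txt_feats.append(model.encode_text(tokenizer([caption]).cuda()).cpu())

torch.save({"image": torch.cat(img_feats), "text": torch.cat(txt_feats)}, "activations_vitb32.pt")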

Then, train VL-SAE based on the collected representations:

cd ../sae_trainer
bash train.sh
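
train.sh wraps the repository's trainer. For orientation, the sketch below shows the basic shape of training a sparse autoencoder with a top-k activation on cached representations; the hyper-parameters (e.g. 256 active concepts and an 8x expansion, inferred only from the checkpoint naming), the plain MSE objective, and the input file are assumptions rather than the actual training recipe.

# Minimal top-k sparse autoencoder training sketch (assumptions, not the repo's trainer).
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_in=512, expansion=8, k=256):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_in * expansion)
        self.decoder = nn.Linear(d_in * expansion, d_in)
        self.k = k

    def encode(self, x):
        acts = torch.relu(self.encoder(x))
        vals, idx = torch.topk(acts, self.k, dim=-1)
        return torch.zeros_like(acts).scatter_(-1, idx, vals)   # keep only the top-k concepts

    def forward(self, x):
        z = self.encode(x)
        return self.decoder(z), z

data = torch.load("activations_vitb32.pt")                       # file name from the sketch above
reps = torch.cat([data["image"], data["text"]]).float()          # both modalities share one concept set

sae = TopKSAE(d_in=reps.shape[-1]).cuda()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

for epoch in range(10):
    for batch in torch.split(reps[torch.randperm(len(reps))], 4096):
        batch = batch.cuda()
        recon, _ = sae(batch)
        loss = nn.functional.mse_loss(recon, batch)              # reconstruction objective
        opt.zero_grad(); loss.backward(); opt.step()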

Evaluation

Qualitative Results

Visualize the concepts learned by VL-SAE:

cd ../eval
python visualize_concept.py --topk 256 --ckpt-path ../sae_trainer/sae_weights/openclip_ViT-B-32_VL_SAE_256_8_best.pth

Each concept is represented by a set of images stored in the corresponding folder and sentences in the text_interpretation.txt file.

(Figure: examples of concepts visualized by VL-SAE)

Quantitative Results

After the concepts have been visualized, their inter-similarity and intra-similarity scores can be computed using CLIP embeddings.

python eval.py --target-dir ./concept_images/vlsae_ViT-B-32_256

(Figure: concept similarity evaluation results)
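
eval.py implements the repository's metric; the sketch below only conveys the underlying idea under an assumed definition, namely intra-similarity as the mean pairwise cosine similarity of CLIP image embeddings within a concept folder and inter-similarity as the mean cosine similarity between concept centroids. The folder layout and file pattern are also assumptions.

# Rough sketch of intra-/inter-concept similarity from CLIP embeddings (definitions assumed).
import glob
import torch, open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
model.eval()

def embed_folder(folder):
    imgs = [preprocess(Image.open(p)) for p in glob.glob(f"{folder}/*.jpg")]   # assumed file pattern
    with torch.no_grad():
        feats = model.encode_image(torch.stack(imgs))
    return torch.nn.functional.normalize(feats, dim=-1)

def mean_offdiag(sim):
    return sim[~torch.eye(len(sim), dtype=torch.bool)].mean()

folders = sorted(glob.glob("./concept_images/vlsae_ViT-B-32_256/*/"))
concepts = [f for f in (embed_folder(d) for d in folders) if len(f) > 1]

intra = torch.stack([mean_offdiag(f @ f.T) for f in concepts]).mean()
centroids = torch.nn.functional.normalize(torch.stack([f.mean(0) for f in concepts]), dim=-1)
inter = mean_offdiag(centroids @ centroids.T)
print(f"intra-similarity: {intra:.3f}  inter-similarity: {inter:.3f}")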

Generate Metadata Files for SAE Weights

Finally, generate a JSON file for the trained SAE that stores the index of each concept along with its mean activation value and its maximally activating data (image URLs, texts). This file supports integrating VL-SAE into the model inference process for interpretability purposes.

python concept2data.py --topk 256 --ckpt-path ../sae_trainer/sae_weights/openclip_ViT-B-32_VL_SAE_256_8_best.pth
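
The exact schema is defined by concept2data.py; for orientation only, each concept entry in the generated file carries roughly the information below, with field names that are illustrative rather than the script's actual keys.

# Illustrative shape of one concept entry in the generated metadata (field names assumed).
example_entry = {
    "concept_index": 42,                                         # index of the concept in the SAE
    "mean_activation": 0.137,                                    # mean activation value over the dataset
    "max_activation_images": ["https://example.com/img_1.jpg"],  # URLs of maximally activating images
    "max_activation_texts": ["a dog playing in the snow"],       # captions paired with those images
}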

Application: Eliminating Hallucination for LVLMs

Integrate the pre-trained VL-SAE into the inference process of LLaVA 1.5 to eliminate hallucinations.

First, download the validation images & annotations of COCO 2014 and put them under lvlms/VCD/data/coco.

Then, run the provided scripts to evaluate the performance of VL-SAE on different benchmarks.

cd lvlms/VCD/experiments
# For POPE benchmark
bash cd_scripts/llava1.5_pope.sh 
# For CHAIR benchmark
bash cd_scripts/llava1.5_chair.sh
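
The scripts above handle the intervention end to end. At its core, the idea is to intercept hidden states of the language model, route them through the VL-SAE, and blend the reconstruction back in; a minimal PyTorch forward-hook sketch of that pattern is shown below, where the SAE interface (forward returning reconstruction and activations), the chosen layer, and the blending weight alpha are all assumptions.

# Minimal forward-hook sketch for representation editing during LVLM inference (assumptions throughout).
import torch

def make_sae_hook(sae, alpha=0.5):
    """Blend each hidden state with its SAE reconstruction; alpha is an assumed strength knob."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        recon, _ = sae(hidden)                       # assumes forward() returns (reconstruction, activations)
        edited = (1 - alpha) * hidden + alpha * recon
        return (edited,) + output[1:] if isinstance(output, tuple) else edited
    return hook

# Hypothetical usage: attach to one decoder layer of LLaVA's language model.
# layer = llava_model.model.layers[20]               # module path is an assumption
# handle = layer.register_forward_hook(make_sae_hook(sae))
# ... run generation ...
# handle.remove()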

(Figure: results on the POPE benchmark)

(Figure: results on the CHAIR benchmark)

Citation

If you find VL-SAE useful for your research and applications, please cite using this BibTeX:

@misc{shen2025vlsae,
      title={VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set}, 
      author={Shufan Shen and Junshu Sun and Qingming Huang and Shuhui Wang},
      year={2025},
      eprint={2510.21323},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.21323}, 
}

Related Projects
