[WACV 2026] ClusterMine: Robust Label-Free Visual Out-Of-Distribution Detection via Concept Mining from Text Corpora
[Arxiv] [BibTeX] [Quick Start]
Nikolas Adaloglou, Diana Petrusheva, Mohamed Asker, Felix Michels and Prof. Markus Kollmann
Mathematical Modeling of Biological Systems lab (MMBS), Heinrich Heine University of Düsseldorf

An overview of the label mining framework for OOD detection using CLIP. Given a text corpus and its representation, ClusterMine aims to extract in-distribution-related class names in the shared vision-language space of CLIP. Best viewed in color.
TL;DR: This repository contains the official implementation of ClusterMine, a novel method for visual out-of-distribution (OOD) detection that leverages CLIP's text-image embedding space to mine label names from large text corpora.
Large-scale visual out-of-distribution (OOD) detection has witnessed remarkable progress by leveraging vision-language models such as CLIP. However, a significant limitation of current methods is their reliance on a pre-defined set of in-distribution (ID) ground-truth label names (positives). These fixed label names can be unavailable, unreliable at scale, or become less relevant due to in-distribution shifts after deployment. Towards truly unsupervised OOD detection, we utilize widely available text corpora for positive label mining under a general concept mining paradigm, bypassing the need for positives. Within this framework, we propose ClusterMine, a novel positive label mining method. ClusterMine is the first method to achieve state-of-the-art OOD detection performance without access to positive labels. It extracts positive concepts from a large text corpus by combining visual-only sample consistency (via clustering) and zero-shot image-text consistency.
The proposed method, cluster-based positive mining (ClusterMine), consists of the following steps:
- **Visual feature-based clustering:** We cluster CLIP's visual features into $C=4000$ clusters. In practice, we apply TEMI clustering, as it has shown significant improvements in clustering accuracy over $k$-means, even at large scales, and we use its default parameters ($\beta=0.6$, 50 heads). In contrast to the clustering downstream task, we are only interested in a rough overestimation of $C$, not in recovering the ground-truth classes.
- **Vision-language inference:** For all samples that fall into the same cluster, we apply zero-shot inference using the text corpus $\mathcal{Y}_{corpus}$.
- **Cluster voting:** Each cluster's label name is then determined by majority voting, which effectively reduces the number of false-positive classes. Voting enforces visual consistency, as nearest neighbors in feature space likely share the same label. Because different clusters can be mapped to the same label name, $|\mathcal{Y}_{pos}| \leq C$.

By integrating visual consistency into the top-1 image-text matching:

- different clusters can be mapped to the same label name $y \in \mathcal{Y}_{pos}$, and
- text concepts that do not match a sample's visual neighborhood are rejected.
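The cluster-voting step above can be sketched in a few lines of numpy (function and variable names here are illustrative, not the repository's API):

```python
import numpy as np

def cluster_vote(cluster_ids, zero_shot_labels):
    """Majority-vote one corpus concept per cluster and return the mined positives.

    cluster_ids:      (N,) visual cluster assignment per image
    zero_shot_labels: (N,) top-1 corpus concept index per image
    """
    positives = set()
    for c in np.unique(cluster_ids):
        members = zero_shot_labels[cluster_ids == c]
        vals, counts = np.unique(members, return_counts=True)
        positives.add(int(vals[np.argmax(counts)]))  # majority concept of cluster c
    return positives
```

Since several clusters may vote for the same concept, the mined set satisfies $|\mathcal{Y}_{pos}| \leq C$.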
Experimental results demonstrate that ClusterMine achieves state-of-the-art robustness to covariate in-distribution shifts.

Semantic large-scale OOD detection AUROC/FPR95 per dataset using CLIP ViT-H dfn5b. The WordNet corpus (nouns and adjectives) is used. None of the methods requires training or fine-tuning.
Additionally, the extracted positive class names overlap strongly with the ground-truth labels. In the left plot below, we compute the top-1 text-text similarity with the GT labels and find the shortest path (minimum number of hops) to the GT label in the WordNet tree.
On the right plot below, we measure the OOD detection robustness across multiple ID shifts (x-axis) compared to ImageNet using CLIP ViT-H dfn5b. The relative AUROC difference in % of each method compared to its ImageNet score is shown on top of each bar.
Choose between conda or uv for dependency management:

**conda**

```shell
# Create environment from YAML file
conda env create -f environment.yml
# Activate the environment
conda activate clustermine
```

**uv**

```shell
# Install dependencies using uv
uv sync
# Activate the virtual environment
source .venv/bin/activate
```

Note: The conda environment includes CUDA-optimized packages (PyTorch, FAISS-GPU), which are essential for efficient computation with large embeddings.
Before running experiments, configure your dataset paths by editing `dataset_loaders/paths.json`:

```json
{
    "DEFAULT_PATH": "/path/to/your/datasets",
    "PRECOMPUTED_PATH": "/path/to/store/embeddings",
    "IMAGENET_PATH": "/path/to/imagenet",
    "PRECOMPUTED_TEXT_PATH": "./data/text_embeddings",
    "BASE_PATH_CORPORA": "./data/corpora"
}
```

Required datasets for result reproduction:
- **In-distribution:** ImageNet-1K (`IN1K`)
- **OOD Benchmarks:** NINCO (`NINCO`), ImageNet-O (`IN_O`), OpenImages-O (`openimage_o`), iNaturalist (`inat`), ImageNet-21K-OOD (`IN21OOD`), Textures subset (`texturev2`)
- **Robustness Benchmarks:** ImageNet-V2 (`IN_V2`), ImageNet-A (`IN_A`), ImageNet-R (`IN_R`), ImageNet-C subset (`IN_C`), ImageNet-Sketch (`sketch`)
The paths of these vision datasets can be configured in `dataset_loaders/data_paths.py`. We assume that `DEFAULT_PATH` contains these dataset folders. You need to download the data yourself; for instance, this repo provides instructions for downloading `inat`, and NINCO can be downloaded directly from Zenodo.
Our method uses pre-computed text embeddings from different corpora. We primarily use clip:8 which corresponds to CLIP ViT-H-14 DFN-5B architecture. Here WN corresponds to WordNet nouns only and WN-NA includes nouns and adjectives. In the paper we report results with WN-NA unless otherwise specified.
To compare with other baselines that assume access to the in-distribution label names you also need to compute the ImageNet-1K text embeddings.
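Once text and image embeddings are cached, zero-shot corpus inference reduces to a cosine-similarity argmax in the shared embedding space. A minimal numpy sketch (names are illustrative, not the repository's API):

```python
import numpy as np

def zero_shot_top1(image_embeds, text_embeds):
    """Return the top-1 corpus concept index for each image embedding."""
    # L2-normalize so that the dot product equals cosine similarity
    img = image_embeds / np.linalg.norm(image_embeds, axis=1, keepdims=True)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    return np.argmax(img @ txt.T, axis=1)
```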
```shell
# Generate text embeddings for WordNet nouns
python gen_text_embeds.py --arch clip:8 --corpus_name WN
# Generate text embeddings for WordNet nouns + adjectives
python gen_text_embeds.py --arch clip:8 --corpus_name WN-NA
# Generate ImageNet-1K class embeddings (required for baseline methods like MCM)
python gen_text_embeds.py --arch clip:8 --corpus_name IN1K
```

Pre-computing image embeddings without augmentation significantly speeds up experiments:
```shell
# Example: Generate embeddings for NINCO dataset
python gen_embeds.py --arch clip:8 --dataset NINCO --no_eval_knn --no_compute_knn
# For multiple datasets, use the batch script
bash bash/gen_img_emb.sh
```

Parameter explanation:

- `--no_eval_knn`: Skip KNN evaluation during embedding generation
- `--no_compute_knn`: Skip KNN index computation (saves time and storage)
ClusterMine requires the TEMI clustering head. We provide weights for clip:8 trained on ImageNet-1K:
```shell
# Create weights directory
mkdir -p weights
# Download and extract clustering weights (~160MB)
cd weights && wget https://uni-duesseldorf.sciebo.de/s/jZ7dwn7EmKJxxmG/download/clip_8.zip
unzip clip_8.zip
rm clip_8.zip
cd ..
```

Refer to the TEMI repository instructions for training your own clustering head.

Custom clustering path: We store the weights in the repository folder. To use a different path, modify `CLUSTER_BASE_PATH = './weights'` in `ood/clustermine.py`.
The main OOD detection experiments evaluate AUROC and FPR95 on six OOD benchmarks: `['NINCO', 'IN_O', 'openimage_o', 'inat', 'IN21OOD', 'texturev2']`. The reported averages are computed over these six datasets, using `IN1K` as the in-distribution.
```shell
python main_ood.py --arch clip:8 --dataset "IN1K" \
    --corpus_name "WN" --method "clustermine" --out_dim 4000 \
    --step 1024 --save_path_df "./data/results/clustermine"
```

Key parameters:

- `--out_dim 4000`: Number of clusters (must match the downloaded weights)
- `--step 1024`: Batch size for processing (adjust based on GPU memory)
- `--corpus_name "WN"`: Corpus for mining concepts
```shell
python main_ood.py \
    --arch clip:8 \
    --dataset "IN1K" \
    --corpus_name "WN" \
    --method "posmine" \
    --threshold 0.00008 \
    --save_path_df "./data/results/posmine"
```

Key parameters:

- `--threshold 0.00008`: Similarity threshold for positive mining, i.e., the minimum fraction of samples assigned to a concept for that concept to be considered in-distribution.
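The role of the threshold can be illustrated with a small sketch (hypothetical helper, not the repository's implementation): a corpus concept survives only if at least that fraction of images pick it as their top-1 match.

```python
import numpy as np

def posmine_select(zero_shot_labels, threshold=0.00008):
    """Keep corpus concepts matched by at least `threshold` fraction of images."""
    labels, counts = np.unique(zero_shot_labels, return_counts=True)
    keep = counts / zero_shot_labels.size >= threshold
    return set(labels[keep].tolist())
```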
Run ClusterMine, PosMine, and baseline methods (MCM, NegLabel) together:
```shell
python main_ood.py \
    --arch clip:8 \
    --dataset "IN1K" \
    --corpus_name "WN" \
    --method "all" \
    --save_path_df "./data/results/all_methods"
```

Baseline methods included:

- **MCM**: Maximum Concept Matching
- **NegLabel**: Negative label mining
Test OOD detection performance across distribution shifts:
```shell
# Required robustness datasets
DATASETS=("IN_V2" "IN_A" "IN_R" "IN_C" "sketch")
# Generate embeddings for each dataset
for dataset in "${DATASETS[@]}"; do
    python gen_embeds.py \
        --arch clip:8 \
        --dataset $dataset \
        --no_eval_knn \
        --no_compute_knn \
        --test_only
done
```

Dataset descriptions:

- `IN_V2`: ImageNet-V2
- `IN_A`: ImageNet-A (natural adversarial examples)
- `IN_R`: ImageNet-R (artistic renditions)
- `IN_C`: ImageNet-C (corrupted images)
- `sketch`: ImageNet-Sketch (sketch drawings)
```shell
python main_robust.py \
    --arch clip:8 \
    --dataset IN1K \
    --save_path_df "./data/results/robustness" \
    --step 1024 \
    --corpus_name "WN-NA"
```

After running the experiments, you should see results similar to our paper (using our processed WN-NA corpus):
| Method | NINCO | IN_O | openimage_o | inat | IN21OOD | texturev2 | Average |
|---|---|---|---|---|---|---|---|
| PosMine | 92.56 | 93.13 | 97.04 | 98.83 | 91.36 | 93.81 | 94.46 |
| ClusterMine | 92.87 | 93.57 | 96.93 | 99.00 | 91.53 | 93.45 | 94.56 |
| NegLabel | 90.26 | 90.10 | 95.79 | 98.64 | 88.40 | 90.44 | 92.27 |
| MCM | 88.78 | 91.3 | 96.64 | 96.62 | 89.65 | 91.75 | 92.46 |
Results are saved as CSV files in the directory specified by `--save_path_df`. Below we show the results with respect to robustness under ID shifts.
| Method | ID | NINCO | IN_O | openimage_o | inat | IN21OOD | texturev2 | Average |
|---|---|---|---|---|---|---|---|---|
| ClusterMine | IN_A | 83.53 | 84.45 | 91.88 | 97.17 | 80.04 | 83.46 | 86.76 |
| NegLabel | IN_A | 79.70 | 80.82 | 90.74 | 96.64 | 78.14 | 80.56 | 84.43 |
| MCM | IN_A | 62.95 | 68.79 | 84.49 | 83.75 | 64.15 | 69.05 | 72.20 |
| MCM | IN_C | 70.01 | 73.90 | 84.58 | 83.76 | 70.76 | 74.43 | 76.24 |
| NegLabel | IN_C | 82.68 | 83.18 | 91.54 | 96.49 | 80.95 | 83.17 | 86.33 |
| ClusterMine | IN_C | 86.40 | 87.02 | 92.81 | 97.11 | 83.75 | 86.54 | 88.94 |
| NegLabel | IN_R | 89.30 | 89.48 | 95.44 | 98.59 | 87.71 | 89.68 | 91.70 |
| MCM | IN_R | 77.67 | 81.64 | 91.08 | 90.74 | 78.77 | 82.11 | 83.66 |
| ClusterMine | IN_R | 88.34 | 89.13 | 94.50 | 98.17 | 85.91 | 88.61 | 90.78 |
| MCM | IN_V2 | 84.72 | 87.79 | 94.79 | 94.66 | 85.66 | 88.21 | 89.31 |
| NegLabel | IN_V2 | 87.91 | 87.97 | 94.60 | 98.17 | 86.09 | 88.22 | 90.49 |
| ClusterMine | IN_V2 | 90.70 | 91.39 | 95.65 | 98.49 | 88.85 | 91.10 | 92.70 |
| MCM | sketch | 84.88 | 87.73 | 94.18 | 94.01 | 85.77 | 88.22 | 89.13 |
| NegLabel | sketch | 91.21 | 91.11 | 96.31 | 98.85 | 89.50 | 91.41 | 93.06 |
| ClusterMine | sketch | 90.35 | 91.14 | 95.60 | 98.52 | 88.43 | 90.76 | 92.47 |
For some datasets, using WN leads to superior results!
- **CUDA Out of Memory:** Reduce the `--step` parameter or use smaller batch sizes.
- **Missing Dataset Paths:** Ensure `dataset_loaders/paths.json` points to the correct locations.

To evaluate on a custom dataset:

1. Add the dataset path to `DATASET_PATHS` in `dataset_loaders/data_paths.py`
2. Generate embeddings: `python gen_embeds.py --arch clip:8 --dataset YOUR_DATASET`
3. Run evaluation: `python main_ood.py --dataset IN1K --save_path_df "./results/your_experiment"`
Supported architectures in model_builders/backbones/:
```python
preset_models = {
    "clip:0"  : ('MobileCLIP-S1', 'datacompdr'),
    "clip:1"  : ('MobileCLIP-B', 'datacompdr'),
    "clip:2"  : ('ViT-B-16', 'openai'),
    "clip:3"  : ('ViT-B-16-SigLIP', 'webli'),
    "clip:4"  : ('ViT-L-14', 'metaclip_400m'),
    "clip:5"  : ('ViT-L-14', 'openai'),
    "clip:6"  : ('ViT-L-16-SigLIP-256', 'webli'),
    "clip:7"  : ('ViT-H-14', 'metaclip_fullcc'),
    "clip:8"  : ('ViT-H-14', 'dfn5b'),
    "clip:9"  : ('ViT-bigG-14', 'laion2b_s39b_b160k'),
    "clip:10" : ('ViT-bigG-14', 'metaclip_fullcc'),
    "clip:11" : ('ViT-bigG-14-CLIPA', 'datacomp1b')
}
```

Note: For each new CLIP model you need to produce the image and text embeddings!
If you find our work useful, please consider citing us in your work:
```bibtex
@inproceedings{adaloglou2026clustermine,
    author    = {Adaloglou, Nikolaos and Petrusheva, Diana and Asker, Mohamed and Michels, Felix and Kollmann, Markus},
    title     = {ClusterMine: Robust Label-Free Visual Out-Of-Distribution Detection via Concept Mining from Text Corpora},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2026}
}
```

This project is licensed under the Apache License, Version 2.0; see the LICENSE file for details. Also take into account the licenses of prior work, such as WordNet and OOD datasets like NINCO.
If you liked our work and find it useful, please consider starring (⭐) the repository so that it can reach a broader audience of like-minded people. It would be highly appreciated!



