autoXplain is a framework that combines Vision Language Models (VLMs) with various Class Activation Mapping (CAM) methods to automatically explain and evaluate vision model predictions. It provides detailed explanations, saliency maps, and quantitative evaluations of model performance.
Install the autoXplain package:

```bash
pip install git+https://github.com/phuvinhnguyen/autoXplain.git
```

Or clone and install it:

```bash
git clone https://github.com/phuvinhnguyen/autoXplain.git
cd autoXplain
pip install -e .
```

- Multiple CAM methods support:
- GradCAM
- SmoothGradCAM++
- GradCAM++
- CAM
- ScoreCAM
- LayerCAM
- XGradCAM
- Automatic evaluation using Vision Language Models (VLMs)
- Batch processing of images
- Comprehensive result analysis and reporting
- Support for different vision models (ResNet18, MaxViT)
- Detailed performance metrics and visualizations
```python
from autoXplain.evaluating import CamJudge
from FlowDesign.litellm import LLMInference
from torchcam.methods import GradCAM
from torchvision.models import resnet18
import torchvision
import json
import urllib.request

# Load the ImageNet class labels
url = "https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json"
with urllib.request.urlopen(url) as response:
    class_idx = json.load(response)
labels = [class_idx[str(i)][1] for i in range(1000)]

# Load the VLM
bot = LLMInference("gemini/gemini-1.5-flash", api_key='<API_TOKEN>')

# Load the vision model
model = resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)

# Create the workflow
agent = CamJudge(bot, GradCAM, model, labels=labels)

# Run the framework on a single image
output = agent({'image': 'path/to/image.png', 'label': 'label_of_the_image'})
print(output)
```

To process multiple images and generate an XAI confusion matrix, use the examples/process_folder.py script:
```bash
python examples/process_folder.py input_folder \
    --save_dir autoXplain_results \
    --model resnet18 \
    --cam_type gradcam \
    --threshold 2.5 \
    --vlm_model gemini/gemini-1.5-flash \
    --api_key YOUR_API_KEY_1,YOUR_API_KEY_2
```

- `input_folder`: Path to the folder containing images (required)
- `--save_dir`: Path to save processed results (default: 'autoXplain_results')
- `--model`: Vision model to use (choices: 'resnet18', 'maxvit_t', ..., default: 'resnet18')
- `--cam_type`: CAM method to use (choices: 'gradcam', 'smoothgradcam', 'gradcamplusplus', 'cam', 'scorecam', 'layercam', 'xgradcam', default: 'gradcam')
- `--threshold`: Threshold on the VLM score above which an explanation counts as good (default: 2.5)
- `--vlm_model`: Name of the VLM used for evaluation (default: 'gemini/gemini-1.5-flash')
- `--api_key`: Google API key(s) for the Gemini model; multiple keys can be comma-separated (required)
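The flag list above can be sketched as an `argparse` parser. This is a hypothetical reconstruction from the documented options, not the actual source of `process_folder.py`; in particular, the real script accepts more `--model` choices than the two listed here.

```python
import argparse

# Hypothetical sketch of process_folder.py's CLI, built from the option
# list above. The real script defines additional --model choices.
parser = argparse.ArgumentParser()
parser.add_argument("input_folder")
parser.add_argument("--save_dir", default="autoXplain_results")
parser.add_argument("--model", default="resnet18",
                    choices=["resnet18", "maxvit_t"])  # real script has more
parser.add_argument("--cam_type", default="gradcam",
                    choices=["gradcam", "smoothgradcam", "gradcamplusplus",
                             "cam", "scorecam", "layercam", "xgradcam"])
parser.add_argument("--threshold", type=float, default=2.5)
parser.add_argument("--vlm_model", default="gemini/gemini-1.5-flash")
parser.add_argument("--api_key", required=True)

# Parse an example command line (comma-separated keys stay one string)
args = parser.parse_args(["images", "--api_key", "KEY1,KEY2", "--threshold", "3.0"])
print(args.input_folder, args.cam_type, args.threshold)
```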
Images should be named in the format `id_label.extension`, e.g. `001_cat.jpg` or `002_dog.png`.
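A filename in this format can be split into its id and ground-truth label by partitioning on the first underscore, so labels that themselves contain underscores survive intact. The helper below is a minimal sketch, not part of the autoXplain API:

```python
from pathlib import Path

def parse_image_name(path):
    """Split an 'id_label.extension' filename into (id, label).

    Splits only on the first underscore, so a label like 'tiger_cat'
    is preserved whole.
    """
    stem = Path(path).stem                 # e.g. '001_cat'
    image_id, _, label = stem.partition("_")
    return image_id, label

print(parse_image_name("001_cat.jpg"))        # ('001', 'cat')
print(parse_image_name("003_tiger_cat.png"))  # ('003', 'tiger_cat')
```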
The framework provides comprehensive outputs. For each image:

- Saliency map
- Masked CAM image
- Description
- Justification
- Score
- Prediction

Across the whole run:

- All individual image results
- A final report categorizing results into four cases:
  - Correct predictions with high VLM score
  - Correct predictions with low VLM score
  - Wrong predictions with high VLM score
  - Wrong predictions with low VLM score
- Summary statistics
- Detailed analysis of each case
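The four report cases amount to bucketing each result by two booleans: whether the model's prediction matches the label, and whether the VLM score clears the threshold. A minimal sketch of that categorization, assuming hypothetical result keys `prediction`, `label`, and `score` (the actual report code may differ):

```python
def categorize(results, threshold=2.5):
    """Bucket each result into one of the four report cases by
    prediction correctness and VLM score vs. the threshold."""
    cases = {
        "correct_high": [], "correct_low": [],
        "wrong_high": [], "wrong_low": [],
    }
    for r in results:
        correctness = "correct" if r["prediction"] == r["label"] else "wrong"
        level = "high" if r["score"] >= threshold else "low"
        cases[f"{correctness}_{level}"].append(r)
    return cases

# Two toy results: one correct/high, one wrong/low
results = [
    {"prediction": "cat", "label": "cat", "score": 4.0},
    {"prediction": "cat", "label": "dog", "score": 1.0},
]
cases = categorize(results)
print({k: len(v) for k, v in cases.items()})
```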
The pipeline follows these steps:

1. Take the model and images as input
2. Compute attention (saliency maps) using the selected CAM method
3. Use a VLM to evaluate and score each sample
4. Compute the confusion matrix of the VLM's judgment against model accuracy
5. Generate comprehensive reports and visualizations
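The batch loop behind these steps can be sketched as follows. This is a hypothetical illustration, not autoXplain's actual implementation: the stub stands in for a `CamJudge`-style agent, and the output keys `prediction` and `score` are assumptions.

```python
def run_pipeline(agent, samples, threshold=2.5):
    """Run the judge agent on each sample and tally the 2x2 matrix of
    model correctness vs. VLM judgment (hypothetical sketch)."""
    matrix = {(r, c): 0 for r in ("correct", "wrong") for c in ("high", "low")}
    outputs = []
    for sample in samples:
        out = agent(sample)  # CAM computation + VLM scoring happen here
        outputs.append(out)
        row = "correct" if out["prediction"] == sample["label"] else "wrong"
        col = "high" if out["score"] >= threshold else "low"
        matrix[(row, col)] += 1
    return outputs, matrix

# Stub standing in for the real agent, just to exercise the loop
def fake_agent(sample):
    return {"prediction": "cat", "score": 3.0}

samples = [{"image": "001_cat.jpg", "label": "cat"},
           {"image": "002_dog.png", "label": "dog"}]
outputs, matrix = run_pipeline(fake_agent, samples)
print(matrix)
```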
If you use this work in your research, please cite the following paper:

```bibtex
@article{nguyen2025novel,
  title={A Novel Framework for Automated Explain Vision Model Using Vision-Language Models},
  author={Nguyen, Phu-Vinh and Pham, Tan-Hanh and Ngo, Chris and Hy, Truong Son},
  journal={arXiv preprint arXiv:2508.20227},
  year={2025}
}
```