A comprehensive tool for visualizing attention patterns in the Qwen2.5-VL vision-language model during text generation. It lets you see exactly which parts of an image the model attends to while generating each token.
- Python 3.8+
- CUDA-capable GPU (recommended, 8GB+ VRAM for 3B model)
- 16GB+ RAM
```shell
cd Qwen_VL_2_5_Visualizer

# Using conda
conda create -n qwen_viz python=3.10
conda activate qwen_viz

# Or using venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

pip install -r requirements.txt
python app.py
```

Then open your browser and navigate to: http://127.0.0.1:7861
Edit config.py to customize:
```python
# Model settings
MODEL_NAME = "Qwen/Qwen2.5-VL-3B-Instruct"

# Attention extraction
EXTRACT_ALL_LAYERS = True   # Or specify SPECIFIC_LAYERS

# Visualization
HEATMAP_COLORMAP = "jet"    # 'hot', 'viridis', 'plasma', etc.
HEATMAP_ALPHA = 0.5         # Transparency (0-1)

# Memory optimization
STORE_ON_CPU = True         # Offload attention to CPU
USE_FLOAT16 = True          # Use half precision
```

```
┌─────────────────────────────────────────────────────┐
│                  Qwen2.5-VL Model                   │
│  ┌──────────────────┐         ┌──────────────────┐  │
│  │  Vision Encoder  │────────▶│  Language Model  │  │
│  │  (Patch Embed +  │         │ (Decoder Layers) │  │
│  │   Transformer)   │         │                  │  │
│  └──────────────────┘         └──────────────────┘  │
│           │                            │            │
│           │                     ┌──────▼──────┐     │
│           │                     │  Attention  │◀────┤
│           │                     │   Weights   │Hooks│
│           │                     └─────────────┘     │
└───────────┼────────────────────────────┬────────────┘
            │                            │
            ▼                            ▼
    ┌───────────────┐           ┌──────────────────┐
    │ Image Patches │           │  Attention Maps  │
    │  Coordinates  │           │   (per token)    │
    └───────┬───────┘           └────────┬─────────┘
            │                            │
            └─────────────┬──────────────┘
                          ▼
               ┌──────────────────────┐
               │    Visualization     │
               │  (Heatmap Overlay)   │
               └──────────────────────┘
```
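The "Hooks" path in the diagram, which captures per-layer attention weights from the decoder, can be sketched roughly as below. This is an illustrative sketch, not the project's `attention_extractor.py`: the `model.model.layers` / `self_attn` module paths and the assumption that each attention module returns `(hidden_states, attn_weights, ...)` hold for typical Hugging Face decoder implementations only when attention weights are actually materialized (e.g. with the eager attention implementation).

```python
import torch

class AttentionExtractor:
    """Capture per-layer attention weights via forward hooks (illustrative sketch)."""

    def __init__(self, model, store_on_cpu=True):
        self.store_on_cpu = store_on_cpu
        self.attentions = {}  # layer index -> list of [batch, heads, q_len, k_len] tensors
        self.handles = []
        # Assumption: decoder layers live at model.model.layers, each exposing .self_attn.
        for i, layer in enumerate(model.model.layers):
            self.handles.append(
                layer.self_attn.register_forward_hook(self._make_hook(i))
            )

    def _make_hook(self, layer_idx):
        def hook(module, inputs, output):
            # Assumption: the module returns (hidden_states, attn_weights, ...) when
            # attention weights are materialized (e.g. eager attention implementation).
            if isinstance(output, tuple) and len(output) > 1 and output[1] is not None:
                attn = output[1].detach()
                if self.store_on_cpu:  # mirrors the STORE_ON_CPU config option
                    attn = attn.to("cpu")
                self.attentions.setdefault(layer_idx, []).append(attn)
        return hook

    def remove(self):
        """Detach all hooks once generation is finished."""
        for h in self.handles:
            h.remove()
        self.handles.clear()
```

After a generation pass, the per-step attention tensors for each layer can then be read back from `attentions[layer_idx]`.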
- Attention Extractor (`attention_extractor.py`)
  - Registers forward hooks on decoder attention modules
  - Captures attention weights during generation
  - Stores per-layer, per-head attention
- Attention Processor (`attention_processor.py`)
  - Maps vision tokens to image patch positions
  - Aggregates attention across layers/heads
  - Creates 2D attention maps
- Visualizer (`visualization.py`)
  - Generates heatmap overlays
  - Supports multiple colormaps and transparency
  - Creates comparison views
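To make the Visualizer's heatmap overlay concrete, here is a minimal sketch using matplotlib and Pillow. The `overlay_heatmap` helper is hypothetical (not the actual `visualization.py` API); it assumes a 2D attention map at patch-grid resolution and upsamples it to the image size:

```python
import numpy as np
from PIL import Image
from matplotlib import colormaps

def overlay_heatmap(image, attn_map, colormap="jet", alpha=0.5):
    """Blend a 2D attention map over an image (hypothetical helper).

    image:    PIL.Image
    attn_map: 2D numpy array (patch-grid resolution), arbitrary scale
    """
    # Normalize attention to [0, 1] so the colormap covers the full range.
    attn = (attn_map - attn_map.min()) / (np.ptp(attn_map) + 1e-8)
    # Map to RGB, drop the alpha channel, and upsample to the image size.
    heat = colormaps[colormap](attn)[..., :3]
    heat = Image.fromarray((heat * 255).astype(np.uint8)).resize(image.size)
    # alpha mirrors the HEATMAP_ALPHA config setting.
    return Image.blend(image.convert("RGB"), heat, alpha)
```

The same blend call works for any colormap name registered with matplotlib, which is why config values like `'hot'` or `'viridis'` can be swapped in directly.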
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Qwen Team: For the amazing Qwen2.5-VL model
- Hugging Face: For the Transformers library
- Gradio: For the easy-to-use interface framework
For questions or issues, please open an issue on GitHub.