
Query-Free Adversarial Attacks on Image-to-Image Models

Python 3.9+ License: MIT Paper

This repository implements query-free adversarial attacks on image-to-image diffusion models by perturbing publicly available image encoders. We demonstrate that small perturbations ($\epsilon \geq 8/255$) that remain visually plausible can severely degrade generation quality across multiple architectures, and, surprisingly, that neither improved training data nor robust encoders provide an effective defense.


📄 Full Paper

📑 Query-free Attacks on Image-to-image Models are Hard to Avoid (PDF)

Abstract: We evaluate query-free adversarial attacks targeting image-to-image generative models across different encoder architectures, specifically Variational Autoencoders (VAE) and CLIP image encoders. We investigate the perturbation budget ($\epsilon$) required to effectively induce low-quality generations. We observe that neither improving data quality nor introducing robust image encoders improves generation quality under adversarial noise, leaving a gap for future work to investigate effective defenses.


🎯 Key Contributions

  1. Comprehensive $\epsilon$ Analysis: We provide qualitative and quantitative analysis showing that perturbations as small as $\epsilon = 8/255$ can produce highly dissimilar outputs while maintaining visual plausibility (PSNR > 30 dB).

  2. Cross-Architecture Evaluation: We test VAE-based encoders (InstructPix2Pix, InstructCLIP-Pix2Pix) and CLIP-based encoders (Kandinsky 2.2), revealing that both are vulnerable to query-free attacks.

  3. Defense Evaluation: We demonstrate that current defense mechanisms—including training on CLIP-filtered high-quality data and replacing encoders with adversarially robust versions (RobustCLIP)—fail to provide robustness, leaving an open problem for future research.

  4. Attack Methodology: We employ Auto-PGD (APGD), an adaptive first-order attack that eliminates manual hyperparameter tuning while achieving 6% better effectiveness than vanilla PGD at $\epsilon = 16/255$.


🧠 Evaluated Models

| Model | Architecture | Encoder Type | Paper |
|---|---|---|---|
| InstructPix2Pix | Stable Diffusion | VAE | Brooks et al., 2023 |
| InstructCLIP-Pix2Pix | Stable Diffusion + LoRA | VAE (CLIP-filtered data) | Chen et al., 2025 |
| Kandinsky 2.2 | Latent Diffusion | CLIP ViT-L/14 | Razzhigaev et al., 2023 |

🛡️ Evaluated Defense Mechanisms

| Defense Strategy | Description | Effectiveness | Reference |
|---|---|---|---|
| CLIP-Filtered Training Data | Train on a contrastively curated dataset (InstructCLIP-Pix2Pix) | No improvement under adversarial noise | Chen et al., 2025 |
| RobustCLIP Encoder | Replace Kandinsky's CLIP encoder with adversarially fine-tuned RobustCLIP | No difference observed | Schlarmann et al., 2024 |

Key Finding: Our experiments show that neither improved training data quality nor robust encoders provide effective defense against query-free attacks, highlighting an important open problem for future research.


🔬 Attack Methodology

Query-Free Optimization

Given an image encoder $f: \mathbb{R}^d \to \mathbb{R}^k$ (e.g., CLIP or VAE), we optimize:

$$ \max_{\|\boldsymbol\delta\|_{\infty} \leq \epsilon} \mathcal{D}\big(f(\mathbf{x}), f(\mathbf{x} + \boldsymbol\delta)\big), \quad \text{s.t.} \quad \mathbf{x} + \boldsymbol\delta \in [0,1]^d $$

where $\mathcal{D}$ is either:

  • Euclidean distance: $\mathcal{D}(\mathbf{z}_1, \mathbf{z}_2) = \|\mathbf{z}_1 - \mathbf{z}_2\|_2$ (effective for VAE encoders)
  • Cosine dissimilarity: $\mathcal{D}(\mathbf{z}_1, \mathbf{z}_2) = 1 - \frac{\mathbf{z}_1 \cdot \mathbf{z}_2}{\|\mathbf{z}_1\| \, \|\mathbf{z}_2\|}$ (effective for CLIP encoders)
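
As a concrete illustration, both objectives reduce to a few lines of PyTorch. This is a minimal sketch with our own function names, not the repository's API:

import torch
import torch.nn.functional as F

def encoder_distance(z_clean: torch.Tensor, z_adv: torch.Tensor) -> torch.Tensor:
    # Euclidean distance between encoder outputs (used for VAE latents)
    return torch.norm(z_clean.flatten(1) - z_adv.flatten(1), p=2, dim=1).mean()

def encoder_dissimilarity(z_clean: torch.Tensor, z_adv: torch.Tensor) -> torch.Tensor:
    # 1 - cosine similarity between embeddings (used for CLIP image embeddings)
    return (1.0 - F.cosine_similarity(z_clean.flatten(1), z_adv.flatten(1), dim=1)).mean()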

Auto-PGD (APGD) Update Rule

The attack iteratively updates the perturbation using:

$$ \boldsymbol\delta_{t+1} = \Pi_{\|\cdot\|_{\infty} \leq \epsilon}\Big(\boldsymbol\delta_t + \alpha_t \cdot \text{sign}\big(\nabla_{\boldsymbol\delta_t} \mathcal{D}(f(\mathbf{x}), f(\mathbf{x} + \boldsymbol\delta_t))\big)\Big) $$

where $\Pi$ is the projection operator and $\alpha_t$ is adaptively scheduled by APGD based on the loss landscape, eliminating manual step-size tuning.

Key Advantages over PGD:

  • Automatic step-size adaptation
  • Momentum accumulation for stability
  • Checkpoint rollback on overshooting
  • 6% relative improvement in attack effectiveness at $\epsilon = 16/255$
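
For intuition, the loop below sketches a sign-based encoder attack with a crude step-size-halving heuristic. It is a simplified stand-in, not the repository's APGD implementation, which additionally uses momentum accumulation and checkpoint rollback (see src/adversarial_i2i/attacks/apgd.py):

import torch

def simple_encoder_attack(encoder, x, loss_fn, eps=16 / 255, steps=100, alpha=0.1):
    # Keep the best perturbation found so far and halve the step size when the
    # loss stops improving (a rough stand-in for APGD's adaptive schedule).
    z_clean = encoder(x).detach()
    delta = torch.zeros_like(x)
    best_loss, best_delta = -float("inf"), delta.clone()

    for _ in range(steps):
        delta.requires_grad_(True)
        loss = loss_fn(z_clean, encoder(x + delta))
        (grad,) = torch.autograd.grad(loss, delta)

        with torch.no_grad():
            if loss.item() > best_loss:
                best_loss, best_delta = loss.item(), delta.detach().clone()
            else:
                alpha *= 0.5                         # shrink the step when progress stalls
            delta = delta + alpha * grad.sign()      # gradient ascent on the encoder distance
            delta = delta.clamp(-eps, eps)           # project onto the L_inf ball
            delta = (x + delta).clamp(0, 1) - x      # keep the adversarial image in [0, 1]

    return (x + best_delta).clamp(0, 1)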

📦 Installation

We use uv for fast, reproducible dependency management.

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/MarioRicoIbanez/AdversarialML-I2I.git
cd AdversarialML-I2I

# Create virtual environment and install dependencies
uv sync

# Activate the environment
source .venv/bin/activate  # Linux/Mac
# OR
.venv\Scripts\activate  # Windows

Manual Installation (pip):

pip install -e .

🚀 Quick Start

Test All Models

Run a quick sanity check on all supported models:

CUDA_VISIBLE_DEVICES=0 uv run python test_all_models.py

This will:

  • Load each model (InstructPix2Pix, InstructCLIP, Kandinsky)
  • Run a 3-step APGD attack with $\epsilon = 16/255$
  • Generate adversarial outputs
  • Report success/failure for each model

Visual Experiment: Multi-$\epsilon$ Analysis

Generate side-by-side comparisons across multiple perturbation budgets:

CUDA_VISIBLE_DEVICES=0 uv run python visual_experiment.py

Output Structure:

experiments_output/visual_test/
├── pix2pix_Distance_eps8_sample0.png       # Grid: Original | Adversarial | Generated
├── pix2pix_Distance_eps16_sample0.png
├── pix2pix_Distance_eps32_sample0.png
├── pix2pix_Distance_eps64_sample0.png
├── kandinsky_Similarity_eps8_sample0.png
├── ...
├── individual/                              # Individual images per attack
│   ├── pix2pix_Distance_eps16_sample0/
│   │   ├── 1_original.png
│   │   ├── 2_adversarial.png
│   │   └── 3_generated.png
└── summary.txt                              # Quantitative results (CLIP similarity, L2 distance)

Configuration:

  • Models: pix2pix, pix2pix-lora, kandinsky
  • Loss Functions: Distance (Euclidean), Similarity (Cosine)
  • $\epsilon$ Values: [8, 16, 32, 64]/255
  • Attack Parameters: 10 iterations, $\alpha = 0.1$
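
If you prefer to drive the sweep yourself, the sketch below loops over the same $\epsilon$ values using the apgd_attack signature documented in the Advanced Usage section; pil_image stands in for your own input image:

from torchvision.transforms.functional import to_pil_image
from src.adversarial_i2i.models import load_model
from src.adversarial_i2i.attacks import apgd_attack

model = load_model("pix2pix")
image_tensor = model.preprocess(pil_image)        # pil_image: your own PIL input

for pixel_change in (8, 16, 32, 64):              # epsilon = pixel_change / 255
    adversarial = apgd_attack(
        encoder=model,
        image=image_tensor,
        batch_size=1,
        pixel_change=pixel_change,
        epochs=10,                                # 10 iterations, as configured above
        alpha=0.1,
        loss_type="Distance",
    )
    to_pil_image(adversarial[0]).save(f"adversarial_eps{pixel_change}.png")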

📊 Key Results

Effect of Perturbation Budget $\epsilon$

| $\epsilon$ | PSNR (dB) | CLIP Sim (Orig→Gen) | CLIP Sim (Prompt→Gen) | Attack Visibility |
|---|---|---|---|---|
| 1/255 | 50.8 | 0.883 ± 0.101 | 0.248 ± 0.036 | Imperceptible |
| 8/255 | 34.0 | 0.812 ± 0.113 | 0.256 ± 0.034 | Effective threshold |
| 16/255 | 27.9 | 0.744 ± 0.119 | 0.263 ± 0.030 | Slight artifacts |
| 32/255 | 22.0 | 0.677 ± 0.116 | 0.266 ± 0.028 | Visible distortion |
| 64/255 | 16.8 | 0.631 ± 0.109 | 0.267 ± 0.027 | Severe corruption |

Insight: $\epsilon \in [8/255, 16/255]$ represents the sweet spot—strong semantic drift while maintaining visual plausibility (PSNR > 27 dB).

Defense Mechanisms: Do They Work?

| Defense Strategy | Model | CLIP Sim @ $\epsilon = 16/255$ | Effective? |
|---|---|---|---|
| Baseline | InstructPix2Pix | 0.744 ± 0.119 | |
| High-Quality Data | InstructCLIP-Pix2Pix | 0.743 ± 0.110 | ❌ No improvement |
| Robust Encoder | Kandinsky + RobustCLIP | 0.708 ± 0.105 | ❌ No improvement |

Conclusion: Current defense mechanisms provide no measurable robustness against query-free encoder attacks, highlighting an important open research problem.


📂 Repository Structure

AdversarialML-I2I/
├── src/adversarial_i2i/          # Core attack library
│   ├── attacks/
│   │   ├── apgd.py               # Auto-PGD implementation
│   │   └── pgd.py                # Vanilla PGD baseline
│   ├── models/
│   │   └── wrappers.py           # Model encoder wrappers (VAE, CLIP, etc.)
│   ├── evaluation/
│   │   └── metrics.py            # CLIP similarity, PSNR, etc.
│   └── utils/
│       ├── data.py               # Dataset loading utilities
│       └── image.py              # Image preprocessing/postprocessing
├── test_all_models.py            # Sanity check script
├── visual_experiment.py          # Multi-epsilon visual analysis
├── assets/
│   └── 2025_Rico_AttacksI2I.pdf       # Full paper
├── pyproject.toml                # Project metadata + dependencies
├── uv.lock                       # Dependency lock file
└── README.md                     # This file

🔧 Advanced Usage

Custom Attack Configuration

from PIL import Image

from src.adversarial_i2i.models import load_model
from src.adversarial_i2i.attacks import apgd_attack
from torchvision.transforms.functional import to_pil_image

# Load model
model = load_model("pix2pix")

# Load and preprocess an input image (replace with your own path)
pil_image = Image.open("input.png").convert("RGB")
image_tensor = model.preprocess(pil_image)  # Shape: [1, 3, H, W]

# Run APGD attack
adversarial = apgd_attack(
    encoder=model,
    image=image_tensor,
    batch_size=1,
    pixel_change=16,        # epsilon = 16/255
    epochs=100,             # Attack iterations
    alpha=0.1,              # Initial step size (auto-adapted)
    loss_type="Distance",   # "Distance" (L2) or "Similarity" (cosine)
    verbose=True
)

# Generate with adversarial input
adversarial_pil = to_pil_image(adversarial[0])
output = model.pipe(
    prompt=["Turn it into a photo"],
    image=[adversarial_pil],
    num_inference_steps=50,
    image_guidance_scale=1.5,
    guidance_scale=7.5
).images[0]

📈 Evaluation Metrics

We evaluate attacks using multiple complementary metrics:

| Metric | Description | Interpretation |
|---|---|---|
| CLIP Similarity (Orig→Gen) | Cosine similarity between original and generated image embeddings | Lower = stronger attack |
| CLIP Similarity (Prompt→Gen) | Alignment between the text prompt and the generated output | Should remain high (instruction following) |
| CLIP Similarity (Orig→Adv) | Perceptual similarity between the original and adversarial image | Higher = stealthier attack |
| PSNR (dB) | Peak Signal-to-Noise Ratio between original and adversarial image | Higher = less visible distortion |
| L2 Distance | Euclidean distance in latent space | Measures encoder displacement |
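
For reference, the pixel-level and embedding-level metrics reduce to the following computations. This sketch assumes the embeddings have already been extracted (e.g., with a CLIP image encoder) and uses our own function names, not those in evaluation/metrics.py:

import torch
import torch.nn.functional as F

def psnr(original: torch.Tensor, adversarial: torch.Tensor) -> float:
    # PSNR in dB for images with pixel values in [0, 1] (so the peak value is 1.0)
    mse = F.mse_loss(adversarial, original)
    return float(10.0 * torch.log10(1.0 / mse))

def clip_similarity(emb_a: torch.Tensor, emb_b: torch.Tensor) -> float:
    # Mean cosine similarity between two batches of embeddings of shape [batch, dim]
    return float(F.cosine_similarity(emb_a.flatten(1), emb_b.flatten(1), dim=1).mean())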

🤝 Acknowledgements

This work was conducted at the Laboratory for Information and Inference Systems (LIONS) at EPFL, Switzerland.

Authors:

  • Mario Rico Ibáñez – Master's student in Computer Science at EPFL (mario.ricoibanez@epfl.ch)
  • Elias Abad Rocamora – PhD student at LIONS, EPFL
  • Prof. Volkan Cevher – Director of LIONS Lab, EPFL

Laboratory: LIONS – Laboratory for Information and Inference Systems


📄 License

This project is licensed under the MIT License.

For academic use only. Commercial applications require explicit permission.


💬 Contact

For questions, issues, or collaboration inquiries, please contact Mario Rico Ibáñez (mario.ricoibanez@epfl.ch) or open an issue on this repository.


⚠️ Responsible Disclosure: This research is intended to improve the robustness of generative AI systems. Please use this code ethically and responsibly.
