This repository implements query-free adversarial attacks on image-to-image diffusion models by perturbing publicly available image encoders. We demonstrate that small, imperceptible perturbations of the input image (as small as $\epsilon = 8/255$) can drastically alter the generated output, without issuing a single query to the target model.
📑 Query-free Attacks on Image-to-image Models are Hard to Avoid (PDF)
Abstract: We evaluate query-free adversarial attacks targeting image-to-image generative models across different encoder architectures, specifically Variational Autoencoders (VAE) and CLIP image encoders. We investigate the perturbation budget ($\epsilon$) required to substantially alter the generated output while keeping the adversarial input visually close to the original, and we assess whether existing defenses mitigate the attack.
- Comprehensive $\epsilon$ Analysis: We provide qualitative and quantitative analysis showing that perturbations as small as $\epsilon = 8/255$ can produce highly dissimilar outputs while maintaining visual plausibility (PSNR > 30 dB).
- Cross-Architecture Evaluation: We test VAE-based encoders (InstructPix2Pix, InstructCLIP-Pix2Pix) and CLIP-based encoders (Kandinsky 2.2), revealing that both are vulnerable to query-free attacks.
- Defense Evaluation: We demonstrate that current defense mechanisms, including training on CLIP-filtered high-quality data and replacing encoders with adversarially robust versions (RobustCLIP), fail to provide robustness, leaving an open problem for future research.
- Attack Methodology: We employ Auto-PGD (APGD), an adaptive first-order attack that eliminates manual hyperparameter tuning while achieving 6% better effectiveness than vanilla PGD at $\epsilon = 16/255$.
| Model | Architecture | Encoder Type | Paper |
|---|---|---|---|
| InstructPix2Pix | Stable Diffusion | VAE | Brooks et al., 2023 |
| InstructCLIP-Pix2Pix | Stable Diffusion + LoRA | VAE (CLIP-filtered data) | Chen et al., 2025 |
| Kandinsky 2.2 | Latent Diffusion | CLIP ViT-L/14 | Razzhigaev et al., 2023 |
| Defense Strategy | Description | Effectiveness | Reference |
|---|---|---|---|
| CLIP-Filtered Training Data | Train on contrastively curated dataset (InstructCLIP-Pix2Pix) | ❌ No improvement under adversarial noise | Chen et al., 2025 |
| RobustCLIP Encoder | Replace Kandinsky's CLIP encoder with adversarially fine-tuned RobustCLIP | ❌ No difference observed | Schlarmann et al., 2024 |
Key Finding: Our experiments show that neither improved training data quality nor robust encoders provide effective defense against query-free attacks, highlighting an important open problem for future research.
Given an image encoder $E(\cdot)$ and a clean input image $\mathbf{x}$, the attack searches for a perturbation $\boldsymbol{\delta}$ within an $\ell_\infty$ budget $\epsilon$ that maximizes the dissimilarity between the clean and perturbed latent representations:

$$\max_{\|\boldsymbol{\delta}\|_\infty \leq \epsilon} \; \mathcal{D}\big(E(\mathbf{x}),\, E(\mathbf{x} + \boldsymbol{\delta})\big)$$

where $\mathcal{D}$ is one of two dissimilarity measures:

- Euclidean distance: $\mathcal{D}(\mathbf{z}_1, \mathbf{z}_2) = \|\mathbf{z}_1 - \mathbf{z}_2\|_2$ (effective for VAE encoders)
- Cosine similarity: $\mathcal{D}(\mathbf{z}_1, \mathbf{z}_2) = 1 - \frac{\mathbf{z}_1 \cdot \mathbf{z}_2}{\|\mathbf{z}_1\| \|\mathbf{z}_2\|}$ (effective for CLIP encoders)

The attack iteratively updates the perturbation using projected sign-gradient steps:

$$\boldsymbol{\delta}^{(t+1)} = \Pi_{\|\boldsymbol{\delta}\|_\infty \leq \epsilon}\!\left(\boldsymbol{\delta}^{(t)} + \alpha^{(t)} \,\operatorname{sign}\!\left(\nabla_{\boldsymbol{\delta}}\, \mathcal{D}\big(E(\mathbf{x}),\, E(\mathbf{x} + \boldsymbol{\delta}^{(t)})\big)\right)\right)$$

where $\Pi$ denotes projection onto the $\ell_\infty$ ball of radius $\epsilon$ and $\alpha^{(t)}$ is the step size, which APGD adapts automatically during the attack.
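Both objectives amount to a few lines of PyTorch. The sketch below is purely illustrative; the function names are hypothetical and may differ from the identifiers used in src/adversarial_i2i:

```python
import torch
import torch.nn.functional as F

def euclidean_distance(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    # L2 distance between flattened latents (the "Distance" objective, used for VAE encoders)
    return (z1 - z2).flatten(start_dim=1).norm(p=2, dim=1)

def cosine_distance(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    # 1 - cosine similarity (the "Similarity" objective, used for CLIP embeddings)
    return 1.0 - F.cosine_similarity(z1.flatten(start_dim=1), z2.flatten(start_dim=1), dim=1)
```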
Key Advantages over PGD:
- Automatic step-size adaptation
- Momentum accumulation for stability
- Checkpoint rollback on overshooting
- 6% relative improvement in attack effectiveness at $\epsilon = 16/255$
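For intuition, a bare-bones $\ell_\infty$ sign-gradient attack on an encoder can be sketched as follows. This is a simplified PGD-style loop under assumed interfaces (the encoder callable and loss_fn are placeholders), not the APGD implementation in attacks/apgd.py, which additionally provides the step-size schedule, momentum, and rollback listed above:

```python
import torch

def linf_encoder_attack(encoder, x, eps=16 / 255, alpha=2 / 255, steps=100,
                        loss_fn=lambda z1, z2: (z1 - z2).flatten(1).norm(dim=1)):
    """Push the latent of x + delta away from the clean latent under ||delta||_inf <= eps.
    Simplified sketch: APGD adds automatic step-size adaptation, momentum, and rollback."""
    with torch.no_grad():
        z_clean = encoder(x)                           # clean latent, treated as a constant
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(z_clean, encoder((x + delta).clamp(0, 1))).sum()
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()               # ascend the dissimilarity
            delta.clamp_(-eps, eps)                    # project onto the l_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)   # keep the perturbed image in [0, 1]
    return (x + delta).detach().clamp(0, 1)
```

Depending on the encoder type, a loss such as the euclidean_distance or cosine_distance functions sketched earlier can be passed as loss_fn.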
We use uv for fast, reproducible dependency management.
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository
git clone https://github.com/MarioRicoIbanez/AdversarialML-I2I.git
cd AdversarialML-I2I
# Create virtual environment and install dependencies
uv sync
# Activate the environment
source .venv/bin/activate # Linux/Mac
# OR
.venv\Scripts\activate # Windows

Manual Installation (pip):

pip install -e .

Run a quick sanity check on all supported models:
CUDA_VISIBLE_DEVICES=0 uv run python test_all_models.py

This will:
- Load each model (InstructPix2Pix, InstructCLIP, Kandinsky)
- Run a 3-step APGD attack with $\epsilon = 16/255$
- Generate adversarial outputs
- Report success/failure for each model
Generate side-by-side comparisons across multiple perturbation budgets:
CUDA_VISIBLE_DEVICES=0 uv run python visual_experiment.py

Output Structure:
experiments_output/visual_test/
├── pix2pix_Distance_eps8_sample0.png # Grid: Original | Adversarial | Generated
├── pix2pix_Distance_eps16_sample0.png
├── pix2pix_Distance_eps32_sample0.png
├── pix2pix_Distance_eps64_sample0.png
├── kandinsky_Similarity_eps8_sample0.png
├── ...
├── individual/ # Individual images per attack
│ ├── pix2pix_Distance_eps16_sample0/
│ │ ├── 1_original.png
│ │ ├── 2_adversarial.png
│ │ └── 3_generated.png
└── summary.txt # Quantitative results (CLIP similarity, L2 distance)
Configuration:
- Models: pix2pix, pix2pix-lora, kandinsky
- Loss Functions: Distance (Euclidean), Similarity (Cosine)
- $\epsilon$ Values: [8, 16, 32, 64] / 255
- Attack Parameters: 10 iterations, $\alpha = 0.1$
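For reference, a sweep over these settings could look like the sketch below, reusing the load_model / apgd_attack interface shown in the Usage section; this is not necessarily how visual_experiment.py is organized, and the sample image path is a placeholder:

```python
from itertools import product

from PIL import Image

from src.adversarial_i2i.attacks import apgd_attack
from src.adversarial_i2i.models import load_model

MODELS = ["pix2pix", "pix2pix-lora", "kandinsky"]
LOSSES = ["Distance", "Similarity"]   # Euclidean vs. cosine objective
EPSILONS = [8, 16, 32, 64]            # pixel budgets, in units of 1/255

pil_image = Image.open("sample.png").convert("RGB")   # placeholder input image

for model_name, loss_type, eps in product(MODELS, LOSSES, EPSILONS):
    model = load_model(model_name)
    image_tensor = model.preprocess(pil_image)
    adversarial = apgd_attack(
        encoder=model,
        image=image_tensor,
        batch_size=1,
        pixel_change=eps,   # epsilon = eps / 255
        epochs=10,          # 10 attack iterations
        alpha=0.1,          # initial step size
        loss_type=loss_type,
    )
    # ...save comparison grids / generate outputs as in the Usage section
```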
| $\epsilon$ | PSNR (dB) | CLIP Sim (Orig→Gen) | CLIP Sim (Prompt→Gen) | Attack Visibility |
|---|---|---|---|---|
| 1/255 | 50.8 | 0.883 ± 0.101 | 0.248 ± 0.036 | Imperceptible |
| 8/255 | 34.0 | 0.812 ± 0.113 | 0.256 ± 0.034 | Effective threshold |
| 16/255 | 27.9 | 0.744 ± 0.119 | 0.263 ± 0.030 | Slight artifacts |
| 32/255 | 22.0 | 0.677 ± 0.116 | 0.266 ± 0.028 | Visible distortion |
| 64/255 | 16.8 | 0.631 ± 0.109 | 0.267 ± 0.027 | Severe corruption |
Insight: $\epsilon = 8/255$ is the effective threshold: the perturbation remains visually imperceptible (PSNR of 34 dB), yet the generated output already diverges markedly from the original (CLIP similarity drops from 0.883 to 0.812), while prompt alignment is preserved.
| Defense Strategy | Model | CLIP Sim (Orig→Gen) @ $\epsilon = 16/255$ | Effective? |
|---|---|---|---|
| Baseline | InstructPix2Pix | 0.744 ± 0.119 | — |
| High-Quality Data | InstructCLIP-Pix2Pix | 0.743 ± 0.110 | ❌ No improvement |
| Robust Encoder | Kandinsky + RobustCLIP | 0.708 ± 0.105 | ❌ No improvement |
Conclusion: Current defense mechanisms provide no measurable robustness against query-free encoder attacks, highlighting an important open research problem.
AdversarialML-I2I/
├── src/adversarial_i2i/ # Core attack library
│ ├── attacks/
│ │ ├── apgd.py # Auto-PGD implementation
│ │ └── pgd.py # Vanilla PGD baseline
│ ├── models/
│ │ └── wrappers.py # Model encoder wrappers (VAE, CLIP, etc.)
│ ├── evaluation/
│ │ └── metrics.py # CLIP similarity, PSNR, etc.
│ └── utils/
│ ├── data.py # Dataset loading utilities
│ └── image.py # Image preprocessing/postprocessing
├── test_all_models.py # Sanity check script
├── visual_experiment.py # Multi-epsilon visual analysis
├── assets/
│ └── 2025_Rico_AttacksI2I.pdf # Full paper
├── pyproject.toml # Project metadata + dependencies
├── uv.lock # Dependency lock file
└── README.md # This file
from src.adversarial_i2i.models import load_model
from src.adversarial_i2i.attacks import apgd_attack
from torchvision.transforms.functional import to_pil_image
# Load model
model = load_model("pix2pix")
# Preprocess image
image_tensor = model.preprocess(pil_image) # Shape: [1, 3, H, W]
# Run APGD attack
adversarial = apgd_attack(
encoder=model,
image=image_tensor,
batch_size=1,
pixel_change=16, # epsilon = 16/255
epochs=100, # Attack iterations
alpha=0.1, # Initial step size (auto-adapted)
loss_type="Distance", # "Distance" (L2) or "Similarity" (cosine)
verbose=True
)
# Generate with adversarial input
adversarial_pil = to_pil_image(adversarial[0])
output = model.pipe(
prompt=["Turn it into a photo"],
image=[adversarial_pil],
num_inference_steps=50,
image_guidance_scale=1.5,
guidance_scale=7.5
).images[0]

We evaluate attacks using multiple complementary metrics:
| Metric | Description | Interpretation |
|---|---|---|
| CLIP Similarity (Orig→Gen) | Cosine similarity between original and generated image embeddings | Lower = stronger attack |
| CLIP Similarity (Prompt→Gen) | Alignment between text prompt and generated output | Should remain high (instruction following) |
| CLIP Similarity (Orig→Adv) | Perceptual similarity of adversarial perturbation | Higher = stealthier attack |
| PSNR (dB) | Peak Signal-to-Noise Ratio between original and adversarial image | Higher = less visible distortion |
| L2 Distance | Euclidean distance in latent space | Measures encoder displacement |
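For reference, the CLIP-similarity and PSNR metrics can be computed as sketched below, assuming the openai/clip-vit-large-patch14 checkpoint via Hugging Face transformers; evaluation/metrics.py may implement them differently:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def clip_image_similarity(img_a: Image.Image, img_b: Image.Image) -> float:
    """Cosine similarity between CLIP image embeddings (e.g., Orig->Gen, Orig->Adv)."""
    inputs = processor(images=[img_a, img_b], return_tensors="pt")
    emb = clip.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return float(emb[0] @ emb[1])

@torch.no_grad()
def clip_text_similarity(prompt: str, img: Image.Image) -> float:
    """Cosine similarity between a text prompt and an image (Prompt->Gen)."""
    text_in = processor(text=[prompt], return_tensors="pt", padding=True)
    image_in = processor(images=[img], return_tensors="pt")
    t = clip.get_text_features(**text_in)
    i = clip.get_image_features(**image_in)
    t = t / t.norm(dim=-1, keepdim=True)
    i = i / i.norm(dim=-1, keepdim=True)
    return float(t[0] @ i[0])

def psnr(x: torch.Tensor, y: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio (dB) between images with values in [0, max_val]."""
    mse = torch.mean((x - y) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```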
This work was conducted at the Laboratory for Information and Inference Systems (LIONS) at EPFL, Switzerland.
Authors:
- Mario Rico Ibáñez – Master's student in Computer Science at EPFL (mario.ricoibanez@epfl.ch)
- Elias Abad Rocamora – PhD student at LIONS, EPFL
- Prof. Volkan Cevher – Director of LIONS Lab, EPFL
Laboratory: LIONS – Laboratory for Information and Inference Systems
This project is licensed under the MIT License.
For academic use only. Commercial applications require explicit permission.
For questions, issues, or collaboration inquiries:
- Open an issue: GitHub Issues
- Email: mario.ricoibanez@epfl.ch
- Lab Website: LIONS @ EPFL