
Adversarial Attacks on CIFAR-10

This project implements a framework for generating and evaluating transferable adversarial attacks on the CIFAR-10 dataset. It focuses on training a U-Net based generator to produce perturbations that can fool a "black-box" victim model by leveraging distillation and advanced transferability techniques.

Methodology

The pipeline consists of three main stages:

1. Distillation

Knowledge is extracted from a pre-trained Victim Model (ResNet-50) and distilled into two Surrogate Models (ResNet-18 and VGG-16). This is achieved using soft distillation (KL Divergence loss) on a subset of the CIFAR-10 training data. These surrogates act as white-box proxies for the generator to attack.
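The distillation step can be sketched as a temperature-scaled KL divergence between the teacher's and student's softened output distributions. The function name and temperature below are illustrative, not taken from distill.py:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft distillation: KL divergence between temperature-softened
    teacher and student distributions (hypothetical helper)."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```

Minimizing this loss on CIFAR-10 inputs pulls each surrogate's output distribution toward the victim's, which is what makes the surrogates useful white-box proxies.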

2. Generator Training

A U-Net Generator is trained to produce adversarial perturbations. To ensure these perturbations generalize to the unseen victim model, the training incorporates:

  • Input Diversity: Randomly resizing and padding inputs (DIM) to prevent overfitting to specific pixel geometries.
  • Ghost Networks: Maintaining dropout layers in "train" mode during the backward pass to simulate an ensemble of network architectures.
  • C&W Margin Loss: Optimizing the margin between the correct class and the most likely wrong class.
  • Norm Regulation: An MSE loss that keeps the perturbation magnitude small, so the adversarial images remain visually close to the originals.
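Two of these components can be sketched in isolation. Assuming 32x32 CIFAR-10 inputs; the function names, resize range, and probability below are illustrative, not the exact values in train_generator.py:

```python
import random
import torch
import torch.nn.functional as F

def input_diversity(x, low=28, high=32, prob=0.7):
    """DIM: with some probability, randomly resize the batch and pad it
    back to the original spatial size (sketch)."""
    if random.random() > prob:
        return x
    size = random.randint(low, high - 1)
    resized = F.interpolate(x, size=(size, size), mode="nearest")
    pad = high - size
    left = random.randint(0, pad)
    top = random.randint(0, pad)
    # F.pad order for 4D tensors: (left, right, top, bottom)
    return F.pad(resized, (left, pad - left, top, pad - top))

def cw_margin_loss(logits, labels, kappa=0.0):
    """C&W margin: push the true-class logit below the best wrong class."""
    true_logit = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))
    best_wrong = masked.max(dim=1).values
    # Zero once the true class trails the best wrong class by margin kappa
    return torch.clamp(true_logit - best_wrong + kappa, min=0).mean()
```

During training, the generator's perturbed images would pass through input_diversity before hitting the surrogates, and cw_margin_loss (minimized) plus the MSE norm term would form the objective.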

3. Evaluation

The generator's performance is compared against a PGD (Projected Gradient Descent) baseline. The evaluation measures:

  • Attack Success Rate (ASR): The percentage of images misclassified by the victim after perturbation.
  • Inference Speed: The time required to generate an adversarial batch (a single forward pass for the Generator vs. iterative optimization for PGD).
  • Visual Fidelity: Comparing clean, Generator-produced, and PGD-produced images.
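A minimal evaluation loop covering the first two metrics might look like this. The callable names and signature are illustrative, not the API of attack_eval.py:

```python
import time
import torch

def evaluate_attack(victim, make_adv, loader, device="cpu"):
    """Report attack success rate (ASR) and average wall-clock time per
    adversarial batch. `make_adv(x, y)` is any callable returning a
    perturbed batch (generator forward pass or a PGD loop)."""
    victim.eval()
    fooled = total = batches = 0
    elapsed = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        start = time.perf_counter()
        x_adv = make_adv(x, y)
        elapsed += time.perf_counter() - start
        batches += 1
        with torch.no_grad():
            preds = victim(x_adv).argmax(dim=1)
        fooled += (preds != y).sum().item()  # misclassified = attack success
        total += y.numel()
    return fooled / total, elapsed / batches
```

Note that this counts every post-perturbation misclassification as a success, matching the ASR definition above, including images the victim already got wrong.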

Project Structure

  • models.py: Definitions for the Victim, Surrogates, and U-Net Generator.
  • distill.py: Script for training the victim and distilling knowledge into surrogates.
  • train_generator.py: Implementation of the generator training loop with transferability enhancements.
  • attack_eval.py: Evaluation script providing metrics and visual comparisons.
  • victim.pth, generator_best.pth: Model weights (not included in repository).

Results

The Generator provides a significant speedup over iterative attacks like PGD while maintaining a high success rate against the black-box victim.

Performance Comparison (Epsilon = 20/255)

Attack Method    Success Rate    Avg. Time per Batch
PGD (10 Step)    100%            0.45s
Generator        68%             0.0014s
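For reference, the 10-step PGD baseline in the table corresponds to the standard L-infinity projected gradient ascent loop. This is a generic sketch with assumed step size, not the exact code in attack_eval.py:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=20 / 255, alpha=2 / 255, steps=10):
    """Minimal L-inf PGD baseline: ascend the loss, project to the
    eps-ball around x, clamp to valid pixel range (sketch)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1).detach()
    return x_adv
```

The speed gap in the table follows directly from the structure: PGD pays `steps` forward-backward passes per batch, while the trained Generator pays one forward pass.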

Visual Comparison

Representative adversarial examples generated by this framework:

[Figure: Attack Comparison]

Usage

  1. Train/Distill Models:
    python distill.py
  2. Train Generator:
    python train_generator.py
  3. Evaluate:
    python attack_eval.py