
Adversarial Attacks on CIFAR-10

This project implements a framework for generating and evaluating transferable adversarial attacks on the CIFAR-10 dataset. It focuses on training a U-Net based generator to produce perturbations that can fool a "black-box" victim model by leveraging distillation and advanced transferability techniques.

Methodology

The pipeline consists of three main stages:

1. Distillation

Knowledge is extracted from a pre-trained Victim Model (ResNet-50) and distilled into two Surrogate Models (ResNet-18 and VGG-16). This is achieved using soft distillation (KL Divergence loss) on a subset of the CIFAR-10 training data. These surrogates act as white-box proxies for the generator to attack.
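The distillation step can be sketched as a temperature-scaled KL divergence between the teacher's and student's softened output distributions. The function name and temperature below are illustrative, not taken from distill.py:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft distillation: KL divergence between temperature-softened
    teacher and student distributions (hypothetical helper)."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```

Minimizing this loss on CIFAR-10 inputs pulls each surrogate's output distribution toward the victim's, which is what makes the surrogates useful white-box proxies.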

2. Generator Training

A U-Net Generator is trained to produce adversarial perturbations. To ensure these perturbations generalize to the unseen victim model, the training incorporates:

  • Input Diversity: Randomly resizing and padding inputs (DIM) to prevent overfitting to specific pixel geometries.
  • Ghost Networks: Maintaining dropout layers in "train" mode during the backward pass to simulate an ensemble of network architectures.
  • C&W Margin Loss: Optimizing the margin between the correct class and the most likely wrong class.
  • Norm Regulation: An MSE loss that keeps the perturbation magnitude small, so the adversarial images remain visually close to the originals.
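Two of these components can be sketched in isolation. Assuming 32x32 CIFAR-10 inputs; the function names, resize range, and probability below are illustrative, not the exact values in train_generator.py:

```python
import random
import torch
import torch.nn.functional as F

def input_diversity(x, low=28, high=32, prob=0.7):
    """DIM: with some probability, randomly resize the batch and pad it
    back to the original spatial size (sketch)."""
    if random.random() > prob:
        return x
    size = random.randint(low, high - 1)
    resized = F.interpolate(x, size=(size, size), mode="nearest")
    pad = high - size
    left = random.randint(0, pad)
    top = random.randint(0, pad)
    # F.pad order for 4D tensors: (left, right, top, bottom)
    return F.pad(resized, (left, pad - left, top, pad - top))

def cw_margin_loss(logits, labels, kappa=0.0):
    """C&W margin: push the true-class logit below the best wrong class."""
    true_logit = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))
    best_wrong = masked.max(dim=1).values
    # Zero once the true class trails the best wrong class by margin kappa
    return torch.clamp(true_logit - best_wrong + kappa, min=0).mean()
```

During training, the generator's perturbed images would pass through input_diversity before hitting the surrogates, and cw_margin_loss (minimized) plus the MSE norm term would form the objective.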

3. Evaluation

The generator's performance is compared against a PGD (Projected Gradient Descent) baseline. The evaluation measures:

  • Attack Success Rate (ASR): The percentage of images misclassified by the victim after perturbation.
  • Inference Speed: The time required to generate an adversarial batch (a single forward pass for the Generator vs. iterative optimization for PGD).
  • Visual Fidelity: Comparing clean, Generator-produced, and PGD-produced images.
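A minimal evaluation loop covering the first two metrics might look like this. The callable names and signature are illustrative, not the API of attack_eval.py:

```python
import time
import torch

def evaluate_attack(victim, make_adv, loader, device="cpu"):
    """Report attack success rate (ASR) and average wall-clock time per
    adversarial batch. `make_adv(x, y)` is any callable returning a
    perturbed batch (generator forward pass or a PGD loop)."""
    victim.eval()
    fooled = total = batches = 0
    elapsed = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        start = time.perf_counter()
        x_adv = make_adv(x, y)
        elapsed += time.perf_counter() - start
        batches += 1
        with torch.no_grad():
            preds = victim(x_adv).argmax(dim=1)
        fooled += (preds != y).sum().item()  # misclassified = attack success
        total += y.numel()
    return fooled / total, elapsed / batches
```

Note that this counts every post-perturbation misclassification as a success, matching the ASR definition above, including images the victim already got wrong.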

Project Structure

  • models.py: Definitions for the Victim, Surrogates, and U-Net Generator.
  • distill.py: Script for training the victim and distilling knowledge into surrogates.
  • train_generator.py: Implementation of the generator training loop with transferability enhancements.
  • attack_eval.py: Evaluation script providing metrics and visual comparisons.
  • victim.pth, generator_best.pth: Model weights (not included in repository).

Results

The Generator provides a significant speedup over iterative attacks like PGD while maintaining a high success rate against the black-box victim.

Performance Comparison (Epsilon = 20/255)

Attack Method    Success Rate    Avg. Time per Batch
PGD (10 Step)    100%            0.45s
Generator        68%             0.0014s
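For reference, the 10-step PGD baseline in the table corresponds to the standard L-infinity projected gradient ascent loop. This is a generic sketch with assumed step size, not the exact code in attack_eval.py:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=20 / 255, alpha=2 / 255, steps=10):
    """Minimal L-inf PGD baseline: ascend the loss, project to the
    eps-ball around x, clamp to valid pixel range (sketch)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1).detach()
    return x_adv
```

The speed gap in the table follows directly from the structure: PGD pays `steps` forward-backward passes per batch, while the trained Generator pays one forward pass.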

Visual Comparison

Representative adversarial examples generated by this framework:

[Figure: Attack Comparison]

Usage

  1. Train/Distill Models:
    python distill.py
  2. Train Generator:
    python train_generator.py
  3. Evaluate:
    python attack_eval.py