A comprehensive analysis of CNN vulnerability to adversarial attacks and implementation of defensive countermeasures using PyTorch on the Caltech 101 dataset.
Purpose: Establish baseline CNN models through transfer learning for subsequent adversarial analysis.
Implementation: Fine-tunes pre-trained ResNet-34 and MobileNetV2 architectures on Caltech 101. Employs layer freezing - ResNet-34 trains only layer4 and the classifier, while MobileNetV2 trains the final 2 feature blocks and the classifier. Includes data augmentation, early stopping, and cosine annealing scheduling.
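As an illustration, the layer-freezing setup for ResNet-34 might look like the following (a minimal sketch assuming torchvision pretrained weights and a 101-class head; the notebook's exact code may differ):

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-34 and freeze all parameters by default.
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only layer4; the new classifier head below is trainable by default.
for param in model.layer4.parameters():
    param.requires_grad = True
model.fc = nn.Linear(model.fc.in_features, 101)  # Caltech 101 classes (adjust if a background class is included)
```

The MobileNetV2 variant follows the same pattern, unfreezing the final two feature blocks and replacing its classifier.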
Outputs:
- Trained model weights (`models/ResNet34_best.pth`, `models/MobileNetV2_best.pth`)
- Training curves and performance metrics visualization
- Model comparison table with parameter counts and accuracies
- Validation indices file (`validation_indices.pkl`) for reproducible splits
- TensorBoard logs for training monitoring
Purpose: Systematically evaluate model robustness against Fast Gradient Sign Method (FGSM) attacks using the torchattacks library.
Implementation: Tests 17 epsilon values (0.001-0.2) to identify minimal perturbation thresholds. Targets an 80% error rate to simulate realistic attack scenarios. Generates adversarial examples and analyzes attack effectiveness across perturbation strengths.
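A sketch of how a single FGSM evaluation at one epsilon could look with torchattacks (`model` and `val_loader` stand in for a trained baseline and its validation loader; the exact epsilon spacing and bookkeeping follow the notebook):

```python
import torch
import torchattacks

def fgsm_accuracy(model, loader, eps, device="cuda"):
    """Accuracy on FGSM-perturbed inputs at a given epsilon (hypothetical helper)."""
    attack = torchattacks.FGSM(model, eps=eps)
    model.eval()
    correct = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        adv_images = attack(images, labels)   # generate adversarial examples
        preds = model(adv_images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

# One possible sweep over 17 epsilon values between 0.001 and 0.2.
epsilons = torch.linspace(0.001, 0.2, steps=17).tolist()
results = {eps: fgsm_accuracy(model, val_loader, eps) for eps in epsilons}
```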
Outputs:
- Attack effectiveness curves showing accuracy vs epsilon
- Adversarial example visualizations comparing clean vs perturbed images
- Attack results summary with error rates and robustness metrics
- Generated adversarial datasets (`logs/step_2/adversarial_datasets.pkl`)
- Comprehensive attack results log (`logs/step_2/step_2_adversarial_attacks.json`)
Purpose: Analyze adversarial attack mechanisms through explainable AI techniques.
Implementation: Applies Grad-CAM and vanilla gradient saliency mapping to clean and adversarial examples. For saliency, XAITK turned out to be cumbersome to use, so vanilla gradient saliency mapping was implemented manually instead. Both methods visualize "attention" patterns - strictly speaking not attention in the technical sense, but rather which convolutional features activate most strongly for a classification decision and how that changes between clean and adversarial inputs.
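The manually implemented vanilla gradient saliency map boils down to the absolute gradient of the predicted class score with respect to the input pixels. A minimal sketch (function and variable names are illustrative, not taken from the notebook):

```python
import torch

def vanilla_saliency(model, image, target_class=None):
    """Absolute input-gradient saliency for a single image tensor of shape (C, H, W)."""
    model.eval()
    inp = image.detach().unsqueeze(0).requires_grad_(True)
    scores = model(inp)
    if target_class is None:
        target_class = scores.argmax(dim=1).item()
    # Backpropagate the target class score down to the input pixels.
    scores[0, target_class].backward()
    # Channel-wise maximum of the absolute gradient gives an (H, W) saliency map.
    return inp.grad.detach().abs().max(dim=1).values.squeeze(0)
```

Comparing these maps for a clean image and its adversarial counterpart is what reveals how the perturbation redistributes the features the model relies on.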
Outputs:
- Grad-CAM heatmap visualizations for clean and adversarial examples
- Saliency map comparisons showing attention redistribution
- Forensic case studies with detailed attack mechanism analysis
- XAI visualization plots saved to `logs/forensic_analysis/`
- Attack mechanism documentation and attention pattern analysis
Purpose: Implement and evaluate defensive strategies against adversarial attacks.
Implementation: Deploys three defense approaches:
- Adversarial training with curriculum learning (ε: 0.05→0.2)
- Input transformation defense (resize, JPEG compression, Gaussian noise)
- Combined defense integration. Note: in the current notebook state this defense is not actually run, although the code for it is present. The cause is a gradient computation failure in FGSM when the input image transformations are applied, since those transformations break the gradient chain. Off the top of my head, a likely fix is to apply the transformations directly as torch tensor operations so that gradients can still be computed.
Includes TensorBoard logging and a comparative evaluation framework; rough sketches of the adversarial training curriculum and the input transformations follow below.
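A rough sketch of the curriculum-style adversarial training loop (assumptions: FGSM via torchattacks, plain cross-entropy on the adversarial batch, and a linear epsilon ramp; the notebook may mix clean and adversarial samples differently):

```python
import torch
import torch.nn.functional as F
import torchattacks

def adversarial_train(model, loader, optimizer, epochs, eps_schedule, device="cuda"):
    """Adversarial training where the FGSM epsilon grows with the epoch (curriculum)."""
    model.to(device).train()
    for epoch in range(epochs):
        eps = eps_schedule[min(epoch, len(eps_schedule) - 1)]
        attack = torchattacks.FGSM(model, eps=eps)
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            adv_images = attack(images, labels)   # on-the-fly adversarial batch
            optimizer.zero_grad()                 # clear grads left over from the attack
            loss = F.cross_entropy(model(adv_images), labels)
            loss.backward()
            optimizer.step()

# Curriculum ramping epsilon from 0.05 up to 0.2, as described above.
eps_schedule = torch.linspace(0.05, 0.2, steps=10).tolist()
```

For the input transformation defense, a sketch of the three transformations (assuming images already in [0, 1]; the JPEG step goes through PIL and is exactly the kind of non-differentiable operation that breaks FGSM's gradient computation, while the resize and noise steps are plain tensor ops):

```python
import io
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms import functional as TF

def transform_defense(images, resize_to=200, jpeg_quality=75, noise_std=0.01):
    """Resize + JPEG compression + Gaussian noise on a batch of (N, C, H, W) images in [0, 1]."""
    _, _, h, w = images.shape
    # Down- and up-resize as tensor ops (gradient-friendly).
    out = F.interpolate(images, size=resize_to, mode="bilinear", align_corners=False)
    out = F.interpolate(out, size=(h, w), mode="bilinear", align_corners=False)
    # JPEG compression via PIL: this breaks the autograd chain (the issue noted above).
    compressed = []
    for img in out:
        buf = io.BytesIO()
        TF.to_pil_image(img.cpu().clamp(0, 1)).save(buf, format="JPEG", quality=jpeg_quality)
        buf.seek(0)
        compressed.append(TF.to_tensor(Image.open(buf)))
    out = torch.stack(compressed).to(images.device)
    # Additive Gaussian noise (tensor op).
    return (out + noise_std * torch.randn_like(out)).clamp(0, 1)
```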
Outputs:
- Defended model weights with curriculum learning training
- Defense effectiveness comparison plots and metrics
- TensorBoard training logs
For a quick environment setup, run:

```bash
conda env create -f environment.yml
```

To launch TensorBoard:

```bash
tensorboard --logdir .
```