This project analyzes how likely a sample is to serve as an attack vector for membership inference attacks, without requiring a full training process. The core research questions are:
- Are outliers used in the model's training process?
- Can we determine whether outliers are included in the training dataset?
- Can we identify whether a given sample is an outlier?
- Can side-channel data provide useful signals about training dynamics?
Outliers in training can have both positive and negative effects:
- ✅ Benefits: Reduced overfitting, better generalization, and often highly descriptive samples.
- ⚠️ Risks: Higher computational costs and increased vulnerability to membership inference attacks due to distinguishable processing patterns.

🧪 One hypothesis is that outliers induce larger gradient updates ("delta gradients"), which in turn increase energy consumption. By analyzing side-channel data, we may therefore be able to determine whether a model is being trained with outliers.
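To make the "delta gradient" signal concrete, here is a minimal PyTorch sketch of the quantity the hypothesis points at: the global gradient norm produced by a single batch. The function is an illustrative assumption, not part of the project's code; `model` and `criterion` are whatever the training routine already uses.

```python
import torch

def batch_grad_norm(model, criterion, inputs, targets):
    """Global L2 norm of the gradients produced by one batch.

    If the hypothesis holds, batches containing outliers should yield
    noticeably larger norms, and hence more work per update.
    """
    model.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    per_param = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    return torch.stack(per_param).norm().item()
```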
I'm assuming that the src directory contains a run.lsf file which looks something like this:

```bash
jbsub -q x86_1h -cores 1+1 -mem 20g python $@
```

Sequence-to-sequence fine-tuning of the ALBERT model can then be launched like this:

```bash
./run.lsf IBM/lightning_routine.py IBM/c_albert.yaml
```
The table below shows the major options for the YAML file:

| Name | Options |
|---|---|
| run -> optimizer | "adam" |
| run -> criterion | "CrossEntropyLoss", "MSELoss" |
| model -> type | "albert", "bert", "granite", "swin", ... |
| dataset -> type | "med", "imdb", "mnist", "cifar100", ... |
| dataset -> samples_per_class | usually 2000 or 7000 |
| dataset -> classes | usually 3 or 10 |
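For illustration, a config such as `IBM/c_albert.yaml` might look like the following; only the `->` paths and values from the table are grounded, and any additional structure is an assumption:

```yaml
run:
  optimizer: "adam"
  criterion: "CrossEntropyLoss"
model:
  type: "albert"
dataset:
  type: "imdb"
  samples_per_class: 2000
  classes: 3
```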
The table below shows the resource configurations used for the runs:

| Model | Dataset | GPUs | Mem (GB) | Hours |
|---|---|---|---|---|
| SwinT-V2 | CIFAR | 2 | 50 | 6 |
| All other vision | All | 1 | 50 | 1 |
| Smaller than BERT | IMDb | 1 | 90 | 1 |
| BERT and larger | Med | 2-4 | 100+ | 6-12 |
| Granite and larger | FineWeb | ? | ? | ? |
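As a hedged usage example, the SwinT-V2/CIFAR row could translate into a submission like this; the `x86_6h` queue name and the `c_swin.yaml` filename are assumptions, and `1+2` follows the cores+GPUs pattern of the `1+1` example above:

```bash
# 2 GPUs, 50 GB of memory, 6-hour queue, per the SwinT-V2 | CIFAR row.
jbsub -q x86_6h -cores 1+2 -mem 50g python IBM/lightning_routine.py IBM/c_swin.yaml
```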
The methodology involves training a model while tracking various metrics related to energy consumption and computational performance. These metrics are then correlated with the presence of outliers in training batches.
**Training Phase:**
- Train a model while tracking specific side-channel metrics.
- Introduce labeled outliers to observe their impact (a minimal injection sketch follows).
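One way to introduce labeled outliers, sketched here as an assumption rather than the project's actual method, is to flip the labels of a fixed, recorded set of indices so the ground-truth outlier positions are known for later cross-referencing:

```python
import random
from torch.utils.data import Dataset

class LabeledOutlierDataset(Dataset):
    """Wraps a dataset and flips the labels of a fixed, recorded index set."""

    def __init__(self, base, num_classes, outlier_fraction=0.01, seed=0):
        self.base = base
        self.num_classes = num_classes
        rng = random.Random(seed)
        k = int(len(base) * outlier_fraction)
        self.outlier_indices = set(rng.sample(range(len(base)), k))

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        if idx in self.outlier_indices:
            # Deterministic label flip turns the sample into a known outlier.
            y = (y + 1) % self.num_classes
        return x, y
```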
**📊 Metric Tracking** (see the logging sketch below):
- 📉 Training loss
- ⚡ GPU power usage during the training step
- 🔋 GPU energy consumption per training step
- ⏳ Training step duration
- 🔄 Backpropagation time
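A minimal sketch of how these five metrics could be logged with NVML through `pynvml` in a single-GPU PyTorch loop; the helper name is ours, and `nvmlDeviceGetTotalEnergyConsumption` is only available on Volta-or-newer GPUs:

```python
import time
import pynvml
import torch

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

def timed_step(model, criterion, optimizer, inputs, targets):
    """Run one training step and return the tracked side-channel metrics."""
    torch.cuda.synchronize()  # make timings meaningful despite async CUDA ops
    e0 = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)  # mJ since driver load
    t0 = time.perf_counter()

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    torch.cuda.synchronize()
    t_fwd = time.perf_counter()

    loss.backward()
    torch.cuda.synchronize()
    t_bwd = time.perf_counter()

    optimizer.step()
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    e1 = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

    return {
        "loss": loss.item(),
        "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,  # mW -> W
        "energy_j": (e1 - e0) / 1000.0,                              # mJ -> J
        "step_s": t1 - t0,
        "backprop_s": t_bwd - t_fwd,
    }
```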
**🕵️ Outlier Detection:**
- Apply multiple statistical and clustering methods to detect outliers in batches (detailed in the next section).
- Investigate the correlation between outlier presence and the tracked metrics.
**📈 Analysis & Evaluation** (a correlation sketch follows the list):
- Compare side-channel information with the presence of known outliers.
- Cross-reference indices across epochs (since batches are shuffled) to identify persistent outliers.
- Determine whether outliers impact specific computational aspects such as processing time and energy usage.
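For the comparison step, one simple option is a point-biserial correlation between a per-batch "contains a known outlier" flag and a tracked metric. The arrays below are placeholder values for illustration, not measured results:

```python
import numpy as np
from scipy.stats import pointbiserialr

# Placeholder per-batch logs: flag = 1 if the batch held a known outlier.
flags = np.array([0, 1, 0, 0, 1, 0, 1, 0])
energy_j = np.array([3.1, 4.0, 3.0, 3.2, 4.2, 3.1, 3.9, 3.0])  # energy per step

r, p = pointbiserialr(flags, energy_j)
print(f"outlier/energy correlation: r={r:.2f}, p={p:.3f}")
```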
To detect outliers at the batch level, we use multiple techniques (a combined sketch follows the list):
- 📏 Mahalanobis Distance: Measures how far a sample is from the mean, considering covariance.
- 🔎 Local Outlier Factor (LOF): Identifies density-based anomalies by comparing local densities.
- 🧩 DBSCAN (Density-Based Spatial Clustering): Detects clusters and identifies points in low-density regions as outliers.
- 📉 Weighted Least-Frequent (WLF) Detection: Identifies samples that appear infrequently across batches.
- 📊 Z-score: Standard statistical approach for measuring how many standard deviations a sample deviates from the mean.
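The sketch below applies four of these detectors to a single batch using scikit-learn and SciPy; the thresholds and hyperparameters are illustrative, the project-specific WLF method is omitted, and the Mahalanobis fit assumes more samples than features:

```python
import numpy as np
from scipy.stats import zscore
from sklearn.cluster import DBSCAN
from sklearn.covariance import EmpiricalCovariance
from sklearn.neighbors import LocalOutlierFactor

def flag_outliers(X, z_thresh=3.0, maha_quantile=0.99):
    """Per-sample outlier masks for a batch X of shape (n_samples, n_features)."""
    flags = {}

    # Z-score: any feature further than z_thresh std devs from the batch mean.
    flags["zscore"] = (np.abs(zscore(X, axis=0)) > z_thresh).any(axis=1)

    # Squared Mahalanobis distance to the batch mean; flag the top 1%.
    d2 = EmpiricalCovariance().fit(X).mahalanobis(X)
    flags["mahalanobis"] = d2 > np.quantile(d2, maha_quantile)

    # Local Outlier Factor: -1 marks density-based anomalies.
    flags["lof"] = LocalOutlierFactor(n_neighbors=20).fit_predict(X) == -1

    # DBSCAN: label -1 means the point sits in a low-density region.
    flags["dbscan"] = DBSCAN(eps=0.5, min_samples=5).fit_predict(X) == -1

    return flags
```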
Once a batch is identified as outlier-heavy, we analyze its individual samples (a cross-referencing sketch follows the list):
- 🔄 Cross-reference indices across multiple epochs (since batches are shuffled each epoch).
- 🔍 Identify recurring outlier samples that consistently appear in flagged batches.
- 📊 Examine the impact of these samples on training metrics.
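A small sketch of the cross-referencing step, assuming the training loop records which dataset indices each flagged batch contained (the function and argument names are ours):

```python
from collections import Counter

def persistent_outliers(flagged_batches_per_epoch, min_epochs=3):
    """Indices that recur in flagged batches across epochs.

    `flagged_batches_per_epoch` maps epoch -> list of flagged batches, each
    batch given as the list of dataset indices it contained (recorded
    because the loader reshuffles every epoch).
    """
    counts = Counter()
    for batches in flagged_batches_per_epoch.values():
        # Count each index at most once per epoch.
        counts.update({idx for batch in batches for idx in batch})
    return {idx for idx, n in counts.items() if n >= min_epochs}
```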