This framework provides a unified way to run and analyze federated learning experiments with different privacy mechanisms. It includes four main types of experiments:
- Local Baseline: Train separate models on each client's data without federation
- Federated Learning: Standard federated learning without privacy mechanisms
- Feature Suppression: Privacy through selective feature hiding in federated learning
- Differential Privacy: Privacy through noise addition in federated learning
## Requirements

- Python 3.8+
- PyTorch
- Opacus (for differential privacy)
- scikit-learn
- pandas
- numpy
- matplotlib
- seaborn
- pyyaml
- pyarrow

## Installation

```bash
pip install torch scikit-learn pandas numpy matplotlib seaborn pyyaml opacus pyarrow
```
## Project Structure

```
.
├── README.md                 # This file
├── run_experiment.py         # Main entry point for experiments
├── utils/                    # Utility modules
│   ├── config.py             # Configuration handling
│   ├── data.py               # Data preparation and loading
│   ├── models.py             # Neural network models
│   └── experiment.py         # Experiment runners
├── datasets/                 # Directory for datasets (created automatically)
├── results/                  # Experiment results (created automatically)
│   ├── iid/
│   │   ├── federated_experiments/    # Only intermediate federated results
│   │   ├── suppression_experiments/  # Only per-case suppression results
│   │   └── dp_experiments/           # Only per-case DP experiment results
│   └── non-iid/
│       ├── federated_experiments/    # Only intermediate federated results
│       ├── suppression_experiments/  # Only per-case suppression results
│       └── dp_experiments/           # Only per-case DP experiment results
└── logs/                     # Experiment logs (created automatically)
    ├── local_experiments/        # Logs for local experiments
    ├── federated_experiments/    # Logs for federated experiments
    ├── suppression_experiments/  # Logs for suppression experiments
    └── dp_experiments/           # Logs for differential privacy experiments
```
## Results Organization

- All final results, including local baseline results, are stored in the `results/<iid|non-iid>/` directory.
- Federated learning: final results in the main `results/<iid|non-iid>/` directory; intermediate checkpoints in `results/<iid|non-iid>/federated_experiments/`.
- Suppression experiments: final combined results in the main `results/<iid|non-iid>/` directory (e.g., `suppression_final_*.json`); per-case results in `results/<iid|non-iid>/suppression_experiments/`.
- Differential privacy: final combined results in the main `results/<iid|non-iid>/` directory (e.g., `dp_final_*.json`); per-case results in `results/<iid|non-iid>/dp_experiments/`.
## Running Experiments

You can run experiments in several ways:

```bash
# Run a local baseline experiment
python run_experiment.py --experiment-type local
python run_experiment.py --experiment-type local --seed 42

# Run a federated learning experiment
python run_experiment.py --experiment-type federated
python run_experiment.py --experiment-type federated --federated-rounds 10 --client-epochs 10

# Run a suppression experiment
python run_experiment.py --experiment-type suppression

# Run a suppression experiment with limited parallel processes (to control memory usage)
python run_experiment.py --experiment-type suppression --max-suppression-processes 10

# Run a differential privacy experiment, optionally with specific noise levels
python run_experiment.py --experiment-type differential_privacy
python run_experiment.py --experiment-type differential_privacy --noise-p1 0.5 --noise-p2 1.0

# Run DP experiments with limited parallel processes (to control memory usage)
python run_experiment.py --experiment-type differential_privacy --max-dp-processes 6
```
## Using a Configuration File

First, create an example configuration file:

```bash
python run_experiment.py --create-config
```

Then edit the generated file (`example_config.yaml`) to set your parameters, and run the experiment with:

```bash
python run_experiment.py --config example_config.yaml
```
## Analyzing Results

To create visualizations and analyze experiment results:

```bash
python run_experiment.py --analyze results
```

This creates heatmaps and other visualizations from the results available in the specified directory.
## Configuration Parameters

| Parameter | Description | Default |
|---|---|---|
| `experiment_name` | Name for the experiment | `"fl_experiment"` |
| `experiment_type` | Type of experiment: `"local"`, `"federated"`, `"suppression"`, `"differential_privacy"` | `"local"` |
| `data_path` | Path to the main dataset | `"datasets/dataset.parquet"` |
| `p1_path` | Path to Player 1's dataset (if pre-split) | `None` |
| `p2_path` | Path to Player 2's dataset (if pre-split) | `None` |
| `feature_columns` | List of feature column names | [multiple network features] |
| `target_column` | Name of the target column | `"application_name"` |
| `seed` | Random seed for reproducibility | `42` |
| `initial_split_ratio` | Ratio for splitting the data into P1/P2 parts | `0.5` |
| `test_split_ratio` | Ratio for splitting each part into train/test | `0.1667` |
| `batch_size` | Batch size for training | `256` |
| `learning_rate` | Learning rate | `0.001` |
| `local_epochs` | Number of epochs for local baseline training | `100` |
| `federated_rounds` | Number of federated learning rounds | `10` |
| `client_epochs` | Number of epochs per client per federated round | `10` |
| `p1_features` | Features to use for Player 1 (suppression) | `None` (all) |
| `p2_features` | Features to use for Player 2 (suppression) | `None` (all) |
| `max_parallel_suppression_processes` | Maximum number of parallel processes for suppression experiments | `2` |
| `noise_multiplier_p1` | Noise multiplier for Player 1 (DP) | `None` |
| `noise_multiplier_p2` | Noise multiplier for Player 2 (DP) | `None` |
| `max_grad_norm` | Maximum gradient norm for DP clipping | `1.0` |
| `max_parallel_dp_processes` | Maximum number of parallel processes for DP experiments | `2` |
| `results_dir` | Directory to store results | `"results"` |
| `logs_dir` | Directory to store logs | `"logs"` |
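For reference, a minimal `example_config.yaml` assembled from the parameters above might look like this (values mirror the defaults in the table; treat it as an illustrative sketch rather than the generated template):

```yaml
experiment_name: "fl_experiment"
experiment_type: "federated"   # "local", "federated", "suppression", or "differential_privacy"
data_path: "datasets/dataset.parquet"
target_column: "application_name"
seed: 42
initial_split_ratio: 0.5       # P1/P2 split
test_split_ratio: 0.1667       # train/test split within each player's data
batch_size: 256
learning_rate: 0.001
federated_rounds: 10
client_epochs: 10
results_dir: "results"
logs_dir: "logs"
```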
## Experiment Details

### Local Baseline

- Split data between two players (P1 and P2) if not pre-split
- Train model M1 on P1's training data
- Train model M2 on P2's training data
- Evaluate both models on both players' test data (see the sketch below)
- Save results, including training histories and evaluation metrics
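A minimal sketch of this flow, using hypothetical `train_model` and `evaluate` helpers (the actual runners live in `utils/experiment.py`):

```python
def run_local_baseline(p1_train, p1_test, p2_train, p2_test, train_model, evaluate):
    """Train one model per player, then cross-evaluate on both test sets."""
    m1 = train_model(p1_train)  # M1 sees only P1's training data
    m2 = train_model(p2_train)  # M2 sees only P2's training data
    # Evaluating each model on the *other* player's test set shows how well
    # a purely local model generalizes without any federation.
    return {
        "m1_on_p1": evaluate(m1, p1_test),
        "m1_on_p2": evaluate(m1, p2_test),
        "m2_on_p1": evaluate(m2, p1_test),
        "m2_on_p2": evaluate(m2, p2_test),
    }
```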
### Federated Learning

- Split data between two players (P1 and P2) if not pre-split
- Initialize the global model
- For each federated round:
  - Distribute the global model to both players
  - Each player trains the model on their local data
  - Aggregate the updated models into a new global model (see the averaging sketch below)
  - Evaluate the global model on both players' test data
- Save results for each round and the final evaluation
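The aggregation step is standard federated averaging. A minimal PyTorch sketch, assuming equal client weighting (the framework splits the data 50/50 between two players):

```python
import copy
import torch

def federated_average(global_model, client_models):
    """FedAvg: load the element-wise mean of the clients' parameters
    into the global model. Assumes equal client weighting."""
    avg_state = copy.deepcopy(client_models[0].state_dict())
    for key in avg_state:
        # Stack the corresponding tensor from every client and take the mean.
        avg_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in client_models]
        ).mean(dim=0)
    global_model.load_state_dict(avg_state)
    return global_model
```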
### Feature Suppression

- Split data between two players (P1 and P2) if not pre-split
- For each combination of feature sets:
  - Create suppressed datasets for each player (see the sketch below)
  - Run federated learning with these suppressed datasets
- Track performance across all feature combinations
- Generate heatmaps to visualize the impact of suppression
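Creating a suppressed dataset amounts to dropping feature columns before training; a hypothetical pandas sketch (the feature names in the usage comment are illustrative, not the framework's actual column names):

```python
import pandas as pd

def suppress_features(df: pd.DataFrame, kept_features: list, target_column: str) -> pd.DataFrame:
    """Return a copy of the data containing only the kept feature columns
    plus the target; every other feature is hidden from this player."""
    return df[kept_features + [target_column]].copy()

# Illustrative use: P1 trains on a reduced view while P2 keeps all features.
# p1_view = suppress_features(p1_train, ["feature_a", "feature_b"], "application_name")
```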
### Differential Privacy

- Split data between two players (P1 and P2) if not pre-split
- For each combination of noise levels:
  - Run federated learning with DP-SGD at the specified noise levels (see the Opacus sketch below)
  - Track privacy guarantees (epsilon values) and model performance
- Control memory usage by limiting the number of parallel processes
- Generate heatmaps to visualize the privacy-utility tradeoff
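Each client's local training can be wrapped for DP-SGD with Opacus; a minimal sketch, assuming the model, optimizer, and data loader are already constructed (the hypothetical helper name is ours, not the framework's):

```python
from opacus import PrivacyEngine

def make_private_client(model, optimizer, train_loader, noise_multiplier, max_grad_norm=1.0):
    """Attach Opacus so this client's training runs DP-SGD with
    per-sample gradient clipping and Gaussian noise."""
    privacy_engine = PrivacyEngine()
    model, optimizer, train_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=noise_multiplier,  # e.g., the --noise-p1 / --noise-p2 values
        max_grad_norm=max_grad_norm,        # per-sample gradient clipping bound
    )
    return model, optimizer, train_loader, privacy_engine

# After training, the spent privacy budget can be read back, e.g.:
# epsilon = privacy_engine.get_epsilon(delta=1e-5)
```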
## Analysis and Visualization

The framework provides tools to analyze and visualize experiment results, including:

- Heatmaps for suppression experiments showing accuracy vs. feature counts
- Heatmaps for differential privacy experiments showing accuracy vs. noise levels (an illustrative sketch follows)
- Other visualizations and analysis functions, which can be added as needed
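As an illustration, a privacy-utility heatmap can be built from a results grid with pandas and seaborn (the column names and accuracy values below are made up purely for the sketch):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical results: one row per (noise_p1, noise_p2) pair with final accuracy.
results = pd.DataFrame({
    "noise_p1": [0.5, 0.5, 1.0, 1.0],
    "noise_p2": [0.5, 1.0, 0.5, 1.0],
    "accuracy": [0.91, 0.88, 0.87, 0.82],  # illustrative values only
})
grid = results.pivot(index="noise_p1", columns="noise_p2", values="accuracy")
sns.heatmap(grid, annot=True, fmt=".2f", cmap="viridis")
plt.title("Accuracy vs. noise multipliers (illustrative)")
plt.tight_layout()
plt.savefig("dp_heatmap.png")
```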
## License

This project is open source and available under the MIT License.