This repository contains the source code corresponding to our paper titled "Approximate Kernel Density Estimation under Metric-based Local Differential Privacy". All algorithms are implemented in Python 3.
-
small_datasets/: Provides sampled example datasets for the CodRNA, CovType, RCV1, Yelp, and SYN (default) datasets mentioned in the paper. -
generate_syn_datasets.py: Contains code to generate a series of SYN datasets.
RACE.py,DM.py,PM.py,SW.py,GI.py,FKM_LL_RACE.py,FKM_LR_RACE.pyandmLDP_KDE.py: Store the code for algorithms RACE, DM, PM, SW, GI, FKM-LL-RACE, FKM-LR-RACE and mLDP-KDE, respectively.examples.py: Provides an example of the mLDP-KDE algorithm with 1000 construction points and 1 query point.
Python scripts are provided to run all experiments.
-
test_epsilon_utility_efficiency.py: Contains code for Expt1 - Utility vs. Privacy and Expt2 - Time Efficiency. -
test_epsilon_sketchsize_communication.pyandtest_sketchsize_MSE.py: Contain code for Expt3 - Sketch Size and Communication Cost. -
test_m_utility_efficiency.pyandtest_n_utility_efficiency.py: Contain code for Expt4 - Scalability. -
test_epsilon_utility_L1.pyandtest_epsilon_utility_Angular.py: Contain code for Expt5 - Performance on Other LSH Kernels. - To reproduce results:
- Run any of the experimental scripts simply by using
python scripts_name.py. - In scripts requiring dataset selection, modify the value of
selected_flagto generate results for the chosen dataset. - In scripts requiring privacy radius
$r$ selection, modify the value ofnearest_flagandL_R_setto generate results for the chosen$r$ .
- Run any of the experimental scripts simply by using
plotting_tools.py: Provided for visualization. It requires Python 3.7 (or higher versions) and Matplotlib.
-
Additional Experiments:
-
test_small_range_epsilon_utility.py: Contains code to test the utility of the RACE, GI and mLDP-KDE algorithms using a smaller$\varepsilon$ range: [1, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20]. -
test_r_effect.py: Contains code to evaluate the impact of a range of privacy radius$r$ on the mLDP-KDE algorithm. We tested two classes of$r$ :- average distance from a point to its t-nearest neighbors for t ∈ {1, 10, 100, 1000, 10000};
- maximum distance from a point to its t-nearest neighbors for t ∈ {1, 10, 100, 1000, 10000};
-
heatmap_visualization.py: Contains code to visualizing 2D Heatmaps for KDE on each dataset.
-
- Update
mLDP_KDE.py: Added comments to enhance code clarity. - Update
parameters.py: Added a set of parameters for new experiments. - Update
plotting_tools.py: Added functionsdraw_heatmapanddraw_small_range_epsilon_MSEfor new experiments.
- Update the repository for Camera-ready.