fix: make prv accountant robust to larger epsilons (#606)

Solosneros · facebook-github-bot · commit ad084da9e46b · 2023-11-28T08:14:52.000-08:00
Summary:

## Types of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue

Hi, this PR fixes #601 and #604. It introduces the same fix as microsoft/prv_accountant#38. Lukas (author of prv accountant, wulu473) said that `In general, adding any additional points is safe and won't affect the robustness negatively.`

The cause of these errors seems to be the grid used to compute the `mean()` function of the `PrivacyRandomVariableTruncated` class. The grid (`points` variable) is constant apart from the lowest (`self.t_min`) and highest point (`self.t_max`). This PR determines the grid based on the lowest and highest point. More information is below.

Best

**Observation**

I debugged the code and eventually arrived at the `mean()` function of the `PrivacyRandomVariableTruncated` class. The grid (`points` variable) used to compute the mean is constant apart from the lowest (`self.t_min`) and highest point (`self.t_max`). See the line of code [here](https://github.com/microsoft/prv_accountant/blob/a95c4e2d41ff4886c3e4a84925edf878a6540e0a/prv_accountant/privacy_random_variables/abstract_privacy_random_variable.py#L52). It looks like this: `[self.tmin, -0.1, -0.01, -0.001, -0.0001, -1e-05, 1e-05, 0.0001, 0.001, 0.01, 0.1, self.tmax]`.

It seems that `tmin` and `tmax` are of the order of `[-12,12]` for the examples that I posted above and even up to `[-48,48]` for the example that jeandut posted in the #604 issue, whereas they are more like `[-7,7]` for the [readme example for DP-SGD](https://github.com/microsoft/prv_accountant#dp-sgd). We suspect that the integration breaks down when the grid spacing next to `tmin` / `tmax` gets too large.
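
To make the observation concrete, here is a minimal sketch (not the library code) that rebuilds the fixed grid quoted above in plain Python and reports its widest interval for the three bound magnitudes mentioned. The helper names `old_grid` and `widest_interval` are hypothetical, and symmetric bounds are assumed for illustration.

```python
from __future__ import annotations


def old_grid(t_min: float, t_max: float) -> list[float]:
    """Break points as quoted above: fixed inner points, variable end points."""
    inner = [10.0 ** e for e in range(-5, 0)]  # 1e-05, 0.0001, ..., 0.1
    return [t_min] + [-p for p in reversed(inner)] + inner + [t_max]


def widest_interval(points: list[float]) -> float:
    """Largest distance between two neighbouring break points."""
    return max(b - a for a, b in zip(points, points[1:]))


# Bound magnitudes taken from the description: readme example, #601, #604.
for bound in (7.0, 12.0, 48.0):
    pts = old_grid(-bound, bound)
    print(f"t_max={bound}: {len(pts)} points, widest interval {widest_interval(pts):.1f}")
```

The inner points never change, so the outermost intervals grow one-for-one with `t_max`, which is consistent with the suspicion stated above.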

**Proposed solution**

Keep a logarithmic points grid, but determine the start and end of each logspace from `tmin` and `tmax`.

Before: (https://github.com/pytorch/opacus/blob/95df0904ae5d2b3aaa26b708e5067e9271624036/opacus/accountants/analysis/prv/prvs.py#L99-L106)

After:

```
# determine points based on t_min and t_max
lower_exponent = int(np.log10(np.abs(self.t_min)))
upper_exponent = int(np.log10(self.t_max))
points = np.concatenate(
    [
        [self.t_min],
        -np.logspace(start=lower_exponent, stop=-5, num=10),
        [0],
        np.logspace(start=-5, stop=upper_exponent, num=10),
        [self.t_max],
    ]
)
```

## How Has This Been Tested (if it applies)

I ran the examples from the issues #601 and #604 and they don't break anymore.

```
import opacus

target_delta = 0.001
target_epsilon = 20
steps = 5000
sample_rate = 0.19120458891013384

for target_epsilon in [20, 50]:
    noise_multiplier = opacus.privacy_engine.get_noise_multiplier(
        target_delta=target_delta,
        target_epsilon=target_epsilon,
        steps=steps,
        sample_rate=sample_rate,
        accountant="prv",
    )
    prv_accountant = opacus.accountants.utils.create_accountant("prv")
    prv_accountant.history = [(noise_multiplier, sample_rate, steps)]
    obtained_epsilon = prv_accountant.get_epsilon(delta=target_delta)
    print(f"target epsilon {target_epsilon}, obtained epsilon {obtained_epsilon}")
```

> target epsilon 20, obtained epsilon 19.999332284974717
> target epsilon 50, obtained epsilon 49.99460075990896

```
# Continues from the snippet above: `opacus` and `target_delta` (0.001) carry over.
target_epsilon = 4
batch_size = 50
epochs = 5
delta = 1e-05  # defined but not used below; `target_delta` is passed instead
expected_len_dataloader = 500 // batch_size
sample_rate = 1 / expected_len_dataloader

noise_multiplier = opacus.privacy_engine.get_noise_multiplier(
    target_delta=target_delta,
    target_epsilon=target_epsilon,
    epochs=epochs,
    sample_rate=sample_rate,
    accountant="prv",
)
prv_accountant = opacus.accountants.utils.create_accountant("prv")
prv_accountant.history = [(noise_multiplier, sample_rate, int(epochs / sample_rate))]
obtained_epsilon = prv_accountant.get_epsilon(delta=target_delta)
print(f"target epsilon {target_epsilon}, obtained epsilon {obtained_epsilon}")
```

> target epsilon 4, obtained epsilon 3.9968389923130356

## Checklist

- [x] The documentation is up-to-date with the changes I made.
- [x] I have read the **CONTRIBUTING** document and completed the CLA (see **CONTRIBUTING**).
- [ ] All tests passed, and additional code has been covered with new tests. Not able to run all tests locally and unsure if new tests should be added.

Pull Request resolved: #606

Reviewed By: HuanyuZhang

Differential Revision: D50111887

fbshipit-source-id: 2f77f8bc0e59837f765b87f2e107bc01015b9481
diff --git a/opacus/accountants/analysis/prv/prvs.py b/opacus/accountants/analysis/prv/prvs.py
@@ -96,11 +96,15 @@ def mean(self) -> float:
         """
         Calculate the mean using numerical integration.
         """
+        # determine points based on t_min and t_max
+        lower_exponent = int(np.log10(np.abs(self.t_min)))
+        upper_exponent = int(np.log10(self.t_max))
         points = np.concatenate(
             [
                 [self.t_min],
-                -np.logspace(-5, -1, 5)[::-1],
-                np.logspace(-5, -1, 5),
+                -np.logspace(start=lower_exponent, stop=-5, num=10),
+                [0],
+                np.logspace(start=-5, stop=upper_exponent, num=10),
                 [self.t_max],
             ]
         )
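
The grid built in the hunk above can be inspected outside the class with a small standalone sketch. This is an illustration under stated assumptions, not the Opacus implementation: `logspace` is a hypothetical pure-Python stand-in for `np.logspace(start, stop, num)` (base 10, both endpoints included), `new_grid` is a hypothetical free function taking the bounds as arguments, and the bounds `[-12, 12]` are the order of magnitude reported in the description.

```python
from __future__ import annotations

import math


def logspace(start: float, stop: float, num: int) -> list[float]:
    """Stand-in for np.logspace: num points from 10**start to 10**stop."""
    step = (stop - start) / (num - 1)
    return [10.0 ** (start + i * step) for i in range(num)]


def new_grid(t_min: float, t_max: float) -> list[float]:
    """Mirror of the points construction in the diff, with plain lists."""
    # int() truncates toward zero, exactly as in the diff.
    lower_exponent = int(math.log10(abs(t_min)))
    upper_exponent = int(math.log10(t_max))
    return (
        [t_min]
        + [-p for p in logspace(lower_exponent, -5, 10)]
        + [0.0]
        + logspace(-5, upper_exponent, 10)
        + [t_max]
    )


grid = new_grid(-12.0, 12.0)
print(len(grid), "points")
print("strictly increasing:", all(a < b for a, b in zip(grid, grid[1:])))
```

The sketch is only exercised for bounds whose magnitude is at least 1, matching the values reported in the description; behaviour for smaller bounds is not examined here.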
