Commit bbc5f68

Add controller-based implementation of privacy engine based on hooks (#6)
1 parent 7dbbb40 commit bbc5f68

12 files changed: +3481 −32 lines

opacus/__init__.py

Lines changed: 8 additions & 1 deletion
@@ -14,13 +14,20 @@
 # limitations under the License.

 from . import utils
-from .grad_sample import GradSampleModule, GradSampleModuleFastGradientClipping
+from .grad_sample import (
+    GradSampleController,
+    GradSampleModule,
+    GradSampleModuleFastGradientClipping,
+)
 from .privacy_engine import PrivacyEngine
+from .privacy_engine_gsc import PrivacyEngineGradSampleController
 from .version import __version__


 __all__ = [
     "PrivacyEngine",
+    "PrivacyEngineGradSampleController",
+    "GradSampleController",
     "GradSampleModule",
     "GradSampleModuleFastGradientClipping",
     "utils",

opacus/grad_sample/README.md

Lines changed: 51 additions & 31 deletions
@@ -3,25 +3,41 @@
 Computing per sample gradients is an integral part of Opacus framework. We strive to provide out-of-the-box support for
 wide range of models, while keeping computations efficient.

-We currently provide two independent approaches for computing per sample gradients: hooks-based ``GradSampleModule``
-(stable implementation, exists since the very first version of Opacus) and ``GradSampleModuleExpandedWeights``
-(based on a beta functionality available in PyTorch 1.12).
+We currently provide three independent approaches for computing per sample gradients:

-Each of the two implementations comes with it's own set of limitations, and we leave the choice up to the client
-which one to use.
+1. **Hooks-based `GradSampleModule`** (stable, wraps the model)
+2. **`GradSampleController`** (stable, no model wrapping - recommended for transformers)
+3. **`GradSampleModuleExpandedWeights`** (beta, based on PyTorch 1.12+ functionality)

-``GradSampleModuleExpandedWeights`` is currently in early beta and can produce unexpected errors, but potentially
-improves upon ``GradSampleModule`` on performance and functionality.
+Each implementation comes with its own set of limitations and benefits.

-**TL;DR:** If you want stable implementation, use ``GradSampleModule`` (`grad_sample_mode="hooks"`).
-If you want to experiment with the new functionality, you have two options. Try
-``GradSampleModuleExpandedWeights``(`grad_sample_mode="ew"`) for better performance and `grad_sample_mode=functorch`
-if your model is not supported by ``GradSampleModule``.
+**TL;DR:**
+- Use `GradSampleModule` (`grad_sample_mode="hooks"`) for stable implementation with standard models
+- Use `GradSampleController` via `PrivacyEngineGradSampleController` for transformer models and when you need direct model access without wrapping
+- Use `GradSampleModuleExpandedWeights` (`grad_sample_mode="ew"`) if you want to experiment with better performance
+- Use `grad_sample_mode="functorch"` if your model has unsupported layers

-Please switch back to ``GradSampleModule``(`grad_sample_mode="hooks"`) if you encounter strange errors or unexpexted behaviour.
-We'd also appreciate it if you report these to us
+Please report any strange errors or unexpected behaviour to us!

-## Hooks-based approach
+## GradSampleController approach (No Model Wrapping)
+- Controller class: ``opacus.grad_sample.GradSampleController``
+- Privacy Engine: ``opacus.privacy_engine_gsc.PrivacyEngineGradSampleController``
+- Usage: Use `PrivacyEngineGradSampleController` instead of `PrivacyEngine`
+
+**Recommended for transformer models and when model wrapping causes issues.**
+
+Computes per-sample gradients by attaching hooks directly to model parameters without wrapping the model in a
+`GradSampleModule`. This approach:
+
+- ✅ Preserves model type (e.g., `isinstance(model, BertModel)` remains `True`)
+- ✅ No `_module.` prefix in state_dict
+- ✅ Direct access to model attributes (no attribute forwarding needed)
+- ✅ Better compatibility with HuggingFace transformers and models with custom `__getattr__`
+- ✅ Same grad sampler methods as `GradSampleModule`
+
+See [CONTROLLER_BASED_PRIVACY_ENGINE.md](../../docs/CONTROLLER_BASED_PRIVACY_ENGINE.md) for detailed documentation.
+
+## Hooks-based approach (Model Wrapping)
 - Model wrapping class: ``opacus.grad_sample.grad_sample_module.GradSampleModule``
 - Keyword argument for ``PrivacyEngine.make_private()``: `grad_sample_mode="hooks"`
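For reference, the baseline hooks-based flow that the comparison table below measures against: the standard Opacus `PrivacyEngine.make_private()` call, sketched with a toy model. The model, data, and hyperparameters are purely illustrative; `grad_sample_mode` also accepts `"ew"` and `"functorch"` as noted in the TL;DR.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from opacus import PrivacyEngine

# Toy model, optimizer, and data loader purely to illustrate the call shape.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
data_loader = DataLoader(data, batch_size=8)

engine = PrivacyEngine()
model, optimizer, data_loader = engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
    grad_sample_mode="hooks",  # stable, model-wrapping implementation
)

# After make_private, `model` is a GradSampleModule wrapping the original
# module; this wrapping is what the "_module." state_dict prefix refers to.
```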

@@ -62,23 +78,27 @@ is roughly the same.
 Please note that these are known limitations and we plan to improve Expanded Weights and bridge the gap in feature completeness


-| xxx                          | Hooks                           | Expanded Weights | Functorch            |
-|:----------------------------:|:-------------------------------:|:----------------:|:--------------------:|
-| Required PyTorch version     | 1.8+                            | 1.13+            | 1.12 (to be updated) |
-| Development status           | Underlying mechanism deprecated | Beta             | Beta                 |
-| Runtime Performance†         | baseline                        | ~25% faster      | 🟨 0-50% slower      |
-| Any DP-allowed†† layers      | Not supported                   | Not supported    | ✅ Supported         |
-| Most popular nn.* layers     | ✅ Supported                    | ✅ Supported     | ✅ Supported         |
-| torchscripted models         | Not supported                   | ✅ Supported     | Not supported        |
-| Client-provided grad sampler | ✅ Supported                    | Not supported    | ✅ Not needed        |
-| `batch_first=False`          | ✅ Supported                    | Not supported    | ✅ Supported         |
-| Recurrent networks           | ✅ Supported                    | Not supported    | ✅ Supported         |
-| Padding `same` in Conv       | ✅ Supported                    | Not supported    | ✅ Supported         |
-| Empty poisson batches        | ✅ Supported                    | Not supported    | Not supported        |
-
-† Note, that performance differences are unstable and can vary a lot depending on the exact model and batch size.
-Numbers above are averaged over benchmarks with small models consisting of convolutional and linear layers.
-Note, that performance differences are only observed on GPU training, CPU performance seem to be almost identical
+| xxx                          | GradSampleModule (Hooks) | GradSampleController | Expanded Weights     | Functorch            |
+|:----------------------------:|:------------------------:|:--------------------:|:--------------------:|:--------------------:|
+| Required PyTorch version     | 1.8+                     | 1.8+                 | 1.13+                | 1.12 (to be updated) |
+| Development status           | Deprecated mechanism     | ✅ Stable            | Beta                 | Beta                 |
+| Model wrapping               | ✅ Wraps model           | ✅ No wrapping       | ✅ Wraps model       | ✅ Wraps model       |
+| Runtime Performance†         | baseline                 | baseline             | ~25% faster          | 🟨 0-50% slower      |
+| Transformer compatibility    | 🟨 May have issues       | ✅ Excellent         | 🟨 May have issues   | 🟨 May have issues   |
+| State dict compatibility     | 🟨 `_module.` prefix     | ✅ Clean keys        | 🟨 `_module.` prefix | 🟨 `_module.` prefix |
+| Type preservation            | ❌ Model wrapped         | ✅ Model unchanged   | ❌ Model wrapped     | ❌ Model wrapped     |
+| Any DP-allowed†† layers      | Not supported            | Not supported        | Not supported        | ✅ Supported         |
+| Most popular nn.* layers     | ✅ Supported             | ✅ Supported         | ✅ Supported         | ✅ Supported         |
+| torchscripted models         | Not supported            | Not supported        | ✅ Supported         | Not supported        |
+| Client-provided grad sampler | ✅ Supported             | ✅ Supported         | Not supported        | ✅ Not needed        |
+| `batch_first=False`          | ✅ Supported             | ✅ Supported         | Not supported        | ✅ Supported         |
+| Recurrent networks           | ✅ Supported             | ✅ Supported         | Not supported        | ✅ Supported         |
+| Padding `same` in Conv       | ✅ Supported             | ✅ Supported         | Not supported        | ✅ Supported         |
+| Empty poisson batches        | ✅ Supported             | ✅ Supported         | Not supported        | Not supported        |
+
+† Note, that performance differences are unstable and can vary a lot depending on the exact model and batch size.
+Numbers above are averaged over benchmarks with small models consisting of convolutional and linear layers.
+Note, that performance differences are only observed on GPU training, CPU performance seem to be almost identical
 for all approaches.

 †† Layers that produce joint computations on batch samples (e.g. BatchNorm) are not allowed under any approach
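By contrast, a sketch of the controller-based path added in this commit. The exact `make_private()` signature of `PrivacyEngineGradSampleController` is not shown in this diff, so the call below assumes it mirrors `PrivacyEngine.make_private()`, as the README's "use it instead of `PrivacyEngine`" guidance suggests; check the class itself for the authoritative API.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from opacus import PrivacyEngineGradSampleController

# Same toy setup as the hooks example above.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
data_loader = DataLoader(data, batch_size=8)

engine = PrivacyEngineGradSampleController()
# Assumed to mirror PrivacyEngine.make_private(); signature not confirmed here.
model, optimizer, data_loader = engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

# The properties the table advertises: the model keeps its original type and
# its state_dict keys carry no "_module." prefix.
assert isinstance(model, nn.Sequential)
assert not any(k.startswith("_module.") for k in model.state_dict())
```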

opacus/grad_sample/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,7 @@
 from .dp_rnn import compute_rnn_linear_grad_sample  # noqa
 from .embedding import compute_embedding_grad_sample  # noqa
 from .embedding_norm_sample import compute_embedding_norm_sample  # noqa
+from .grad_sample_controller import GradSampleController  # noqa
 from .grad_sample_module import GradSampleModule, create_or_accumulate_grad_sample
 from .grad_sample_module_fast_gradient_clipping import (  # noqa
     GradSampleModuleFastGradientClipping,
@@ -45,6 +46,7 @@


 __all__ = [
+    "GradSampleController",
     "GradSampleModule",
     "GradSampleModuleFastGradientClipping",
     "GradSampleModuleFastGradientClippingFSDP",
