This example calls `run_attacks` with the default options to run a host of
Then, we can view the attack results by:
```python
print(attacks_result.summary())
# Example output:
# Best-performing attacks over all slices
# LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an AUC of 0.72 on slice CORRECTLY_CLASSIFIED=False
# LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an advantage of 0.34 on slice CORRECTLY_CLASSIFIED=False
# LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved a positive predictive value of 1.00 on slice CLASS=0
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved top-5 epsilon lower bounds of 4.6254, 4.6121, 4.5986, 4.5850, 4.5711 on slice Entire dataset
```
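As an illustrative aside (this is not the library's implementation), the AUC and "advantage" numbers in the summary come from how well an attack separates training members from non-members. For a threshold attack, the advantage is simply the best achievable TPR - FPR when flagging an example as a member whenever its loss falls below a threshold. A minimal sketch on synthetic losses:

```python
# Sketch of a loss-threshold membership inference attack on synthetic data.
# The "advantage" is max over thresholds t of TPR - FPR for "member if loss <= t".
import numpy as np

def threshold_attack_advantage(loss_train, loss_test):
    """Return the best TPR - FPR over all observed loss thresholds."""
    thresholds = np.concatenate([loss_train, loss_test])
    tpr = (loss_train[None, :] <= thresholds[:, None]).mean(axis=1)  # members caught
    fpr = (loss_test[None, :] <= thresholds[:, None]).mean(axis=1)   # false alarms
    return float(np.max(tpr - fpr))

rng = np.random.default_rng(0)
members = rng.normal(0.5, 0.3, 1000)      # training examples tend to have lower loss
non_members = rng.normal(1.0, 0.3, 1000)
adv = threshold_attack_advantage(members, non_members)
```

When the two loss distributions are identical, the advantage is 0; the further apart they are, the closer it gets to 1.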
### Other codelabs
First, as before, we specify the input for the attack as an `AttackInputData` object:
```python
# loss_test shape: (n_test, )

attack_input = AttackInputData(
    logits_train=logits_train,
    logits_test=logits_test,
    loss_train=loss_train,
    loss_test=loss_test,
    labels_train=labels_train,
    labels_test=labels_test,
)
```
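As a side note, one common choice for the per-example losses passed in above is softmax cross-entropy computed from the logits and labels. A minimal NumPy sketch (the shapes and data here are made up, not from the codelab):

```python
# Hedged sketch: per-example softmax cross-entropy from logits and integer labels.
import numpy as np

def per_example_cross_entropy(logits, labels):
    """Cross-entropy per example; logits: (n, k) floats, labels: (n,) ints."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

rng = np.random.default_rng(0)
logits_train = rng.normal(size=(8, 10))          # 8 examples, 10 classes (synthetic)
labels_train = rng.integers(0, 10, size=8)
loss_train = per_example_cross_entropy(logits_train, labels_train)
```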
Instead of `logits`, you can also specify `probs_train` and `probs_test` as the predicted probability vectors of each example.
Then, we specify some details of the attack. The first part includes the specifications of the slicing of the data. For example, we may want to evaluate the attack by class or by the correctness of the model's classification. These can be specified by a `SlicingSpec` object.
```python
slicing_spec = SlicingSpec(
    entire_dataset=True,
    by_class=True,
    by_percentiles=False,
    by_classification_correctness=True,
)
```
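Conceptually (this is an illustrative sketch, not the library's internals), slicing just means re-running the attack evaluation on boolean subsets of the examples. The slice names below mirror the ones in the example output, but the labels and predictions are synthetic:

```python
# Sketch: building boolean masks for the slices requested by the SlicingSpec above.
import numpy as np

rng = np.random.default_rng(0)
n = 100
labels = rng.integers(0, 10, size=n)   # true class of each example (synthetic)
preds = rng.integers(0, 10, size=n)    # model's predicted class (synthetic)

slices = {"Entire dataset": np.ones(n, dtype=bool)}   # entire_dataset=True
for c in range(10):                                   # by_class=True
    slices[f"CLASS={c}"] = labels == c
correct = preds == labels                             # by_classification_correctness=True
slices["CORRECTLY_CLASSIFIED=True"] = correct
slices["CORRECTLY_CLASSIFIED=False"] = ~correct
```

The attack is then run and scored separately on each mask, which is why the report can show, say, a much higher AUC on `CORRECTLY_CLASSIFIED=False` than on the entire dataset.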
The second part specifies the classifiers for the attacker to use. Currently, our API supports five classifiers, including `AttackType.THRESHOLD_ATTACK` for
This returns an object of type `AttackResults`. We can, for example, use the following code to see the attack results specified per-slice, as we have requested attacks by class and by the model's classification correctness.
```python
print(attacks_result.summary(by_slices=True))
# Example output:
# Best-performing attacks over all slices
# LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an AUC of 0.72 on slice CORRECTLY_CLASSIFIED=False
# LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an advantage of 0.34 on slice CORRECTLY_CLASSIFIED=False
# LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved a positive predictive value of 1.00 on slice CLASS=0
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved top-5 epsilon lower bounds of 4.6254, 4.6121, 4.5986, 4.5850, 4.5711 on slice Entire dataset

# Best-performing attacks over slice: "Entire dataset"
# LOGISTIC_REGRESSION (with 50000 training and 10000 test examples) achieved an AUC of 0.58
# LOGISTIC_REGRESSION (with 50000 training and 10000 test examples) achieved an advantage of 0.17
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved a positive predictive value of 0.86
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved top-5 epsilon lower bounds of 4.6254, 4.6121, 4.5986, 4.5850, 4.5711

# Best-performing attacks over slice: "CLASS=0"
# LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved an AUC of 0.63
# LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved an advantage of 0.19
# LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved a positive predictive value of 1.00
# THRESHOLD_ATTACK (with 5000 training and 1000 test examples) achieved top-5 epsilon lower bounds of 4.1920, 4.1645, 4.1364, 4.1074, 4.0775

# ...

# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=True"
# LOGISTIC_REGRESSION (with 42959 training and 6844 test examples) achieved an AUC of 0.51
# LOGISTIC_REGRESSION (with 42959 training and 6844 test examples) achieved an advantage of 0.05
# LOGISTIC_REGRESSION (with 42959 training and 6844 test examples) achieved a positive predictive value of 0.94
# THRESHOLD_ATTACK (with 42959 training and 6844 test examples) achieved top-5 epsilon lower bounds of 0.9495, 0.6358, 0.5630, 0.4536, 0.4341

# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=False"
# LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an AUC of 0.72
# LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an advantage of 0.34
# LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved a positive predictive value of 0.97
# LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved top-5 epsilon lower bounds of 3.8844, 3.8678, 3.8510, 3.8339, 3.8165
```
#### Viewing and plotting the attack results
We have seen an example of using `summary()` to view the attack results as text. We also provide some other ways for inspecting the attack results.
To get the attack that achieves the maximum attacker advantage, AUC, or epsilon lower bound, we can do
```python
# Attack type with max AUC: LOGISTIC_REGRESSION, AUC of 0.72, Attacker advantage of 0.34, Epsilon lower bound of [3.88435257 3.86781797 3.85100545 3.83390548 3.81650809]
```
We can also plot its ROC curve by
which would give a figure like the one below
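For intuition (this sketch does not use the library's plotting helpers), an ROC curve for a simple loss-threshold attack can be computed by sweeping the threshold over all observed losses; the loss values here are synthetic:

```python
# Sketch: ROC curve of "member if loss <= t", swept over all observed thresholds t.
import numpy as np

def roc_points(loss_train, loss_test):
    """Return (FPR, TPR) arrays for every observed loss threshold."""
    thresholds = np.sort(np.concatenate([loss_train, loss_test]))
    tpr = np.array([np.mean(loss_train <= t) for t in thresholds])
    fpr = np.array([np.mean(loss_test <= t) for t in thresholds])
    return fpr, tpr

rng = np.random.default_rng(0)
fpr, tpr = roc_points(rng.normal(0.5, 0.3, 500), rng.normal(1.0, 0.3, 500))
# Trapezoidal area under the curve; both arrays are nondecreasing in t.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

try:
    import matplotlib.pyplot as plt  # plotting is optional for this sketch
    plt.plot(fpr, tpr)
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.title("Loss-threshold attack ROC")
    plt.savefig("roc_curve.png")
except ImportError:
    pass
```

A curve hugging the top-left corner (AUC near 1) means the attack distinguishes members well; the diagonal (AUC 0.5) means it does no better than random guessing.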