Commit fdc4a06

Author: Markus Semmler
Add noise removal to README.md.
1 parent 37efbb8 commit fdc4a06

6 files changed: +78 −34 lines

docs/value/classwise-shapley.md

Lines changed: 74 additions & 33 deletions
@@ -4,12 +4,14 @@ title: Class-wise Shapley
# Class-wise Shapley

## Introduction

Class-wise Shapley (CWS) [@schoch_csshapley_2022] offers a Shapley framework
tailored for classification problems. Let $D$ be a dataset, $D_{y_i}$ be the
subset of $D$ with labels $y_i$, and $D_{-y_i}$ be the complement of $D_{y_i}$
in $D$. The key idea is that a sample $(x_i, y_i)$ might enhance the overall
performance on $D$, while being detrimental for the performance on $D_{y_i}$. To
address this issue, the authors introduced

$$
v_u(i) = \frac{1}{2^{|D_{-y_i}|}} \sum_{S_{-y_i}}
@@ -54,7 +56,7 @@ the dataset.
```

### Class-wise scorer

In order to use the classwise Shapley value, one needs to define a
[ClasswiseScorer][pydvl.value.shapley.classwise.ClasswiseScorer]. Given a sample
@@ -88,12 +90,12 @@ and $g$ for an exploration with different base scores.
)
```

The level curves for $f(x)=x$ and $g(x)=e^x$ are depicted below, annotated with
their respective gradients.

![Level curves of the class-wise utility](img/classwise-shapley-discounted-utility-function.svg){ align=left width=33% class=invertible }
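As an illustration, a minimal NumPy sketch of such a discounted class-wise score, with in-class accuracy scaled by a function $g$ of the out-of-class accuracy; the function name and signature are hypothetical and not pyDVL's API:

```python
import numpy as np

def classwise_utility(y_true, y_pred, label, f=lambda x: x, g=np.exp):
    """Discounted utility u = f(a_in) * g(a_out): the in-class accuracy
    a_in scaled by a function of the out-of-class accuracy a_out."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    in_class = y_true == label
    a_in = float(np.mean(y_pred[in_class] == y_true[in_class]))
    a_out = float(np.mean(y_pred[~in_class] == y_true[~in_class]))
    return f(a_in) * g(a_out)
```

With $f(x)=x$ and $g(x)=e^x$ this reproduces the level curves depicted above.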

## Evaluation

We evaluate the method on the nine datasets used in [@schoch_csshapley_2022],
using the same pre-processing. For images, PCA is used to reduce down to 32 the
@@ -113,13 +115,15 @@ pre-processing steps, please refer to the paper.
| MNIST (binary) | Image | 2 | 32 | 554 |
| MNIST (multi) | Image | 10 | 32 | 554 |

### Performance for (direct) point removal

We compare the mean and the coefficient of variation (CV) of the weighted accuracy drop
(WAD) as proposed in [@schoch_csshapley_2022]. The metric is defined by

$$
\text{WAD} = \sum_{j=1}^{n} \left( \frac{1}{j} \sum_{i=1}^{j}
a_{T_{-\{1 \colon i-1 \}}}(D) - a_{T_{-\{1 \colon i \}}}(D) \right)
= a_T(D) - \sum_{j=1}^{n} \frac{a_{T_{-\{1 \colon j \}}}(D)}{j},
$$

where $a_T(D)$ is the accuracy of the model (trained on $T$) evaluated on $D$ and
@@ -129,15 +133,15 @@ standard deviation $\sigma_\text{WAD}$. The valuation of the training samples an
evaluation on the validation samples are both calculated based on a logistic regression
model. Let's have a look at the mean

![Weighted accuracy drop (Mean)](img/classwise-shapley-metric-wad-mean.svg){ align=left width=50% class=invertible }

of the metric WAD. The table shows that CWS is competitive with all three other methods.
In all problems except `MNIST (multi)` it is better than TMCS; in that case TMCS has a
slight advantage. Another important quantity is the CV
$\frac{\sigma_\text{WAD}}{\mu_\text{WAD}}$, which normalizes the standard
deviation by the mean. The results are shown below.

![Weighted accuracy drop (CV)](img/classwise-shapley-metric-wad-cv.svg){ align=left width=50% class=invertible }

It is noteworthy that CWS is not the best method in terms of CV (lower CV means better
performance). For `CIFAR10`, `Click`, `CPU` and `MNIST (binary)` Beta Shapley has the
@@ -146,52 +150,89 @@ lowest CV. For `Diabetes`, `MNIST (multi)` and `Phoneme` CWS is the winner and f
highest relative standard deviation.

The following plot shows valuation-set accuracy of logistic regression on the y-axis.
The x-axis shows the number of samples removed. Random values serve as a baseline.
Each line represents five runs, and bootstrapping was used to estimate the 95%
confidence intervals.

![Accuracy after sample removal using values from logistic regression](img/classwise-shapley-weighted-accuracy-drop-logistic-regression-to-logistic-regression.svg){ class=invertible }

Samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. Overall we conclude that in terms of mean WAD, CWS and TMCS are
the best methods. In terms of CV, CWS and Beta Shapley are the clear winners. Hence, CWS
is a competitive method for valuation of data sets with a low relative standard
deviation. We remark that for all valuation methods the same number of _evaluations of
the marginal utility_ was used.
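For concreteness, the first form of the WAD definition can be computed directly from a removal curve of accuracies (a plain sketch, not pyDVL code; the helper name is hypothetical):

```python
import numpy as np

def weighted_accuracy_drop(accuracies):
    """WAD from a removal curve: accuracies[j] is the accuracy of a model
    trained after removing the j highest-valued samples (accuracies[0] is
    the accuracy with the full training set)."""
    a = np.asarray(accuracies, dtype=float)
    n = len(a) - 1
    # a_{T_{-{1:i-1}}}(D) - a_{T_{-{1:i}}}(D) for i = 1..n
    drops = a[:-1] - a[1:]
    return float(sum(drops[:j].sum() / j for j in range(1, n + 1)))
```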

### Performance in value transfer for point removal

Practically more relevant is the transfer of values from one model to another one. As
before, the values are calculated using logistic regression. However, this time they are
used to prune the training set for a neural network. The following plot shows
valuation-set accuracy of the network on the y-axis, and the number of samples removed
on the x-axis.

![Accuracy after sample removal using values transferred from logistic regression
to an MLP](img/classwise-shapley-weighted-accuracy-drop-logistic-regression-to-mlp.svg){ class=invertible }

Again samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. CWS is competitive with the compared methods. Especially
in very unbalanced datasets, like `Click`, the performance of CWS seems
superior. In other datasets, like `Covertype`, `Diabetes` and `MNIST (multi)`,
the performance is on par with TMCS. For `MNIST (binary)` and `Phoneme` the
performance is competitive.

### Density of values

This experiment compares the distribution of values for TMCS (green) and CWS
(red). Both methods are chosen due to their competitiveness. The following plots show a
histogram as well as the density estimated by kernel density estimation (KDE).

![Density of TMCS and CWS](img/classwise-shapley-density.svg){ class=invertible }

As apparent in the metric CV from the previous section, the variance of CWS is lower
than for TMCS. They seem to approximate the same form of distribution, although their
utility functions are different.

For `Click` TMCS has a multi-modal distribution of values. This is inferior to CWS,
which has only one mode and is more stable on that dataset. `Click` is a very unbalanced
dataset, and we conclude that CWS seems to be more robust on unbalanced datasets.

### Noise removal for 20% of the flipped data

Another type of experiment uses the algorithms to detect mis-labelled data points. The
flipped indices are chosen randomly. Multi-class datasets are discarded, because they do
not possess a unique flipping strategy. The following table shows the mean of the area
under the curve (AUC) of five runs.
![Area under the Curve (Mean)](img/classwise-shapley-metric-auc-mean.svg){ align=left width=50% class=invertible }
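The flipping setup above can be sketched as follows for a binary dataset (a hypothetical helper, assuming labels in {0, 1}):

```python
import numpy as np

def flip_labels(y, fraction=0.2, seed=0):
    """Flip `fraction` of the labels of a binary dataset and return the
    noisy labels together with the flipped indices."""
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    idx = rng.choice(len(y_noisy), size=int(fraction * len(y_noisy)),
                     replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]  # binary labels have a unique flip
    return y_noisy, idx
```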

In the majority of the cases TMCS has a slight advantage over CWS on average. For
`Click` CWS has a slight edge, most probably due to the unbalanced nature of `Click`.
The following plot shows the CV for the AUC of the five runs.

![Area under the Curve (CV)](img/classwise-shapley-metric-auc-cv.svg){ align=left width=50% class=invertible }

In terms of CV, CWS has a clear edge over TMCS and Beta Shapley. The following plot
shows the receiver operating characteristic (ROC) for the mean of five runs.

![Receiver Operating Characteristic](img/classwise-shapley-roc-auc-logistic-regression.svg){ align=left width=50% class=invertible }

The ROC curve is a plot of the true positive rate (TPR) against the false positive rate
(FPR). The TPR is the ratio of correctly classified positive samples to all positive
samples; the FPR is the ratio of incorrectly classified negative samples to all negative
samples. This tuple is calculated for all prefixes of the training set with respect to
the values. Although TMCS seems to be the winner, CWS stays competitive when considering
sample efficiency: for a perfectly balanced dataset, CWS needs fewer samples than TMCS
on average.
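The prefix construction can be sketched as follows, under the assumption that samples are inspected in increasing order of value (mis-labelled points are expected to receive low values); the function name is hypothetical:

```python
import numpy as np

def roc_from_values(values, is_flipped):
    """TPR and FPR for every prefix of the dataset, inspected in increasing
    order of value (assumption: flipped samples receive low values)."""
    order = np.argsort(values)
    flipped = np.asarray(is_flipped, dtype=bool)[order]
    tp = np.cumsum(flipped)   # flipped samples found so far
    fp = np.cumsum(~flipped)  # clean samples inspected so far
    return fp / (~flipped).sum(), tp / flipped.sum()
```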

## Conclusion

CWS is a reasonable and effective way to handle classification problems. It reduces
computational cost and variance by splitting up the data set into classes. Given the
underlying similarities in the architecture of TMCS, Beta Shapley, and CWS, there's a
clear pathway for improving convergence rates, sample efficiency, and stabilizing
variance for TMCS and Beta Shapley.

docs/value/img/classwise-shapley-metric-auc-cv.svg

Lines changed: 1 addition & 0 deletions

docs/value/img/classwise-shapley-metric-auc-mean.svg

Lines changed: 1 addition & 0 deletions

docs/value/img/classwise-shapley-metric-mlp-mean.svg renamed to docs/value/img/classwise-shapley-metric-wad-mean.svg

Lines changed: 1 addition & 1 deletion

docs/value/img/classwise-shapley-roc-auc-logistic-regression.svg

Lines changed: 1 addition & 0 deletions
