where $a_T(D)$ is the accuracy of the model (trained on $T$) evaluated on $D$ and [...]
standard deviation $\sigma_\text{WAD}$. The valuation of the training samples and
evaluation on the validation samples are both calculated based on a logistic regression
model. Let's have a look at the mean

{ align=left width=50% class=invertible }

of the metric WAD. The table shows that CWS is competitive with all three other methods.
In all problems except `MNIST (multi)` it is better than TMCS, whereas in that
case TMCS has a slight advantage. Another important quantity is the CV
$\frac{\sigma_\text{WAD}}{\mu_\text{WAD}}$. It normalizes the standard
deviation by the mean. The results are shown below.

{ align=left width=50% class=invertible }

It is noteworthy that CWS is not the best method in terms of CV (lower CV means better
performance). For `CIFAR10`, `Click`, `CPU` and `MNIST (binary)` Beta Shapley has the
lowest CV. For `Diabetes`, `MNIST (multi)` and `Phoneme` CWS is the winner and for
`Covertype` it has the highest relative standard deviation.
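
To make these two numbers concrete, the sketch below computes WAD for a single run and
the CV across runs. It assumes the $1/j$-weighted accuracy drops suggested by the
definition above (the exact weighting is an assumption here), and `accuracy_curves` is a
hypothetical input holding one validation-accuracy curve per run:

```python
import numpy as np

def wad(accuracies):
    """Weighted accuracy drop for one run.

    accuracies[j] is assumed to be the validation accuracy a_T(D) after
    removing the j highest-valued training samples; accuracies[0] uses
    the full training set.
    """
    drops = -np.diff(accuracies)                  # accuracy lost at each step
    weights = 1.0 / np.arange(1, len(drops) + 1)  # early removals weigh more
    return float(np.sum(weights * drops))

# Mean and coefficient of variation over (e.g. five) runs of one method:
wad_runs = np.array([wad(acc) for acc in accuracy_curves])
mu, sigma = wad_runs.mean(), wad_runs.std(ddof=1)
cv = sigma / mu
```
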
The following plot shows valuation-set accuracy of logistic regression on the y-axis.
The x-axis shows the number of samples removed. Random values serve as a baseline.
Each line represents five runs, and bootstrapping was used to estimate the 95%
confidence intervals.

{ class=invertible }

Samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. Overall we conclude that in terms of mean WAD, CWS and TMCS are
the best methods. In terms of CV, CWS and Beta Shapley are the clear winners. Hence, CWS
is a competitive method for valuation of data sets with a low relative standard
deviation. We remark that for all valuation methods the same number of _evaluations of
the marginal utility_ was used.
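
A minimal sketch of how such a curve and its confidence band can be produced follows.
The retrained model is scikit-learn's `LogisticRegression`, matching the setup above;
`value_runs` (one value array per run) and the train/validation arrays are hypothetical
inputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def removal_curve(values, x_train, y_train, x_val, y_val, fractions):
    """Validation accuracy after removing the top-valued fraction of samples."""
    order = np.argsort(values)[::-1]  # remove from high to low value
    accs = []
    for frac in fractions:
        keep = order[int(frac * len(order)):]
        model = LogisticRegression(max_iter=1000).fit(x_train[keep], y_train[keep])
        accs.append(model.score(x_val, y_val))
    return np.array(accs)

fractions = np.linspace(0.0, 0.5, 11)
curves = np.stack(
    [removal_curve(v, x_train, y_train, x_val, y_val, fractions) for v in value_runs]
)

# Bootstrap a 95% confidence band over the runs, per removal step:
rng = np.random.default_rng(0)
boot = np.stack(
    [curves[rng.integers(0, len(curves), len(curves))].mean(axis=0)
     for _ in range(1000)]
)
low, high = np.percentile(boot, [2.5, 97.5], axis=0)
```
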

### Performance in value transfer for point removal

Practically more relevant is the transfer of values from one model to another. As
before, the values are calculated using logistic regression. However, this time they are
used to prune the training set for a neural network. The following plot shows
valuation-set accuracy of the network on the y-axis, and the number of samples removed
on the x-axis.

{ class=invertible }

Again, samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. CWS is competitive with the compared methods. Especially
in very unbalanced datasets, like `Click`, the performance of CWS seems
superior. In other datasets, like `Covertype`, `Diabetes` and `MNIST (multi)`,
the performance is on par with TMCS. For `MNIST (binary)` and `Phoneme` the
performance is competitive.
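
A sketch of this transfer step follows, under the same caveats as before: the network
architecture is an arbitrary choice (the experiments do not pin one down here), and
`values`, `x_train`, `y_train`, `x_val` and `y_val` are hypothetical inputs.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# The values were computed once with the cheap logistic regression utility;
# here they only decide which samples to drop before training the network.
order = np.argsort(values)[::-1]  # highest-valued samples are removed first

curve = []
for n_removed in range(0, len(order) // 2, max(1, len(order) // 20)):
    keep = order[n_removed:]
    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    net.fit(x_train[keep], y_train[keep])
    curve.append((n_removed, net.score(x_val, y_val)))
```
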

### Density of values

This experiment compares the distribution of values for TMCS (green) and CWS
(red). Both methods are chosen due to their competitiveness. The following plots show a
histogram as well as the density estimated by kernel density estimation (KDE).

{ class=invertible }

As is apparent from the CV metric of the previous section, the variance of CWS is lower
than that of TMCS. They seem to approximate the same form of distribution, although
their utility functions are different.

For `Click` TMCS has a multi-modal distribution of values. This is inferior to CWS,
which has only one mode and is more stable on that dataset. `Click` is a very unbalanced
dataset, and we conclude that CWS seems to be more robust on unbalanced datasets.
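
Such a plot can be reproduced with a histogram plus a Gaussian KDE overlay, for example
with `scipy.stats.gaussian_kde`; `tmcs_values` and `cws_values` are hypothetical arrays
holding the values of one dataset:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

for name, vals, color in [("TMCS", tmcs_values, "green"), ("CWS", cws_values, "red")]:
    plt.hist(vals, bins=50, density=True, alpha=0.3, color=color, label=name)
    grid = np.linspace(vals.min(), vals.max(), 200)
    plt.plot(grid, gaussian_kde(vals)(grid), color=color)  # smooth density estimate
plt.xlabel("value")
plt.ylabel("density")
plt.legend()
plt.show()
```
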

### Noise removal with 20% of labels flipped

Another type of experiment uses the algorithms to detect mislabelled data points: 20% of
the labels are flipped, with the flipped indices chosen at random. Multi-class datasets
are discarded because they do not possess a unique flipping strategy. The following
table shows the mean of the area under the curve (AUC) over five runs.

{ align=left width=50% class=invertible }

In the majority of cases TMCS has a slight advantage over CWS on average. For
`Click` CWS has a slight edge, most probably due to the unbalanced nature of `Click`.
The following plot shows the CV of the AUC over the five runs.

{ align=left width=50% class=invertible }

In terms of CV, CWS has a clear edge over TMCS and Beta Shapley. The following plot
shows the receiver operating characteristic (ROC) for the mean of five runs.
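
The detection behind these AUC and ROC figures can be sketched in a few lines. Flipped
points should receive low values, so points are scored by their negated value; `values`
and the boolean mask `flipped` are hypothetical inputs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# flipped[i] is True for the 20% of training labels that were inverted
# before valuation; mislabelled points should receive *low* values.
scores = -np.asarray(values)
auc = roc_auc_score(flipped, scores)      # one entry of the AUC table above
fpr, tpr, _ = roc_curve(flipped, scores)  # one ROC curve, averaged over runs
```
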