Commit fdc4a06

Author: Markus Semmler
Add noise removal to README.md.
1 parent 37efbb8 commit fdc4a06

6 files changed: +78 −34 lines

docs/value/classwise-shapley.md

Lines changed: 74 additions & 33 deletions
@@ -4,12 +4,14 @@ title: Class-wise Shapley
# Class-wise Shapley

## Introduction

Class-wise Shapley (CWS) [@schoch_csshapley_2022] offers a Shapley framework
tailored for classification problems. Let $D$ be a dataset, $D_{y_i}$ be the
subset of $D$ with labels $y_i$, and $D_{-y_i}$ be the complement of $D_{y_i}$
in $D$. The key idea is that a sample $(x_i, y_i)$ might enhance the overall
performance on $D$, while being detrimental for the performance on $D_{y_i}$. To
address this issue, the authors introduced

$$
v_u(i) = \frac{1}{2^{|D_{-y_i}|}} \sum_{S_{-y_i}}
@@ -54,7 +56,7 @@ the dataset.
```

### Class-wise scorer

In order to use the classwise Shapley value, one needs to define a
[ClasswiseScorer][pydvl.value.shapley.classwise.ClasswiseScorer]. Given a sample
@@ -88,12 +90,12 @@ and $g$ for an exploration with different base scores.
)
```

The level curves for $f(x)=x$ and $g(x)=e^x$ are depicted below, annotated with
their respective gradients.

![Level curves of the class-wise utility](img/classwise-shapley-discounted-utility-function.svg){ align=left width=33% class=invertible }
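As an illustration, a minimal NumPy sketch of such a discounted class-wise score, with in-class accuracy scaled by a function $g$ of the out-of-class accuracy; the function name and signature are hypothetical and not pyDVL's API:

```python
import numpy as np

def classwise_utility(y_true, y_pred, label, f=lambda x: x, g=np.exp):
    """Discounted utility u = f(a_in) * g(a_out): the in-class accuracy
    a_in scaled by a function of the out-of-class accuracy a_out."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    in_class = y_true == label
    a_in = float(np.mean(y_pred[in_class] == y_true[in_class]))
    a_out = float(np.mean(y_pred[~in_class] == y_true[~in_class]))
    return f(a_in) * g(a_out)
```

With $f(x)=x$ and $g(x)=e^x$ this reproduces the level curves depicted above.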

## Evaluation

We evaluate the method on the nine datasets used in [@schoch_csshapley_2022],
using the same pre-processing. For images, PCA is used to reduce down to 32 the
@@ -113,13 +115,15 @@ pre-processing steps, please refer to the paper.
| MNIST (binary) | Image | 2 | 32 | 554 |
| MNIST (multi) | Image | 10 | 32 | 554 |

### Performance for (direct) point removal

We compare the mean and the coefficient of variation (CV) of the weighted accuracy drop
(WAD) as proposed in [@schoch_csshapley_2022]. The metric is defined by

$$
\text{WAD} = \sum_{j=1}^{n} \left( \frac{1}{j} \sum_{i=1}^{j}
a_{T_{-\{1 \colon i-1 \}}}(D) - a_{T_{-\{1 \colon i \}}}(D) \right)
= a_T(D) - \sum_{j=1}^{n} \frac{a_{T_{-\{1 \colon j \}}}(D)}{j},
$$

where $a_T(D)$ is the accuracy of the model (trained on $T$) evaluated on $D$ and
@@ -129,15 +133,15 @@ standard deviation $\sigma_\text{WAD}$. The valuation of the training samples an
evaluation on the validation samples are both calculated based on a logistic regression
model. Let's have a look at the mean

![Weighted accuracy drop (Mean)](img/classwise-shapley-metric-wad-mean.svg){ align=left width=50% class=invertible }

of the metric WAD. The table shows that CWS is competitive with all three other methods.
In all problems except `MNIST (multi)` it is better than TMCS; in that case TMCS has a
slight advantage. Another important quantity is the CV
$\frac{\sigma_\text{WAD}}{\mu_\text{WAD}}$, which normalizes the standard
deviation by the mean. The results are shown below.

![Weighted accuracy drop (CV)](img/classwise-shapley-metric-wad-cv.svg){ align=left width=50% class=invertible }

It is noteworthy that CWS is not the best method in terms of CV (lower CV means better
performance). For `CIFAR10`, `Click`, `CPU` and `MNIST (binary)` Beta Shapley has the
@@ -146,52 +150,89 @@ lowest CV. For `Diabetes`, `MNIST (multi)` and `Phoneme` CWS is the winner and f
highest relative standard deviation.

The following plot shows valuation-set accuracy of logistic regression on the y-axis.
The x-axis shows the number of samples removed. Random values serve as a baseline.
Each line represents five runs, and bootstrapping was used to estimate the 95%
confidence intervals.

![Accuracy after sample removal using values from logistic regression](img/classwise-shapley-weighted-accuracy-drop-logistic-regression-to-logistic-regression.svg){ class=invertible }

Samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. Overall we conclude that in terms of mean WAD, CWS and TMCS are
the best methods. In terms of CV, CWS and Beta Shapley are the clear winners. Hence, CWS
is a competitive method for valuation of data sets with a low relative standard
deviation. We remark that for all valuation methods the same number of _evaluations of
the marginal utility_ was used.
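For concreteness, the first form of the WAD definition can be computed directly from a removal curve of accuracies (a plain sketch, not pyDVL code; the helper name is hypothetical):

```python
import numpy as np

def weighted_accuracy_drop(accuracies):
    """WAD from a removal curve: accuracies[j] is the accuracy of a model
    trained after removing the j highest-valued samples (accuracies[0] is
    the accuracy with the full training set)."""
    a = np.asarray(accuracies, dtype=float)
    n = len(a) - 1
    # a_{T_{-{1:i-1}}}(D) - a_{T_{-{1:i}}}(D) for i = 1..n
    drops = a[:-1] - a[1:]
    return float(sum(drops[:j].sum() / j for j in range(1, n + 1)))
```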

### Performance in value transfer for point removal

Practically more relevant is the transfer of values from one model to another one. As
before, the values are calculated using logistic regression. However, this time they are
used to prune the training set for a neural network. The following plot shows
valuation-set accuracy of the network on the y-axis, and the number of samples removed
on the x-axis.

![Accuracy after sample removal using values transferred from logistic regression
to an MLP](img/classwise-shapley-weighted-accuracy-drop-logistic-regression-to-mlp.svg){ class=invertible }

Again samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. CWS is competitive with the compared methods. Especially
in very unbalanced datasets, like `Click`, the performance of CWS seems
superior. In other datasets, like `Covertype`, `Diabetes` and `MNIST (multi)`,
the performance is on par with TMCS. For `MNIST (binary)` and `Phoneme` the
performance is competitive.

### Density of values

This experiment compares the distribution of values for TMCS (green) and CWS
(red). Both methods are chosen due to their competitiveness. The following plots show a
histogram as well as the density estimated by kernel density estimation (KDE).

![Density of TMCS and CWS](img/classwise-shapley-density.svg){ class=invertible }

As apparent in the metric CV from the previous section, the variance of CWS is lower
than for TMCS. They seem to approximate the same form of distribution, although their
utility functions are different.

For `Click` TMCS has a multi-modal distribution of values. This is inferior to CWS,
which has only one mode and is more stable on that dataset. `Click` is a very unbalanced
dataset, and we conclude that CWS seems to be more robust on unbalanced datasets.

### Noise removal for 20% of the flipped data

Another type of experiment uses the algorithms to detect mis-labelled data points. The
flipped indices are chosen randomly. Multi-class datasets are discarded, because they do
not possess a unique flipping strategy. The following table shows the mean of the area
under the curve (AUC) of five runs.
![Area under the Curve (Mean)](img/classwise-shapley-metric-auc-mean.svg){ align=left width=50% class=invertible }
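The flipping setup above can be sketched as follows for a binary dataset (a hypothetical helper, assuming labels in {0, 1}):

```python
import numpy as np

def flip_labels(y, fraction=0.2, seed=0):
    """Flip `fraction` of the labels of a binary dataset and return the
    noisy labels together with the flipped indices."""
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    idx = rng.choice(len(y_noisy), size=int(fraction * len(y_noisy)),
                     replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]  # binary labels have a unique flip
    return y_noisy, idx
```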

In the majority of the cases TMCS has a slight advantage over CWS on average. For
`Click` CWS has a slight edge, most probably due to the unbalanced nature of `Click`.
The following plot shows the CV for the AUC of the five runs.

![Area under the Curve (CV)](img/classwise-shapley-metric-auc-cv.svg){ align=left width=50% class=invertible }

In terms of CV, CWS has a clear edge over TMCS and Beta Shapley. The following plot
shows the receiver operating characteristic (ROC) for the mean of five runs.

![Receiver Operating Characteristic](img/classwise-shapley-roc-auc-logistic-regression.svg){ align=left width=50% class=invertible }

The ROC curve is a plot of the true positive rate (TPR) against the false positive rate
(FPR). The TPR is the ratio of correctly classified positive samples to all positive
samples; the FPR is the ratio of incorrectly classified negative samples to all negative
samples. This tuple is calculated for all prefixes of the training set with respect to
the values. Although TMCS seems to be the winner, CWS stays competitive when considering
sample efficiency: for a perfectly balanced dataset, CWS needs fewer samples than TMCS
on average.
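The prefix construction can be sketched as follows, under the assumption that samples are inspected in increasing order of value (mis-labelled points are expected to receive low values); the function name is hypothetical:

```python
import numpy as np

def roc_from_values(values, is_flipped):
    """TPR and FPR for every prefix of the dataset, inspected in increasing
    order of value (assumption: flipped samples receive low values)."""
    order = np.argsort(values)
    flipped = np.asarray(is_flipped, dtype=bool)[order]
    tp = np.cumsum(flipped)   # flipped samples found so far
    fp = np.cumsum(~flipped)  # clean samples inspected so far
    return fp / (~flipped).sum(), tp / flipped.sum()
```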

## Conclusion

CWS is a reasonable and effective way to handle classification problems. It reduces
computational cost and variance by splitting up the data set into classes. Given the
underlying similarities in the architecture of TMCS, Beta Shapley, and CWS, there's a
clear pathway for improving convergence rates, sample efficiency, and stabilizing
variance for TMCS and Beta Shapley.

docs/value/img/classwise-shapley-metric-auc-cv.svg

Lines changed: 1 addition & 0 deletions

docs/value/img/classwise-shapley-metric-auc-mean.svg

Lines changed: 1 addition & 0 deletions

docs/value/img/classwise-shapley-metric-mlp-mean.svg renamed to docs/value/img/classwise-shapley-metric-wad-mean.svg

Lines changed: 1 addition & 1 deletion

docs/value/img/classwise-shapley-roc-auc-logistic-regression.svg

Lines changed: 1 addition & 0 deletions
