@@ -11,34 +11,53 @@ alias:
1111 of sampling sets of a given size), semi-value coefficients and Monte Carlo
1212 sampling.
1313
14- Samplers define so-called * sampling strategies* , which subclass
15- [ EvaluationStrategy] [ pydvl.valuation.samplers.EvaluationStrategy ] . These
16- strategies are where sub-processes compute marginal utilities, by whichever
17- method is required depending on the sampling scheme. Crucially, they compute
18- ** the product of a semi-value coefficient with a sampling weight** . This can be
19- seen either as a form of importance sampling, or as a mechanism to allow
20- mix-and-matching of sampling strategies and semi-value coefficients.
21-
22- It is however an unnecessary step when the sampling distribution yields exactly
23- the semi-value coefficient, as we explain below.
14+ Valuation methods based on semi-values involve computing averages of [ marginal
15+ utilities] [ glossary-marginal-utility ] over all possible subsets of the training
16+ data. As explained in [ the introduction with uniform
17+ sampling] [ monte-carlo-combinatorial-shapley-intro ] , we use Monte Carlo
18+ approximations to compute these averages. Below we show that this introduces
19+ additional terms in the results due to the sampling probabilities, yielding
20+ _ effective coefficients_ that are ** the product of the semi-value coefficient
21+ with a sampling probability** . To correct for this, all samplers provide a
22+ method that returns $p(S),$ the probability of sampling a set $S.$ This can be
23+ seen either as a form of importance sampling to reduce variance, or as a
24+ mechanism to allow mix-and-matching of sampling strategies and semi-value
25+ coefficients.
26+
27+ However, the correction is an unnecessary step when the sampling distribution
28+ yields exactly the semi-value coefficient, a situation which is the basis for
29+ several methods proposed in the literature.
2430
2531??? Example "The core semi-value computation for powerset sampling"
2632 This is the core of the marginal update computation in
2733 [ PowersetEvaluationStrategy] [ pydvl.valuation.samplers.powerset.PowersetEvaluationStrategy ] :
2834 ```python
2935 for sample in batch:
30- u_i = self.utility(sample.with_idx_in_subset())
31- u = self.utility(sample)
32- marginal = u_i - u
33- sign = np.sign(marginal)
34- log_marginal = -np.inf if marginal == 0 else np.log(marginal * sign)
35- log_marginal += self.log_correction(self.n_indices, len(sample.subset))
36- updates.append(ValueUpdate(sample.idx, log_marginal, sign))
36+ u_i = self.utility(sample.with_idx_in_subset())
37+ u = self.utility(sample)
38+ marginal = u_i - u
39+ sign = np.sign(marginal)
40+ log_marginal = -np.inf if marginal == 0 else np.log(marginal * sign)
41+
42+ # Here's the coefficient, as defined by the valuation method,
43+ # potentially with a correction.
44+ log_marginal += self.valuation_coefficient(
45+ self.n_indices, len(sample.subset)
46+ )
47+
48+ updates.append(ValueUpdate(sample.idx, log_marginal, sign))
49+ ...
3750 ```
3851
39- We will be discussing the ` log_correction(n, k) ` . In pyDVL we allow for almost
40- arbitrary combinations of semi-value and sampler, but allow switching off the
41- coefficients and provide dedicated classes that do so when possible.
52+ The ` valuation_coefficient(n, k) ` is in effect defined by the valuation method,
53+ and allows for almost arbitrary combinations of semi-value and sampler. By
54+ subclassing one can also switch off the coefficients when indicated. We provide
55+ dedicated classes that do so for the most common combinations, like
56+ [ TMCShapleyValuation] [ pydvl.valuation.methods.shapley.TMCShapleyValuation ] or
57+ [ MSRBanzhafValuation] [ pydvl.valuation.methods.banzhaf.MSRBanzhafValuation ] . If
58+ you check the code you will see that they are in fact little more than thin
59+ wrappers.
60+
4261
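To illustrate why the update above adds coefficients in log-space, here is a minimal standalone sketch. It is not pyDVL's implementation: the function names are hypothetical, and we assume the usual Shapley coefficient $1 / (n \binom{n-1}{k})$ together with the uniform powerset weight $2^{n-1}$ discussed below.

```python
import math


def log_shapley_coefficient(n: int, k: int) -> float:
    """log of 1 / (n * C(n-1, k)), computed with lgamma to avoid huge integers.

    Hypothetical helper for illustration, not part of pyDVL.
    """
    log_binom = math.lgamma(n) - math.lgamma(k + 1) - math.lgamma(n - k)
    return -math.log(n) - log_binom


def log_uniform_weight(n: int) -> float:
    """log of 2^(n-1), the inverse probability of a uniformly sampled subset."""
    return (n - 1) * math.log(2.0)


def signed_log_update(marginal: float, log_coefficient: float) -> tuple[float, float]:
    """Represent coefficient * marginal as (log of absolute value, sign)."""
    if marginal == 0:
        return -math.inf, 0.0
    sign = math.copysign(1.0, marginal)
    return math.log(abs(marginal)) + log_coefficient, sign


# For large n the two factors are out of float range on their own
# (2.0 ** 1999 raises OverflowError), but their log-sum is a moderate number.
n, k = 2000, 1000
log_coefficient = log_uniform_weight(n) + log_shapley_coefficient(n, k)
log_abs, sign = signed_log_update(-0.25, log_coefficient)
update = sign * math.exp(log_abs)
```

Keeping the sign separately is what allows negative marginals to be handled, since the logarithm is only taken of the absolute value.
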
4362## Uniform sampling
4463
@@ -87,20 +106,20 @@ coefficient
87106
88107$$ w_{\operatorname{unif}} (S) \equiv 2^{n - 1} . $$
89108
90- The product $w_ {\operatorname{unif}} (S) w_ {\operatorname{sh}} (S)$ is the
91- ` log_correction (n, k)` in the code above. Because of how samplers work the
92- coefficients only depend on the size $k = | S |$ of the subsets, and it will
93- always be the inverse of the probability of a set $S,$ given that it has size
109+ The product $w_ {\operatorname{unif}} (S) w_ {\operatorname{sh}} (S)$ is the
110+ ` valuation_coefficient (n, k)` in the code above. Because of how samplers work
111+ the coefficients only depend on the size $k = | S |$ of the subsets, and they will
112+ always be the inverse of the probability of a set $S,$ given that it has size
94113$k.$
95114
96115At every step of the MC algorithm we do the following:
97116
98117!!! abstract "Monte Carlo Shapley Update"
99118 1. sample $S_ {j} \sim \mathcal{U} (D_ {- i}),$ let $k = | S_ {j} |$
100- 1 . compute the marginal $\Delta_i (S_ {j})$
101- 1 . compute the product of coefficients for the sampler and the method:
119+ 2 . compute the marginal $\Delta_i (S_ {j})$
120+ 3 . compute the product of coefficients for the sampler and the method:
102121 $w_ {\operatorname{unif}} (k) w_ {\operatorname{sh}} (k)$
103- 1 . update the running average for $\hat{v}_ {\operatorname{unif},
122+ 4 . update the running average for $\hat{v}_ {\operatorname{unif},
104123 \operatorname{sh}}$
105124
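The four steps above can be sketched in plain Python as follows. This is an illustrative toy, not pyDVL code: the function names are hypothetical, indices are plain integers, and we assume $w_{\operatorname{sh}} (k) = 1 / (n \binom{n-1}{k})$ and $w_{\operatorname{unif}} = 2^{n-1}$.

```python
import math
import random


def shapley_coefficient(n: int, k: int) -> float:
    """Assumed Shapley coefficient w_sh(k) = 1 / (n * C(n-1, k))."""
    return 1.0 / (n * math.comb(n - 1, k))


def uniform_weight(n: int) -> float:
    """w_unif = 2^(n-1): inverse probability of a uniform subset of D_{-i}."""
    return 2.0 ** (n - 1)


def mc_shapley_uniform(utility, n: int, i: int, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the Shapley value of index i."""
    rng = random.Random(seed)
    others = [j for j in range(n) if j != i]
    estimate = 0.0
    for step in range(1, n_samples + 1):
        # 1. sample S_j uniformly from the powerset of D_{-i}
        subset = frozenset(j for j in others if rng.random() < 0.5)
        k = len(subset)
        # 2. compute the marginal utility of i with respect to S_j
        marginal = utility(subset | {i}) - utility(subset)
        # 3. product of the sampler and method coefficients
        coefficient = uniform_weight(n) * shapley_coefficient(n, k)
        # 4. update the running average
        estimate += (coefficient * marginal - estimate) / step
    return estimate


# Toy symmetric utility u(S) = |S|^2 over n = 4 points.
value = mc_shapley_uniform(lambda s: float(len(s)) ** 2, n=4, i=0, n_samples=20_000)
```

For this toy utility every point has exact Shapley value $u(D) / n = 4,$ so `value` should be close to 4, up to Monte Carlo error.
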
106125## Picking a different distribution
@@ -301,20 +320,21 @@ overriding the property
301320[ log_coefficient] [ pydvl.valuation.methods.semivalue.SemivalueValuation.log_coefficient ]
302321to return ` None ` .
303322
304- Alternatively, we can mix and match samplers, effectively performing importance
305- sampling. Let $\mathcal{L}$ be the law of a sampling procedure such that
306- $p_ {\mathcal{L}} (S|k) = w_ {\operatorname{semi}}$ for some semi-value
307- coefficient, and let $\mathcal{Q}$ be that of any sampler we choose: Then:
323+ Alternatively, we can mix and match samplers and semi-values, effectively
324+ performing importance sampling. Let $\mathcal{L}$ be the law of a sampling
325+ procedure such that $p_ {\mathcal{L}} (S|k) = w_ {\operatorname{semi}}$ for some
326+ semi-value coefficient, and let $\mathcal{Q}$ be that of any sampler we choose.
327+ Then:
308328
309329$$ v_{\operatorname{semi}} (i) = \mathbb{E}_{\mathcal{L}} [\Delta_i (S)]
310330 = \mathbb{E}_{Q} \left[ \frac{w_{\operatorname{semi}} (S)}{p_{Q} (S|k)}
311331 \Delta_i (S) \right] $$
312332
313- The drawback is that a direct implementation with that much cancelling of
314- coefficients might be inefficient or numerically unstable. Integration issues
315- arise to compute $p_ {Q} (S|k)$ and so on. On the flip side, we can implement
316- any sampling method, like antithetic sampling, and immediately benefit in all
317- semi-value computations.
333+ The drawback is that a direct implementation with that much cancelling of
334+ coefficients might be inefficient or numerically unstable. Integration issues
335+ might arise when computing $p_ {Q} (S|k)$ and so on. On the plus side, we can
336+ implement any sampling method, like antithetic sampling, and immediately benefit
337+ in all semi-value computations.
318338
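The importance sampling identity above can be checked numerically with a small sketch. All names below are hypothetical and independent of pyDVL. As a simplifying assumption, each toy sampler returns the full probability $p (S)$ of the set it draws, and the semi-value is Shapley with $w_{\operatorname{sh}} (k) = 1 / (n \binom{n-1}{k}).$

```python
import math
import random
from itertools import combinations


def shapley_coefficient(n: int, k: int) -> float:
    return 1.0 / (n * math.comb(n - 1, k))


def sample_uniform_powerset(others, rng):
    """Every subset of D_{-i} is equally likely: p(S) = 2^-(n-1)."""
    subset = frozenset(j for j in others if rng.random() < 0.5)
    return subset, 0.5 ** len(others)


def sample_size_then_subset(others, rng):
    """Uniform size k, then a uniform subset of that size: p(S) = w_sh(k)."""
    n = len(others) + 1
    k = rng.randrange(n)
    subset = frozenset(rng.sample(others, k))
    return subset, 1.0 / (n * math.comb(n - 1, k))


def importance_sampled_value(utility, n, i, sampler, n_samples, seed=0):
    """Average of w_semi(S) / p(S) * marginal over samples from `sampler`."""
    rng = random.Random(seed)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for _ in range(n_samples):
        subset, prob = sampler(others, rng)
        weight = shapley_coefficient(n, len(subset)) / prob
        total += weight * (utility(subset | {i}) - utility(subset))
    return total / n_samples


def exact_shapley(utility, n, i):
    """Brute-force reference value, feasible only for tiny n."""
    others = [j for j in range(n) if j != i]
    return sum(
        shapley_coefficient(n, k)
        * (utility(frozenset(s) | {i}) - utility(frozenset(s)))
        for k in range(n)
        for s in combinations(others, k)
    )
```

With `sample_size_then_subset` the weight is identically 1, which is the case where the correction is unnecessary. With `sample_uniform_powerset` the weight is $2^{n-1} w_{\operatorname{sh}} (k),$ and both estimates should agree with `exact_shapley` up to Monte Carlo error.
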
319339[ ^ 1 ] : At step $(\star)$ we have counted the number of permutations before a
320340fixed position of index $i$ and after it, because the utility does not depend on