
Commit 2eadf37 (1 parent: 1600789)

Checklist: docs/value/sampling-weights.md

File tree: 1 file changed, +56 −36 lines

docs/value/sampling-weights.md

@@ -11,34 +11,53 @@ alias:
 of sampling sets of a given size), semi-value coefficients and Monte Carlo
 sampling.
 
-Samplers define so-called *sampling strategies*, which subclass
-[EvaluationStrategy][pydvl.valuation.samplers.EvaluationStrategy]. These
-strategies are where sub-processes compute marginal utilities, by whichever
-method is required depending on the sampling scheme. Crucially, they compute
-**the product of a semi-value coefficient with a sampling weight**. This can be
-seen either as a form of importance sampling, or as a mechanism to allow
-mix-and-matching of sampling strategies and semi-value coefficients.
-
-It is however an unnecessary step when the sampling distribution yields exactly
-the semi-value coefficient, as we explain below.
+Valuation methods based on semi-values involve computing averages of [marginal
+utilities][glossary-marginal-utility] over all possible subsets of the training
+data. As explained in [the introduction with uniform
+sampling][monte-carlo-combinatorial-shapley-intro], we use Monte Carlo
+approximations to compute these averages. Below we show that this introduces
+additional terms in the results due to the sampling probabilities, yielding
+_effective coefficients_ that are **the product of the semi-value coefficient
+with a sampling probability**. To correct for this, all samplers provide a
+method returning just $p(S),$ the probability of sampling a set $S.$ This can
+be seen either as a form of importance sampling to reduce variance, or as a
+mechanism to allow mix-and-matching of sampling strategies and semi-value
+coefficients.
+
+However, the correction is an unnecessary step when the sampling distribution
+yields exactly the semi-value coefficient, a situation which is the basis for
+several methods proposed in the literature.
 
 ??? Example "The core semi-value computation for powerset sampling"
     This is the core of the marginal update computation in
     [PowersetEvaluationStrategy][pydvl.valuation.samplers.powerset.PowersetEvaluationStrategy]:
     ```python
     for sample in batch:
-        u_i = self.utility(sample.with_idx_in_subset())
-        u = self.utility(sample)
-        marginal = u_i - u
-        sign = np.sign(marginal)
-        log_marginal = -np.inf if marginal == 0 else np.log(marginal * sign)
-        log_marginal += self.log_correction(self.n_indices, len(sample.subset))
-        updates.append(ValueUpdate(sample.idx, log_marginal, sign))
+        u_i = self.utility(sample.with_idx_in_subset())
+        u = self.utility(sample)
+        marginal = u_i - u
+        sign = np.sign(marginal)
+        log_marginal = -np.inf if marginal == 0 else np.log(marginal * sign)
+
+        # Here's the coefficient, as defined by the valuation method,
+        # potentially with a correction.
+        log_marginal += self.valuation_coefficient(
+            self.n_indices, len(sample.subset)
+        )
+
+        updates.append(ValueUpdate(sample.idx, log_marginal, sign))
+        ...
     ```
 
-We will be discussing the `log_correction(n, k)`. In pyDVL we allow for almost
-arbitrary combinations of semi-value and sampler, but allow switching off the
-coefficients and provide dedicated classes that do so when possible.
+The `valuation_coefficient(n, k)` is in effect defined by the valuation method,
+and allows for almost arbitrary combinations of semi-value and sampler. By
+subclassing one can also switch off the coefficients when indicated. We provide
+dedicated classes that do so for the most common combinations, like
+[TMCShapleyValuation][pydvl.valuation.methods.shapley.TMCShapleyValuation] or
+[MSRBanzhafValuation][pydvl.valuation.methods.banzhaf.MSRBanzhafValuation]. If
+you check the code you will see that they are in fact little more than thin
+wrappers.
+
 
 ## Uniform sampling
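The effective coefficient discussed above, the product of a semi-value coefficient with the inverse sampling probability, accumulated in log space for stability, can be sketched standalone. This is an illustrative reconstruction, not pyDVL's implementation: the function names (`log_binom`, `log_shapley_coefficient`, etc.) are hypothetical, and we use the Shapley coefficient with uniform powerset sampling as the concrete case.

```python
import math


def log_binom(n: int, k: int) -> float:
    """log C(n, k), computed via lgamma for numerical stability."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_shapley_coefficient(n: int, k: int) -> float:
    """log of the Shapley coefficient w_sh(k) = 1 / (n * C(n - 1, k))."""
    return -math.log(n) - log_binom(n - 1, k)


def log_uniform_weight(n: int, k: int) -> float:
    """log of w_unif = 2^(n - 1): the inverse probability of any fixed
    subset of D_{-i} under uniform powerset sampling."""
    return (n - 1) * math.log(2)


def log_valuation_coefficient(n: int, k: int) -> float:
    # The effective coefficient is the semi-value coefficient
    # times the sampling weight, added in log space.
    return log_shapley_coefficient(n, k) + log_uniform_weight(n, k)
```

For instance, with $n = 4$ and $k = 1$ this evaluates to $\log(2^3 / (4 \cdot \binom{3}{1})) = \log(2/3)$.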

@@ -87,20 +106,20 @@ coefficient
 
 $$ w_{\operatorname{unif}} (S) \equiv 2^{n - 1} . $$
 
-The product $w_{\operatorname{unif}} (S) w_{\operatorname{sh}} (S)$ is the
-`log_correction(n, k)` in the code above. Because of how samplers work the
-coefficients only depend on the size $k = | S |$ of the subsets, and it will
-always be the inverse of the probability of a set $S,$ given that it has size
+The product $w_{\operatorname{unif}} (S) w_{\operatorname{sh}} (S)$ is the
+`valuation_coefficient(n, k)` in the code above. Because of how samplers work
+the coefficients only depend on the size $k = | S |$ of the subsets, and it will
+always be the inverse of the probability of a set $S,$ given that it has size
 $k.$
 
 At every step of the MC algorithm we do the following:
 
 !!! abstract "Monte Carlo Shapley Update"
     1. sample $S_{j} \sim \mathcal{U} (D_{- i}),$ let $k = | S_{j} |$
-    1. compute the marginal $\Delta_i (S_{j})$
-    1. compute the product of coefficients for the sampler and the method:
+    2. compute the marginal $\Delta_i (S_{j})$
+    3. compute the product of coefficients for the sampler and the method:
        $w_{\operatorname{unif}} (k) w_{\operatorname{sh}} (k)$
-    1. update the running average for $\hat{v}_{\operatorname{unif},
+    4. update the running average for $\hat{v}_{\operatorname{unif},
        \operatorname{sh}}$
 
 ## Picking a different distribution
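The four steps of the Monte Carlo Shapley update can be sketched as a self-contained loop. This is a toy illustration under stated assumptions, not pyDVL code: `mc_shapley` and `shapley_times_uniform` are hypothetical names, and the utility is additive so that each point's exact Shapley value equals its own weight, which lets us sanity-check the estimate.

```python
import math
import random


def shapley_times_uniform(n: int, k: int) -> float:
    """The product w_unif(k) * w_sh(k) = 2^(n - 1) / (n * C(n - 1, k))."""
    return 2 ** (n - 1) / (n * math.comb(n - 1, k))


def mc_shapley(utility, n: int, i: int, n_samples: int, seed: int = 42) -> float:
    rng = random.Random(seed)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for _ in range(n_samples):
        # 1. sample S_j ~ U(D_{-i}): include each other index with prob. 1/2
        s = frozenset(j for j in others if rng.random() < 0.5)
        # 2. compute the marginal utility of adding i to S_j
        delta = utility(s | {i}) - utility(s)
        # 3. multiply by the product of sampler and method coefficients,
        # 4. and fold the term into the running average
        total += shapley_times_uniform(n, len(s)) * delta
    return total / n_samples


# Toy additive utility: each point's Shapley value is exactly its weight.
weights = [0.5, 1.0, 2.0, 4.0]
u = lambda s: sum(weights[j] for j in s)
estimate = mc_shapley(u, n=4, i=2, n_samples=20_000)
```

With enough samples `estimate` converges to `weights[2]`, confirming that the coefficient product exactly undoes the sampling distribution.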
@@ -301,20 +320,21 @@ overriding the property
 [log_coefficient][pydvl.valuation.methods.semivalue.SemivalueValuation.log_coefficient]
 to return `None`.
 
-Alternatively, we can mix and match samplers, effectively performing importance
-sampling. Let $\mathcal{L}$ be the law of a sampling procedure such that
-$p_{\mathcal{L}} (S|k) = w_{\operatorname{semi}}$ for some semi-value
-coefficient, and let $\mathcal{Q}$ be that of any sampler we choose: Then:
+Alternatively, we can mix and match sampler and semi-values, effectively
+performing importance sampling. Let $\mathcal{L}$ be the law of a sampling
+procedure such that $p_{\mathcal{L}} (S|k) = w_{\operatorname{semi}}$ for some
+semi-value coefficient, and let $\mathcal{Q}$ be that of any sampler we choose.
+Then:
 
 $$ v_{\operatorname{semi}} (i) = \mathbb{E}_{\mathcal{L}} [\Delta_i (S)]
 = \mathbb{E}_{Q} \left[ \frac{w_{\operatorname{semi}} (S)}{p_{Q} (S|k)}
 \Delta_i (S) \right] $$
 
-The drawback is that a direct implementation with that much cancelling of
-coefficients might be inefficient or numerically unstable. Integration issues
-arise to compute $p_{Q} (S|k)$ and so on. On the flip side, we can implement
-any sampling method, like antithetic sampling, and immediately benefit in all
-semi-value computations.
+The drawback is that a direct implementation with that much cancelling of
+coefficients might be inefficient or numerically unstable. Integration issues
+might arise to compute $p_{Q} (S|k)$ and so on. On the plus side, we can
+implement any sampling method, like antithetic sampling, and immediately
+benefit in all semi-value computations.
 
 [^1]: At step $(\star)$ we have counted the number of permutations before a
 fixed position of index $i$ and after it, because the utility does not depend on

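The importance-sampling reweighting described above can be exercised in a small sketch. This is an illustration, not pyDVL code, and its assumptions are labelled: `banzhaf_via_importance_sampling` is a hypothetical helper, the target law $\mathcal{L}$ is the uniform powerset distribution (whose mass $2^{-(n-1)}$ is exactly the Banzhaf coefficient), the sampler $\mathcal{Q}$ includes each index with probability $q \neq 1/2$, and the utility is additive so each point's exact Banzhaf value equals its weight.

```python
import math
import random


def banzhaf_via_importance_sampling(
    utility, n: int, i: int, q: float = 0.3, n_samples: int = 20_000, seed: int = 0
) -> float:
    """Estimate the Banzhaf value of index i while sampling each other index
    with inclusion probability q, correcting each term by the likelihood
    ratio p_L(S) / p_Q(S) of the target (uniform) and actual laws."""
    rng = random.Random(seed)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for _ in range(n_samples):
        s = frozenset(j for j in others if rng.random() < q)
        k = len(s)
        # p_L(S) = 2^-(n-1) (uniform powerset, i.e. the Banzhaf law),
        # p_Q(S) = q^k (1 - q)^(n-1-k); the ratio is computed in log space.
        log_ratio = -(n - 1) * math.log(2) - (
            k * math.log(q) + (n - 1 - k) * math.log(1 - q)
        )
        delta = utility(s | {i}) - utility(s)
        total += math.exp(log_ratio) * delta
    return total / n_samples


# Toy additive utility: each point's Banzhaf value is exactly its weight.
weights = [0.5, 1.0, 2.0, 4.0]
u = lambda s: sum(weights[j] for j in s)
est = banzhaf_via_importance_sampling(u, n=4, i=2)
```

The estimator stays unbiased under any choice of $q$, at the price of a higher variance the further $\mathcal{Q}$ strays from $\mathcal{L}$, which is the trade-off the text points out.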