  text: Computing Influence Values
---

-# The influence function
+## The influence function

!!! Warning
    The code in the package [pydvl.influence][pydvl.influence] is experimental.
@@ -30,7 +30,7 @@ pyDVL implements several methods for the efficient computation of the IF for
machine learning. In the examples we document some of the difficulties that can
arise when using the IF.

-# The Influence Function
+## Construction

First introduced in the context of robust statistics in [@hampel_influence_1974],
the IF was popularized in the context of machine learning in
@@ -74,7 +74,7 @@ up-weighting of samples and perturbation influences. The choice is done by the
parameter `influence_type` in the main entry point
[compute_influences][pydvl.influence.general.compute_influences].

-## Approximating the influence of a point
+### Approximating the influence of a point

Let's define

@@ -125,7 +125,7 @@ All the resulting factors are gradients of the loss wrt. the model parameters
$\hat{\theta}$. This can be easily computed through one or more backpropagation
passes.

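As a rough standalone illustration (plain PyTorch rather than pyDVL's API, with placeholder model and data), one such factor is obtained with a single autograd call:

```python
import torch

# Placeholder model and data; in practice these come from the training setup.
model = torch.nn.Linear(10, 1)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(1, 10), torch.randn(1, 1)

# One backward pass yields the gradient of the loss at (x, y) with respect to
# all model parameters, i.e. one of the factors discussed above.
grads = torch.autograd.grad(loss_fn(model(x), y), list(model.parameters()))
```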
-## Perturbation definition of the influence score
+### Perturbation definition of the influence score

How would the loss of the model change if, instead of up-weighting an individual
point $z$, we were to up-weight only a single feature of that point? Given $z =
@@ -180,11 +180,11 @@ estimate of the impact of a point on the models loss and it is subject to large
approximation errors. It can nonetheless be used to build training-set attacks,
as done in [@koh_understanding_2017].

-# Computing influences
+## Computation

The main entry point of the library for influence calculation is
-[compute_influences][pydvl.influence.general.compute_influences].
-Given a pre-trained pytorch model with a loss, first an instance of
+[compute_influences][pydvl.influence.general.compute_influences]. Given a
+pre-trained pytorch model with a loss, first an instance of
[TorchTwiceDifferentiable][pydvl.influence.torch.torch_differentiable.TorchTwiceDifferentiable]
needs to be created:

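A minimal sketch of this wrapping step, assuming the wrapper takes the model and the loss function (the tiny linear model below is only a stand-in for a pre-trained network):

```python
import torch
import torch.nn.functional as F
from pydvl.influence.torch.torch_differentiable import TorchTwiceDifferentiable

nn_model = torch.nn.Linear(10, 2)  # stand-in for a pre-trained torch.nn.Module
wrapped_model = TorchTwiceDifferentiable(nn_model, F.cross_entropy)
```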
@@ -217,11 +217,11 @@ tends to improve the performance of the model on test point $i$, and vice versa,
a large negative influence indicates that training point $j$ tends to worsen the
performance of the model on test point $i$.

-## Perturbation influences
+### Perturbation influences

The method of empirical influence computation can be selected in
-[compute_influences][pydvl.influence.general.compute_influences]
-with the parameter `influence_type`:
+[compute_influences][pydvl.influence.general.compute_influences] with the
+parameter `influence_type`:

```python
from pydvl.influence import compute_influences
@@ -240,7 +240,7 @@ as the number of input features in the data. Therefore, each entry in the tensor
represents the influence of each feature of each training point on each test
point.

-## Approximate matrix inversion
+### Approximate matrix inversion

In almost every practical application it is not possible to construct, let alone
invert, the complete Hessian in memory. pyDVL offers several approximate
@@ -259,10 +259,9 @@ compute_influences(

Each inversion method has its own set of parameters that can be tuned to improve
the final result. These parameters can be passed directly to
-[compute_influences][pydvl.influence.general.compute_influences]
-as keyword arguments. For example, the following code sets
-the maximum number of iterations for conjugate
-gradient to $100$ and the minimum relative error to $0.01$:
+[compute_influences][pydvl.influence.general.compute_influences] as keyword
+arguments. For example, the following code sets the maximum number of iterations
+for conjugate gradient to $100$ and the minimum relative error to $0.01$:

```python
from pydvl.influence import compute_influences
@@ -277,25 +276,23 @@ compute_influences(
)
```

-## Hessian regularization
+### Hessian regularization

Additionally, and as discussed in [the introduction](#the-influence-function),
-in machine learning training rarely converges to a
-global minimum of the loss. Despite good apparent convergence, $\hat{\theta}$
-might be located in a region with flat curvature or close to a saddle point. In
-particular, the Hessian might have vanishing eigenvalues making its direct
-inversion impossible. Certain methods, such as the
-[Arnoldi method](#arnoldi-solver) are robust against these problems,
-but most are not.
-
-To circumvent this problem, many approximate methods can be implemented.
-The simplest adds a small *hessian perturbation term*,
-i.e. $H_{\hat{\theta}} + \lambda \mathbb{I}$,
-with $\mathbb{I}$ being the identity matrix. This standard trick
-ensures that the eigenvalues of $H_{\hat{\theta}}$ are bounded away from zero
-and therefore the matrix is invertible. In order for this regularization not to
-corrupt the outcome too much, the parameter $\lambda$ should be as small as
-possible while still allowing a reliable inversion of $H_{\hat{\theta}} +
+in machine learning training rarely converges to a global minimum of the loss.
+Despite good apparent convergence, $\hat{\theta}$ might be located in a region
+with flat curvature or close to a saddle point. In particular, the Hessian might
+have vanishing eigenvalues, making its direct inversion impossible. Certain
+methods, such as the [Arnoldi method](#arnoldi-solver), are robust against these
+problems, but most are not.
+
+To circumvent this problem, many approximate methods can be implemented. The
+simplest adds a small *hessian perturbation term*, i.e. $H_{\hat{\theta}} +
+\lambda \mathbb{I}$, with $\mathbb{I}$ being the identity matrix. This standard
+trick ensures that the eigenvalues of $H_{\hat{\theta}}$ are bounded away from
+zero and therefore the matrix is invertible. In order for this regularization
+not to corrupt the outcome too much, the parameter $\lambda$ should be as small
+as possible while still allowing a reliable inversion of $H_{\hat{\theta}} +
\lambda \mathbb{I}$.
300297
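A tiny self-contained illustration of the effect (plain PyTorch, not pyDVL): a rank-deficient matrix becomes safely invertible once $\lambda \mathbb{I}$ is added. pyDVL applies the same shift internally when the regularization parameter is passed to [compute_influences][pydvl.influence.general.compute_influences], as in the example below.

```python
import torch

H = torch.tensor([[1.0, 1.0], [1.0, 1.0]])  # rank-deficient: eigenvalues 0 and 2
lam = 1e-3
H_reg = H + lam * torch.eye(2)              # eigenvalues are now lam and 2 + lam

print(torch.linalg.eigvalsh(H_reg))
print(torch.linalg.solve(H_reg, torch.tensor([1.0, 0.0])))  # now well defined
```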
```python
@@ -309,7 +306,7 @@ compute_influences(
)
```

-## Influence factors
+### Influence factors

The [compute_influences][pydvl.influence.general.compute_influences]
method offers a fast way to obtain the influence scores given a model
@@ -340,22 +337,21 @@ The result is an object of type
which holds the calculated influence factors (`influence_factors.x`) and a
dictionary with the info on the inversion process (`influence_factors.info`).

-# Methods for inverse HVP calculation
+## Methods for inverse HVP calculation

In order to calculate influence values, pyDVL implements several methods for the
calculation of the inverse Hessian vector product (iHVP). More precisely, given
a model, training data and a tensor $b$, the function
[solve_hvp][pydvl.influence.inversion.solve_hvp]
-will find $x$ such that $H x = b$,
-with $H$ is the hessian of model.
+will find $x$ such that $H x = b$, where $H$ is the Hessian of the model.

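As a toy illustration of the problem being solved (plain PyTorch, not the pyDVL API), for a small explicit matrix the iHVP is just a linear solve:

```python
import torch

H = torch.tensor([[2.0, 0.5], [0.5, 1.0]])  # small SPD stand-in for the Hessian
b = torch.tensor([1.0, 2.0])
x = torch.linalg.solve(H, b)                # the x with H @ x = b, i.e. H^{-1} b
assert torch.allclose(H @ x, b)
```

The methods below differ in how they approximate this solve when $H$ is far too large to build explicitly.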
-Many different inversion methods can be selected via the parameter
+Many different inversion methods can be selected via the parameter
`inversion_method` of
[compute_influences][pydvl.influence.general.compute_influences].

The following subsections will offer more detailed explanations for each method.

-## Direct inversion
+### Direct inversion

With `inversion_method = "direct"` pyDVL will calculate the inverse Hessian
using the direct matrix inversion. This means that the Hessian will first be
@@ -382,14 +378,13 @@ The first one is the inverse Hessian vector product, while the second one is a
dictionary with the info on the inversion process. For this method, the info
consists of the Hessian matrix itself.

-## Conjugate Gradient
+### Conjugate Gradient

-A classical method for solving linear systems of equations is the conjugate
-gradient method. It is an iterative method that does not require the explicit
-inversion of the Hessian matrix. Instead, it only requires the calculation of
-the Hessian vector product. This makes it a good choice for large datasets or
-models with many parameters. It is Nevertheless much slower than the direct
-inversion method and not as accurate.
+This classical procedure for solving linear systems of equations is an iterative
+method that does not require the explicit inversion of the Hessian. Instead, it
+only requires the calculation of Hessian-vector products, making it a good
+choice for large datasets or models with many parameters. It is nevertheless
+much slower to converge than the direct inversion method and not as accurate.
More info on the theory of conjugate gradient can be found on
[Wikipedia](https://en.wikipedia.org/wiki/Conjugate_gradient_method).

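As a compact sketch of the textbook iteration (illustration only, not pyDVL's implementation), note that the loop below only ever touches $H$ through products $Hv$:

```python
import torch

def conjugate_gradient(hvp, b, max_iter=100, rtol=1e-6):
    """Solve H x = b using only Hessian-vector products hvp(v) = H @ v."""
    x = torch.zeros_like(b)
    r = b - hvp(x)              # residual
    p = r.clone()               # search direction
    rs_old = r.dot(r)
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs_old / p.dot(Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r.dot(r)
        if rs_new.sqrt() <= rtol * b.norm():
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

H = torch.tensor([[3.0, 1.0], [1.0, 2.0]])  # small SPD stand-in for the Hessian
x = conjugate_gradient(lambda v: H @ v, torch.tensor([1.0, 1.0]))
```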
@@ -415,18 +410,18 @@ to the [solve_batch_cg][pydvl.influence.torch.torch_differentiable.solve_batch_c
function, and are respectively the initial guess for the solution, the relative
tolerance, the absolute tolerance, and the maximum number of iterations.

-The resulting [InverseHvpResult][pydvl.influence.twice_differentiable.InverseHvpResult]
-holds the solution of the iHVP, `influence_factors.x`, and some info on the
-inversion process `influence_factors.info`. More specifically, for each batch
-the infos will report the number of iterations, a boolean indicating if the
-inversion converged, and the residual of the inversion.
+The resulting
+[InverseHvpResult][pydvl.influence.twice_differentiable.InverseHvpResult] holds
+the solution of the iHVP, `influence_factors.x`, and some info on the inversion
+process `influence_factors.info`. More specifically, for each batch this will
+contain the number of iterations, a boolean indicating if the inversion
+converged, and the residual of the inversion.

-## Linear time Stochastic Second-Order Approximation (LiSSA)
+### Linear time Stochastic Second-Order Approximation (LiSSA)

The LiSSA method is a stochastic approximation of the inverse Hessian vector
product. Compared to [conjugate gradient](#conjugate-gradient)
-it is faster but less accurate and typically suffers from
-instability.
+it is faster but less accurate and typically suffers from instability.

In order to find the solution of the HVP, LiSSA iteratively approximates the
inverse of the Hessian matrix with the following update:
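As a rough sketch of the idea (leaving out LiSSA's scaling, damping and mini-batching), the underlying fixed-point iteration is the plain Neumann recursion $x_{j+1} = b + (I - H) x_j$, which converges to $H^{-1} b$ whenever the eigenvalues of $H$ lie in $(0, 2)$:

```python
import torch

H = torch.tensor([[0.5, 0.1], [0.1, 0.8]])  # eigenvalues inside (0, 2)
b = torch.tensor([1.0, 2.0])

x = torch.zeros_like(b)
for _ in range(1000):
    x = b + x - H @ x                       # x_{j+1} = b + (I - H) x_j

assert torch.allclose(x, torch.linalg.solve(H, b), atol=1e-5)
```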
@@ -467,20 +462,22 @@ holds the solution of the iHVP, `influence_factors.x`, and,
within `influence_factors.info`, the maximum percentage error
and the mean percentage error of the approximation.

-## Arnoldi solver
+### Arnoldi solver

-The [Arnoldi method](https://en.wikipedia.org/wiki/Arnoldi_iteration)
-is a Krylov subspace method for approximating dominating eigenvalues and eigenvectors. Under a low rank
-assumption on the Hessian at a minimizer (which is typically observed for deep neural networks), this approximation
-captures the essential action of the Hessian. More concrete, for $Hx=b$ the solution is approximated by
+The [Arnoldi method](https://en.wikipedia.org/wiki/Arnoldi_iteration) is a
+Krylov subspace method for approximating dominating eigenvalues and
+eigenvectors. Under a low rank assumption on the Hessian at a minimizer (which
+is typically observed for deep neural networks), this approximation captures the
+essential action of the Hessian. More concretely, for $Hx=b$ the solution is
+approximated by

\[x \approx V D^{-1} V^T b\]

-where \(D\) is a diagonal matrix with the top (in absolute value) eigenvalues of the Hessian
-and \(V\) contains the corresponding eigenvectors, see also [@schioppa_scaling_2021].
-
+where \(D\) is a diagonal matrix with the top (in absolute value) eigenvalues of
+the Hessian and \(V\) contains the corresponding eigenvectors. See also
+[@schioppa_scaling_2021].

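A small numerical illustration of this low-rank approximation, using a full eigendecomposition for simplicity instead of the Arnoldi iteration itself:

```python
import torch

torch.manual_seed(0)
A = torch.randn(10, 10)
H = A @ A.T                                 # symmetric stand-in for the Hessian
b = torch.randn(10)

k = 5                                       # number of retained eigenpairs
eigvals, eigvecs = torch.linalg.eigh(H)     # eigenvalues in ascending order
idx = eigvals.abs().argsort(descending=True)[:k]
D, V = eigvals[idx], eigvecs[:, idx]

x_approx = V @ ((V.T @ b) / D)              # x ≈ V D^{-1} V^T b
```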
-In pyDVL, you can select Arnoldi with `inversion_method = "arnoldi"`, like this:
+In pyDVL, you can use Arnoldi with `inversion_method = "arnoldi"`, as follows:

```python
from pydvl.influence.inversion import solve_hvp
@@ -495,7 +492,10 @@ solve_hvp(
    eigen_computation_on_gpu=False
)
```
-For the parameters, check [solve_arnoldi][pydvl.influence.torch.torch_differentiable.solve_arnoldi].
-The resulting [InverseHvpResult][pydvl.influence.twice_differentiable.InverseHvpResult]
-holds the solution of the iHVP, `influence_factors.x`, and,
-within `influence_factors.info`, the computed eigenvalues and eigenvectors.
+
+For the parameters, check
+[solve_arnoldi][pydvl.influence.torch.torch_differentiable.solve_arnoldi]. The
+resulting
+[InverseHvpResult][pydvl.influence.twice_differentiable.InverseHvpResult] holds
+the solution of the iHVP, `influence_factors.x`, and, within
+`influence_factors.info`, the computed eigenvalues and eigenvectors.