
Commit 202c30d

Formatting of influence.md
1 parent f8fa433 commit 202c30d

1 file changed: +66 -66 lines changed

docs/influence/index.md

Lines changed: 66 additions & 66 deletions
@@ -5,7 +5,7 @@ alias:
 text: Computing Influence Values
 ---
 
-# The influence function
+## The influence function
 
 !!! Warning
 The code in the package [pydvl.influence][pydvl.influence] is experimental.
@@ -30,7 +30,7 @@ pyDVL implements several methods for the efficient computation of the IF for
 machine learning. In the examples we document some of the difficulties that can
 arise when using the IF.
 
-# The Influence Function
+## Construction
 
 First introduced in the context of robust statistics in [@hampel_influence_1974],
 the IF was popularized in the context of machine learning in
@@ -74,7 +74,7 @@ up-weighting of samples and perturbation influences. The choice is done by the
 parameter `influence_type` in the main entry point
 [compute_influences][pydvl.influence.general.compute_influences].
 
-## Approximating the influence of a point
+### Approximating the influence of a point
 
 Let's define
 
@@ -125,7 +125,7 @@ All the resulting factors are gradients of the loss wrt. the model parameters
 $\hat{\theta}$. This can be easily computed through one or more backpropagation
 passes.
 
-## Perturbation definition of the influence score
+### Perturbation definition of the influence score
 
 How would the loss of the model change if, instead of up-weighting an individual
 point $z$, we were to up-weight only a single feature of that point? Given $z =
@@ -180,11 +180,11 @@ estimate of the impact of a point on the models loss and it is subject to large
 approximation errors. It can nonetheless be used to build training-set attacks,
 as done in [@koh_understanding_2017].
 
-# Computing influences
+## Computation
 
 The main entry point of the library for influence calculation is
-[compute_influences][pydvl.influence.general.compute_influences].
-Given a pre-trained pytorch model with a loss, first an instance of
+[compute_influences][pydvl.influence.general.compute_influences]. Given a
+pre-trained pytorch model with a loss, first an instance of
 [TorchTwiceDifferentiable][pydvl.influence.torch.torch_differentiable.TorchTwiceDifferentiable]
 needs to be created:
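
A minimal sketch of the step introduced above, assuming the wrapper takes the
model and its loss as constructor arguments (the actual signature is not shown
in this hunk; check the linked TorchTwiceDifferentiable reference):

```python
# Illustrative sketch only: the constructor arguments are an assumption.
from torch import nn
from pydvl.influence.torch.torch_differentiable import TorchTwiceDifferentiable

# A small stand-in for the "pre-trained pytorch model" mentioned above.
model = nn.Linear(in_features=10, out_features=1)
loss = nn.MSELoss()

# Wrap model and loss so that pyDVL can take first and second derivatives.
wrapped_model = TorchTwiceDifferentiable(model, loss)
```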

@@ -217,11 +217,11 @@ tends to improve the performance of the model on test point $i$, and vice versa,
 a large negative influence indicates that training point $j$ tends to worsen the
 performance of the model on test point $i$.
 
-## Perturbation influences
+### Perturbation influences
 
 The method of empirical influence computation can be selected in
-[compute_influences][pydvl.influence.general.compute_influences]
-with the parameter `influence_type`:
+[compute_influences][pydvl.influence.general.compute_influences] with the
+parameter `influence_type`:
 
 ```python
 from pydvl.influence import compute_influences
@@ -240,7 +240,7 @@ as the number of input features in the data. Therefore, each entry in the tensor
 represents the influence of each feature of each training point on each test
 point.
 
-## Approximate matrix inversion
+### Approximate matrix inversion
 
 In almost every practical application it is not possible to construct, even less
 invert the complete Hessian in memory. pyDVL offers several approximate
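
To put a number on the memory claim above: the Hessian has one entry per pair
of parameters, so its size grows quadratically with the model. A quick
back-of-the-envelope check (the parameter count is an illustrative choice, not
a figure from the text):

```python
# Rough arithmetic only: one float32 entry per pair of parameters.
n_params = 10_000_000            # an illustrative, modest network size
bytes_per_entry = 4              # float32
hessian_bytes = n_params**2 * bytes_per_entry
print(f"{hessian_bytes / 1e12:.0f} TB")   # -> 400 TB
```
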
@@ -259,10 +259,9 @@ compute_influences(
 
 Each inversion method has its own set of parameters that can be tuned to improve
 the final result. These parameters can be passed directly to
-[compute_influences][pydvl.influence.general.compute_influences]
-as keyword arguments. For example, the following code sets
-the maximum number of iterations for conjugate
-gradient to $100$ and the minimum relative error to $0.01$:
+[compute_influences][pydvl.influence.general.compute_influences] as keyword
+arguments. For example, the following code sets the maximum number of iterations
+for conjugate gradient to $100$ and the minimum relative error to $0.01$:
 
 ```python
 from pydvl.influence import compute_influences
@@ -277,25 +276,23 @@ compute_influences(
 )
 ```
 
-## Hessian regularization
+### Hessian regularization
 
 Additionally, and as discussed in [the introduction](#the-influence-function),
-in machine learning training rarely converges to a
-global minimum of the loss. Despite good apparent convergence, $\hat{\theta}$
-might be located in a region with flat curvature or close to a saddle point. In
-particular, the Hessian might have vanishing eigenvalues making its direct
-inversion impossible. Certain methods, such as the
-[Arnoldi method](#arnoldi-solver) are robust against these problems,
-but most are not.
-
-To circumvent this problem, many approximate methods can be implemented.
-The simplest adds a small *hessian perturbation term*,
-i.e. $H_{\hat{\theta}} + \lambda \mathbb{I}$,
-with $\mathbb{I}$ being the identity matrix. This standard trick
-ensures that the eigenvalues of $H_{\hat{\theta}}$ are bounded away from zero
-and therefore the matrix is invertible. In order for this regularization not to
-corrupt the outcome too much, the parameter $\lambda$ should be as small as
-possible while still allowing a reliable inversion of $H_{\hat{\theta}} +
+in machine learning training rarely converges to a global minimum of the loss.
+Despite good apparent convergence, $\hat{\theta}$ might be located in a region
+with flat curvature or close to a saddle point. In particular, the Hessian might
+have vanishing eigenvalues making its direct inversion impossible. Certain
+methods, such as the [Arnoldi method](#arnoldi-solver) are robust against these
+problems, but most are not.
+
+To circumvent this problem, many approximate methods can be implemented. The
+simplest adds a small *hessian perturbation term*, i.e. $H_{\hat{\theta}} +
+\lambda \mathbb{I}$, with $\mathbb{I}$ being the identity matrix. This standard
+trick ensures that the eigenvalues of $H_{\hat{\theta}}$ are bounded away from
+zero and therefore the matrix is invertible. In order for this regularization
+not to corrupt the outcome too much, the parameter $\lambda$ should be as small
+as possible while still allowing a reliable inversion of $H_{\hat{\theta}} +
 \lambda \mathbb{I}$.
 
 ```python
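
The body of that snippet lies outside this hunk. Independently of pyDVL's API,
the trick itself is easy to demonstrate with plain torch; the following is an
illustrative sketch with made-up numbers, not code from the documentation:

```python
# Solve (H + lambda*I) x = b for a "Hessian" with a vanishing eigenvalue.
import torch

torch.manual_seed(0)

A = torch.randn(4, 4)
H = A @ A.T                # symmetric positive semi-definite stand-in Hessian
H[0, :] = 0.0
H[:, 0] = 0.0              # force a zero eigenvalue: H alone is singular

b = torch.randn(4)
lam = 1e-3                 # the perturbation strength lambda

x = torch.linalg.solve(H + lam * torch.eye(4), b)
print(x)                   # well defined although H itself is not invertible
```
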
@@ -309,7 +306,7 @@ compute_influences(
 )
 ```
 
-## Influence factors
+### Influence factors
 
 The [compute_influences][pydvl.influence.general.compute_influences]
 method offers a fast way to obtain the influence scores given a model
@@ -340,22 +337,21 @@ The result is an object of type
 which holds the calculated influence factors (`influence_factors.x`) and a
 dictionary with the info on the inversion process (`influence_factors.info`).
 
-# Methods for inverse HVP calculation
+## Methods for inverse HVP calculation
 
 In order to calculate influence values, pydvl implements several methods for the
 calculation of the inverse Hessian vector product (iHVP). More precisely, given
 a model, training data and a tensor $b$, the function
 [solve_hvp][pydvl.influence.inversion.solve_hvp]
-will find $x$ such that $H x = b$,
-with $H$ is the hessian of model.
+will find $x$ such that $H x = b$, with $H$ is the hessian of model.
 
-Many different inversion methods can be selected via the parameter
+Many different inversion methods can be selected via the parameter
 `inversion_method` of
 [compute_influences][pydvl.influence.general.compute_influences].
 
 The following subsections will offer more detailed explanations for each method.
 
-## Direct inversion
+### Direct inversion
 
 With `inversion_method = "direct"` pyDVL will calculate the inverse Hessian
 using the direct matrix inversion. This means that the Hessian will first be
@@ -382,14 +378,13 @@ The first one is the inverse Hessian vector product, while the second one is a
 dictionary with the info on the inversion process. For this method, the info
 consists of the Hessian matrix itself.
 
-## Conjugate Gradient
+### Conjugate Gradient
 
-A classical method for solving linear systems of equations is the conjugate
-gradient method. It is an iterative method that does not require the explicit
-inversion of the Hessian matrix. Instead, it only requires the calculation of
-the Hessian vector product. This makes it a good choice for large datasets or
-models with many parameters. It is Nevertheless much slower than the direct
-inversion method and not as accurate.
+This classical procedure for solving linear systems of equations is an iterative
+method that does not require the explicit inversion of the Hessian. Instead, it
+only requires the calculation of Hessian-vector products, making it a good
+choice for large datasets or models with many parameters. It is nevertheless
+much slower to converge than the direct inversion method and not as accurate.
 More info on the theory of conjugate gradient can be found on
 [Wikipedia](https://en.wikipedia.org/wiki/Conjugate_gradient_method).
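
The point about Hessian-vector products is easy to see in code: a bare-bones
conjugate gradient loop never forms the matrix, it only multiplies it with
vectors. The sketch below is a toy illustration with an explicit matrix
standing in for the Hessian, not pyDVL's implementation:

```python
import torch

torch.manual_seed(0)

# A symmetric positive definite matrix standing in for the Hessian. CG only
# ever touches it through matrix-vector products, so in a real model this
# could be a backprop-based Hessian-vector product instead.
A = torch.randn(50, 50)
H = A @ A.T + 1e-2 * torch.eye(50)

def hvp(v: torch.Tensor) -> torch.Tensor:
    return H @ v

def conjugate_gradient(matvec, b, maxiter=100, rtol=1e-6):
    """Solve H x = b using only products of H with vectors."""
    x = torch.zeros_like(b)
    r = b - matvec(x)
    p = r.clone()
    rs = r @ r
    for _ in range(maxiter):
        Hp = matvec(p)
        alpha = rs / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if torch.sqrt(rs_new) <= rtol * torch.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

b = torch.randn(50)
x = conjugate_gradient(hvp, b)
print(torch.linalg.norm(hvp(x) - b))  # residual should be close to zero
```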

@@ -415,18 +410,18 @@ to the [solve_batch_cg][pydvl.influence.torch.torch_differentiable.solve_batch_c
 function, and are respecively the initial guess for the solution, the relative
 tolerance, the absolute tolerance, and the maximum number of iterations.
 
-The resulting [InverseHvpResult][pydvl.influence.twice_differentiable.InverseHvpResult]
-holds the solution of the iHVP, `influence_factors.x`, and some info on the
-inversion process `influence_factors.info`. More specifically, for each batch
-the infos will report the number of iterations, a boolean indicating if the
-inversion converged, and the residual of the inversion.
+The resulting
+[InverseHvpResult][pydvl.influence.twice_differentiable.InverseHvpResult] holds
+the solution of the iHVP, `influence_factors.x`, and some info on the inversion
+process `influence_factors.info`. More specifically, for each batch this will
+contain the number of iterations, a boolean indicating if the inversion
+converged, and the residual of the inversion.
 
-## Linear time Stochastic Second-Order Approximation (LiSSA)
+### Linear time Stochastic Second-Order Approximation (LiSSA)
 
 The LiSSA method is a stochastic approximation of the inverse Hessian vector
 product. Compared to [conjugate gradient](#conjugate-gradient)
-it is faster but less accurate and typically suffers from
-instability.
+it is faster but less accurate and typically suffers from instability.
 
 In order to find the solution of the HVP, LiSSA iteratively approximates the
 inverse of the Hessian matrix with the following update:
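
The update itself falls outside this hunk. For orientation, one common form of
the LiSSA recursion is $x_{j+1} = b + (I - H) x_j$, whose fixed point is
$H^{-1} b$; pyDVL's exact update (with scaling, damping and mini-batch
Hessians) may differ, so treat the following as a toy, deterministic sketch of
the idea only:

```python
import torch

torch.manual_seed(0)

# A symmetric matrix with eigenvalues in (0, 1) so that the recursion below
# converges. In practice LiSSA relies on a scale/damping factor for this and
# uses stochastic mini-batch Hessians instead of an exact matrix.
d = 20
Q, _ = torch.linalg.qr(torch.randn(d, d))
H = Q @ torch.diag(torch.linspace(0.1, 0.9, d)) @ Q.T

b = torch.randn(d)

# Fixed-point iteration x_{j+1} = b + (I - H) x_j, converging to H^{-1} b.
x = b.clone()
for _ in range(300):
    x = b + x - H @ x

print(torch.linalg.norm(H @ x - b))  # residual of the approximate iHVP
```
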
@@ -467,20 +462,22 @@ holds the solution of the iHVP, `influence_factors.x`, and,
 within `influence_factors.info`, the maximum percentage error
 and the mean percentage error of the approximation.
 
-## Arnoldi solver
+### Arnoldi solver
 
-The [Arnoldi method](https://en.wikipedia.org/wiki/Arnoldi_iteration)
-is a Krylov subspace method for approximating dominating eigenvalues and eigenvectors. Under a low rank
-assumption on the Hessian at a minimizer (which is typically observed for deep neural networks), this approximation
-captures the essential action of the Hessian. More concrete, for $Hx=b$ the solution is approximated by
+The [Arnoldi method](https://en.wikipedia.org/wiki/Arnoldi_iteration) is a
+Krylov subspace method for approximating dominating eigenvalues and
+eigenvectors. Under a low rank assumption on the Hessian at a minimizer (which
+is typically observed for deep neural networks), this approximation captures the
+essential action of the Hessian. More concretely, for $Hx=b$ the solution is
+approximated by
 
 \[x \approx V D^{-1} V^T b\]
 
-where \(D\) is a diagonal matrix with the top (in absolute value) eigenvalues of the Hessian
-and \(V\) contains the corresponding eigenvectors, see also [@schioppa_scaling_2021].
-
+where \(D\) is a diagonal matrix with the top (in absolute value) eigenvalues of
+the Hessian and \(V\) contains the corresponding eigenvectors. See also
+[@schioppa_scaling_2021].
 
-In pyDVL, you can select Arnoldi with `inversion_method = "arnoldi"`, like this:
+In pyDVL, you can use Arnoldi with `inversion_method = "arnoldi"`, as follows:
 
 ```python
 from pydvl.influence.inversion import solve_hvp
@@ -495,7 +492,10 @@ solve_hvp(
 eigen_computation_on_gpu=False
 )
 ```
-For the parameters, check [solve_arnoldi][pydvl.influence.torch.torch_differentiable.solve_arnoldi].
-The resulting [InverseHvpResult][pydvl.influence.twice_differentiable.InverseHvpResult]
-holds the solution of the iHVP, `influence_factors.x`, and,
-within `influence_factors.info`, the computed eigenvalues and eigenvectors.
+
+For the parameters, check
+[solve_arnoldi][pydvl.influence.torch.torch_differentiable.solve_arnoldi]. The
+resulting
+[InverseHvpResult][pydvl.influence.twice_differentiable.InverseHvpResult] holds
+the solution of the iHVP, `influence_factors.x`, and, within
+`influence_factors.info`, the computed eigenvalues and eigenvectors.
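
As a numerical check of the approximation $x \approx V D^{-1} V^T b$ quoted
above, the sketch below builds a Hessian-like matrix dominated by a few large
eigenvalues and applies the formula. It takes $V$ and $D$ from a full
eigendecomposition, whereas the Arnoldi iteration would approximate them from
Hessian-vector products only; it illustrates the formula, not pyDVL's solver:

```python
import torch

torch.manual_seed(0)

d, rank = 100, 10

# A symmetric matrix whose action is dominated by a few large eigenvalues,
# i.e. the low-rank situation assumed in the Arnoldi section.
U = torch.randn(d, rank)
H = U @ U.T + 1e-3 * torch.eye(d)

b = torch.randn(d)

# Top-`rank` eigenpairs (here from a full eigendecomposition).
eigvals, eigvecs = torch.linalg.eigh(H)
D = eigvals[-rank:]       # largest eigenvalues (D in the formula)
V = eigvecs[:, -rank:]    # corresponding eigenvectors (V in the formula)

# Low-rank approximation of the solution of H x = b:  x ~ V D^{-1} V^T b.
x_approx = V @ ((V.T @ b) / D)

# Within the dominant eigen-subspace the approximation solves the system
# exactly; components along the discarded small eigenvalues are ignored.
print(torch.linalg.norm(V.T @ (H @ x_approx) - V.T @ b))
```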
