README.md: 2 additions & 1 deletion
@@ -318,7 +318,8 @@ We currently implement the following papers:

- Schioppa, Andrea, Polina Zablotskaia, David Vilar, and Artem Sokolov.
  [Scaling Up Influence Functions](http://arxiv.org/abs/2112.03052).
  In Proceedings of the AAAI-22. arXiv, 2021.
- James Martens, Roger Grosse, [Optimizing Neural Networks with Kronecker-factored Approximate Curvature](https://arxiv.org/abs/1503.05671), International Conference on Machine Learning (ICML), 2015.
- George, Thomas, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent, [Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis](https://arxiv.org/abs/1806.03884), Advances in Neural Information Processing Systems 31, 2018.
  abstract = {Data valuation is a powerful framework for providing statistical insights into which data are beneficial or detrimental to model training. Many Shapley-based data valuation methods have shown promising results in various downstream tasks, however, they are well known to be computationally challenging as it requires training a large number of models. As a result, it has been recognized as infeasible to apply to large datasets. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. The proposed method is computationally efficient and can scale to millions of data by reusing trained weak learners. Specifically, Data-OOB takes less than $2.25$ hours on a single CPU processor when there are $10^6$ samples to evaluate and the input dimension is $100$. Furthermore, Data-OOB has solid theoretical interpretations in that it identifies the same important data point as the infinitesimal jackknife influence function when two different points are compared. We conduct comprehensive experiments using 12 classification datasets, each with thousands of sample sizes. We demonstrate that the proposed method significantly outperforms existing state-of-the-art data valuation methods in identifying mislabeled data and finding a set of helpful (or harmful) data points, highlighting the potential for applying data values in real-world applications.}
}

@article{george2018fast,
  title={Fast approximate natural gradient descent in a Kronecker-factored eigenbasis},
  author={George, Thomas and Laurent, C{\'e}sar and Bouthillier, Xavier and Ballas, Nicolas and Vincent, Pascal},
  journal={Advances in Neural Information Processing Systems},
  volume={31},
  year={2018}
}

@inproceedings{martens2015optimizing,
  title={Optimizing neural networks with Kronecker-factored approximate curvature},
  author={Martens, James and Grosse, Roger},
  booktitle={International Conference on Machine Learning},
  year={2015}
}
K-FAC, short for Kronecker-Factored Approximate Curvature, is a method that approximates the Fisher information matrix ([FIM](https://en.wikipedia.org/wiki/Fisher_information)) of a model. It is possible to show that, for classification models with appropriate loss functions, the FIM is equal to the Hessian of the model's loss over the dataset. In this restricted but nonetheless important context, K-FAC offers an efficient way to approximate the Hessian and hence the influence scores.
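Concretely, K-FAC approximates the FIM by a block-diagonal matrix with one block per layer and factorises each block as a Kronecker product of two much smaller matrices. Roughly (see [@martens2015optimizing] for the precise statement and assumptions):

$$
F \approx \operatorname{diag}(F_1, \dots, F_L), \qquad F_l \approx A_{l-1} \otimes G_l,
$$

where $A_{l-1}$ is the second-moment matrix of the inputs to layer $l$ and $G_l$ that of the gradients with respect to the layer's pre-activations. Inverting the approximation then only requires inverting the small Kronecker factors, which is what makes the method computationally tractable.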
For more details, refer to the original paper [@martens2015optimizing].
The K-FAC method is implemented in the class [EkfacInfluence](pydvl/influence/torch/influence_function_model.py). The following code snippet shows how to use it to calculate the influence function of a model. Note that, in contrast to the other influence function methods, K-FAC does not require the loss function as an input, because the current implementation is only applicable to classification models with a cross-entropy loss.
```python
from pydvl.influence.torch import EkfacInfluence

if_model = EkfacInfluence(
    model,
    hessian_regularization=0.0,
)
```
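Once constructed, the influence model is used like pyDVL's other influence implementations: fit it on the training data and then query influence values. The following is a minimal sketch under that assumption; `x_train`, `y_train`, `x_test` and `y_test` are placeholder tensors, and the `fit`/`influences` calls refer to the generic influence-model interface, whose exact signatures may differ between versions:

```python
from torch.utils.data import DataLoader, TensorDataset

# Placeholder training data; in practice use your own DataLoader.
train_loader = DataLoader(TensorDataset(x_train, y_train), batch_size=32)

# Fit the K-FAC representation on the training data.
if_model.fit(train_loader)

# Influence of the training points on the given test points.
values = if_model.influences(x_test, y_test, x_train, y_train)
```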
Upon initialization, the K-FAC method parses the model and determines which layers require grad and which do not; influence scores are then computed only for the layers that require grad. The current implementation is only available for linear layers, so if the model contains non-linear layers that require grad, the K-FAC method raises a NotImplementedLayerRepresentationException.
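For example, with a model that mixes convolutional and linear layers, one would disable gradients for everything except the linear layers before constructing the influence model. This is a hypothetical sketch; the architecture and shapes are made up for illustration:

```python
import torch
from pydvl.influence.torch import EkfacInfluence

# Made-up toy classifier: a convolutional feature extractor followed by a
# linear head. The Linear input size assumes 28x28 single-channel images.
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 4, kernel_size=3),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(4 * 26 * 26, 10),
)

# Only linear layers are supported, so disable gradients everywhere else.
# With gradients still enabled on the Conv2d layer, the K-FAC method would
# raise NotImplementedLayerRepresentationException (per the note above).
for module in model.modules():
    if not isinstance(module, torch.nn.Linear):
        for p in module.parameters(recurse=False):
            p.requires_grad_(False)

if_model = EkfacInfluence(model, hessian_regularization=0.0)
```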
A further improvement of the K-FAC method is the Eigenvalue-Corrected K-FAC (EKFAC) method [@george2018fast], which additionally re-fits the eigenvalues of the Hessian and thus provides a more accurate approximation. EKFAC is built on top of K-FAC and is enabled by setting `update_diagonal=True` when initialising [EkfacInfluence](pydvl/influence/torch/influence_function_model.py). The following code snippet shows how to use the EKFAC method to calculate the influence function of a model.
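A sketch of that snippet, mirroring the K-FAC example above with the `update_diagonal` flag described in the text:

```python
from pydvl.influence.torch import EkfacInfluence

if_model = EkfacInfluence(
    model,
    update_diagonal=True,
    hessian_regularization=0.0,
)
```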