Merge pull request #415 from aai-institute/doc/add-glossary-of-terms

mdbenito · web-flow · commit 34693612b66a · 2023-09-01T19:08:12.000+02:00
Doc/add glossary of terms
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,6 +11,8 @@ the board, with a focus on documentation and usability.
   [PR #365](https://github.com/aai-institute/pyDVL/pull/365)
 - Enabled parallel computation for Leave-One-Out values
   [PR #406](https://github.com/aai-institute/pyDVL/pull/406)
+- Added more abbreviations to documentation
+  [PR #415](https://github.com/aai-institute/pyDVL/pull/415)
 
 ### Changed
 - Replaced sphinx with mkdocs for documentation. Major overhaul of documentation
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -250,6 +250,16 @@ def f(x: float) -> float:
     return 1/(x*x)
 ```
 
+### Abbreviations
+
+We keep the abbreviations used in the documentation inside the
+[docs_includes/abbreviations.md](docs_includes/abbreviations.md) file.
+
+The syntax for abbreviations is:
+
+```markdown
+*[ABBR]: Abbreviation
+```
 
 ## CI
 
diff --git a/docs/value/index.md b/docs/value/index.md
@@ -76,7 +76,7 @@ there are additional desiderata, like having a value function that does not
 increase with repeated samples. Game-theoretic methods are all rooted in axioms
 that by construction ensure different desiderata, but despite their practical
 usefulness, none of them are either necessary or sufficient for all
-applications. For instance, *[SV]s try to equitably distribute all value
+applications. For instance, SV methods try to equitably distribute all value
 among all samples, failing to identify repeated ones as unnecessary, with e.g. a
 zero value.
 
@@ -332,8 +332,7 @@ nature of every (non-trivial) ML problem can have an effect:
   [@wang_data_2022] prove that by relaxing one of the Shapley axioms
   and considering the general class of semi-values, of which Shapley is an
   instance, one can prove that a choice of constant weights is the best one can
-  do in a utility-agnostic setting. So-called *Data Banzhaf* is on our to-do
-  list!
+  do in a utility-agnostic setting. This is the so-called *Data Banzhaf*.
 
 * **Data set size**: Computing exact Shapley values is NP-hard, and Monte Carlo
   approximations can converge slowly. Massive datasets are thus impractical, at
diff --git a/docs/value/semi-values.md b/docs/value/semi-values.md
@@ -117,7 +117,7 @@ values = compute_generic_semivalues(
   u=utility,
   coefficient=beta_coefficient(alpha=1, beta=16),
   done=AbsoluteStandardError(threshold=1e-4),
-  )
+)
 ```
 
 Allowing any coefficient can help when experimenting with models which are more
diff --git a/docs_includes/abbreviations.md b/docs_includes/abbreviations.md
@@ -9,3 +9,7 @@
 *[MSE]: Mean Squared Error
 *[SV]: Shapley Value
 *[TMCS]: Truncated Monte Carlo Shapley
+*[IF]: Influence Function
+*[iHVP]: inverse Hessian-vector product
+*[LiSSA]: Linear-time Stochastic Second-order Algorithm
+*[DUL]: Data Utility Learning
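
The `*[ABBR]: Abbreviation` syntax used in `docs_includes/abbreviations.md` is simple enough to check mechanically. Below is a minimal sketch, not part of this PR, of how such a file could be parsed into a mapping, e.g. to spot malformed entries. The names `ABBR_PATTERN` and `parse_abbreviations` are hypothetical, and the sample text is a subset of the entries shown in the diff above.

```python
import re
from typing import Dict

# Hypothetical helper, not part of pyDVL: matches lines of the form
# "*[ABBR]: Definition", as used in docs_includes/abbreviations.md.
ABBR_PATTERN = re.compile(r"^\*\[(?P<abbr>[^\]]+)\]:\s*(?P<definition>.+?)\s*$")


def parse_abbreviations(text: str) -> Dict[str, str]:
    """Return a mapping from abbreviation to its definition.

    Lines that do not follow the abbreviation syntax are ignored.
    """
    result: Dict[str, str] = {}
    for line in text.splitlines():
        match = ABBR_PATTERN.match(line)
        if match:
            result[match.group("abbr")] = match.group("definition")
    return result


# A few of the entries from the diff above.
sample = """\
*[SV]: Shapley Value
*[IF]: Influence Function
*[iHVP]: inverse Hessian-vector product
this line is not an abbreviation
"""

abbreviations = parse_abbreviations(sample)
for abbr, definition in abbreviations.items():
    print(f"{abbr}: {definition}")
```

Abbreviations are case-sensitive here (`iHVP` is kept as written), which mirrors how the entries appear in the file.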