Commit 372a341

Merge pull request #238 from appliedAI-Initiative/fix/cleanup
Some docs and cleanup
2 parents 633621e + bfa3c71 commit 372a341

17 files changed

+155
-152
lines changed

CONTRIBUTING.md

Lines changed: 24 additions & 5 deletions
@@ -183,19 +183,36 @@ any rst files which are not manually created), you can use a file watcher.
 This is not part of the development setup of pyDVL (yet! PRs welcome), but
 modern IDEs provide functionality for this.

-Use the **docs** tox environment to build the documentation the same way it is done in CI:
+Use the **docs** tox environment to build the documentation the same way it is
+done in CI:

 ```bash
 tox -e docs
 ```

-Locally, you can use the **docs-dev** tox environment to continuously rebuild docs on changes:
+Locally, you can use the **docs-dev** tox environment to continuously rebuild
+documentation on changes to the `docs` folder:

 ```bash
 tox -e docs-dev
 ```

-**NOTE:** This currently only rebuilds on changes to `.rst` files and notebooks.
+**Again:** this only rebuilds on changes to `.rst` files and notebooks inside
+`docs`.
+
+### Using bibliography
+
+Bibliographic citations are managed with the plugin
+[sphinx-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/en/latest/index.html).
+To enter a citation first add the entry to `docs/pydvl.bib`. For team
+contributor this should be an export of the Zotero folder `software/pydvl` in
+the [TransferLab Zotero library](https://www.zotero.org/groups/2703043/transferlab/library).
+All other contributors just add the bibtex data, and a maintainer will add it to
+the group library upon merging.
+
+To add a citation inside a module or function's docstring, use the sphinx role
+`:footcite:t:`. A references section is automatically added at the bottom of
+each module's auto-generated documentation.

 ### Writing mathematics
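The "Using bibliography" section added in this hunk can be illustrated with a minimal sketch. The function `mean_value` is hypothetical and exists only to carry a docstring; the citation key `ghorbani_data_2019` is one of the entries visible in `docs/pydvl.bib` in this commit.

```python
from typing import Sequence


def mean_value(values: Sequence[float]) -> float:
    """Return the arithmetic mean of ``values`` (hypothetical helper).

    Data Shapley was introduced by :footcite:t:`ghorbani_data_2019`.
    """
    # The docstring above is what matters here: the footcite role is written
    # inline, and no explicit references section is added by hand.
    return sum(values) / len(values)
```

With the template change in `build_scripts/update_docs.py`, no `.. footbibliography::` directive should be needed in the docstring itself, since one is emitted once per module page.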

@@ -269,7 +286,8 @@ satisfied:

 Then, a new release can be created using the script
 `build_scripts/release-version.sh` (leave out the version parameter to have
-`bumpversion` automatically derive the next release version by bumping the patch part):
+`bumpversion` automatically derive the next release version by bumping the patch
+part):

 ```shell script
 ./scripts/release-version.sh 0.1.6
@@ -285,7 +303,8 @@ If running in interactive mode (without `-y|--yes`), the script will output a
 summary of pending changes and ask for confirmation before executing the
 actions.

-Once this is done, a package will be automatically created and published from CI to PyPI.
+Once this is done, a package will be automatically created and published from CI
+to PyPI.

 ### Manual release process

build_scripts/update_docs.py

Lines changed: 8 additions & 0 deletions
@@ -22,6 +22,14 @@ def module_template(module_qualname: str):
 .. automodule:: {module_qualname}
 :members:
 :undoc-members:
+
+----
+
+Module members
+==============
+
+.. footbibliography::
+
 """
 return template
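A rough sketch of what the changed template renders, limited to the lines visible in this hunk. The name `module_template_sketch` and the 4-space indentation of the directive options are assumptions; whatever precedes `automodule` in the real template is not shown in the diff and is omitted here.

```python
def module_template_sketch(module_qualname: str) -> str:
    """Render only the part of the template that is visible in the diff."""
    # Assumption: directive options are indented by 4 spaces. The diff view
    # does not preserve the original indentation.
    template = f"""
.. automodule:: {module_qualname}
    :members:
    :undoc-members:

----

Module members
==============

.. footbibliography::

"""
    return template
```

Calling `module_template_sketch("pydvl.utils.utility")` yields an RST fragment whose last directive is `.. footbibliography::`, which is why the explicit directive could be dropped from the `DataUtilityLearning` docstring in the same commit.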

docs/10-getting-started.rst

Lines changed: 2 additions & 2 deletions
@@ -14,12 +14,12 @@ algorithms for data valuation and influence functions. You can read:

 * :ref:`data valuation` for key objects and usage patterns for Shapley value
 computation and related methods.
-* :ref:`influence` for instruction on how to compute influence functions (still
+* :ref:`influence` for instructions on how to compute influence functions (still
 in a pre-alpha state)

 We only briefly introduce key concepts in the documentation. For a thorough
 introduction and survey of the field, we refer to **the upcoming review** at the
-:tfl:`TransferLab website <>`.
+:tfl:`TransferLab website <reviews/data-valuation>`.

 Running the examples
 ====================

docs/20-install.rst

Lines changed: 9 additions & 8 deletions
@@ -10,13 +10,7 @@ To install the latest release use:

 pip install pyDVL

-You can also install the latest development version from `TestPyPI <https://test.pypi.org/project/pyDVL/>`_:
-
-.. code-block:: shell
-
-pip install pyDVL --index-url https://test.pypi.org/simple/
-
-To use all features of influence functions execute:
+To use all features of influence functions use instead:

 .. code-block:: shell

@@ -29,7 +23,14 @@ In order to check the installation you can use:

 .. code-block:: shell

-python -c "import valuation; print(pydvl.__version__)"
+python -c "import pydvl; print(pydvl.__version__)"
+
+You can also install the latest development version from
+`TestPyPI <https://test.pypi.org/project/pyDVL/>`_:
+
+.. code-block:: shell
+
+pip install pyDVL --index-url https://test.pypi.org/simple/

 Dependencies
 ============
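The installation check fixed in this hunk (`import pydvl` instead of `import valuation`) can also be done without importing the package. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `pyDVL` is taken from the `pip install` line above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pyDVL") -> Optional[str]:
    """Return the installed version of ``distribution``, or None if absent."""
    try:
        # Reads the installed package metadata; the package is not imported.
        return version(distribution)
    except PackageNotFoundError:
        return None
```

A return value of `None` means the distribution is not installed in the current environment.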

docs/30-data-valuation.rst

Lines changed: 1 addition & 1 deletion
@@ -460,7 +460,7 @@ Because the number of subsets $S \subseteq D \setminus \{x_i\}$ is
 $2^{ | D | - 1 }$, one typically must resort to approximations.

 The simplest approximation consists of two relaxations of the Least Core
-(:footcite:t:`yan_procaccia_2021`):
+(:footcite:t:`yan_if_2021`):

 - Further relaxing the coalitional rationality property by
 a constant value $\epsilon > 0$:

docs/conf.py

Lines changed: 2 additions & 0 deletions
@@ -70,6 +70,8 @@
 }

 bibtex_bibfiles = ["pydvl.bib"]
+bibtex_bibliography_header = "References\n=========="
+bibtex_footbibliography_header = bibtex_bibliography_header

 # NBSphinx
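The two header settings added in this hunk hard-code an RST section title and its underline. As a sketch of one way to keep the underline length in sync with the title, the hypothetical helper `section_header` below builds the same string; it is not part of the commit.

```python
def section_header(title: str, underline: str = "=") -> str:
    """Return an RST section title followed by an underline of equal length."""
    return f"{title}\n{underline * len(title)}"


# Same values as in the diff, derived instead of typed out by hand.
bibtex_bibliography_header = section_header("References")
bibtex_footbibliography_header = bibtex_bibliography_header
```

RST expects the underline to be at least as long as the title, so deriving it from `len(title)` avoids a mismatch if the title is later renamed.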

docs/pydvl.bib

Lines changed: 15 additions & 11 deletions
@@ -12,7 +12,7 @@ @inproceedings{ghorbani_data_2019
 issn = {2640-3498},
 url = {http://proceedings.mlr.press/v97/ghorbani19c.html},
 urldate = {2020-11-01},
-abstract = {As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions. For example, in healthcare...},
+abstract = {As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions. For example, in healthcare and consumer markets, it has been suggested that individuals should be compensated for the data that they generate, but it is not clear what is an equitable valuation for individual data. In this work, we develop a principled framework to address data valuation in the context of supervised machine learning. Given a learning algorithm trained on n data points to produce a predictor, we propose data Shapley as a metric to quantify the value of each training datum to the predictor performance. Data Shapley uniquely satisfies several natural properties of equitable data valuation. We develop Monte Carlo and gradient-based methods to efficiently estimate data Shapley values in practical settings where complex learning algorithms, including neural networks, are trained on large datasets. In addition to being equitable, extensive experiments across biomedical, image and synthetic data demonstrate that data Shapley has several other benefits: 1) it is more powerful than the popular leave-one-out or leverage score in providing insight on what data is more valuable for a given learning task; 2) low Shapley value data effectively capture outliers and corruptions; 3) high Shapley value data inform what type of new data to acquire to improve the predictor.},
 archiveprefix = {arXiv},
 langid = {english}
 }
@@ -122,16 +122,20 @@ @inproceedings{wang_improving_2022
 langid = {english}
 }

-@article{yan_procaccia_2021,
-title = {If You Like Shapley Then You’ll Love the Core},
-volume = {35},
-url = {https://ojs.aaai.org/index.php/AAAI/article/view/16721},
-doi = {10.1609/aaai.v35i6.16721},
-abstract = {The prevalent approach to problems of credit assignment in machine learning -- such as feature and data valuation -- is to model the problem at hand as a cooperative game and apply the Shapley value. But cooperative game theory offers a rich menu of alternative solution concepts, which famously includes the core and its variants. Our goal is to challenge the machine learning community’s current consensus around the Shapley value, and make a case for the core as a viable alternative. To that end, we prove that arbitrarily good approximations to the least core -- a core relaxation that is always feasible -- can be computed efficiently (but prove an impossibility for a more refined solution concept, the nucleolus). We also perform experiments that corroborate these theoretical results and shed light on settings where the least core may be preferable to the Shapley value.},
-number = {6},
-journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
+@inproceedings{yan_if_2021,
+title = {If {{You Like Shapley Then You}}'ll {{Love}} the {{Core}}},
+booktitle = {Proceedings of the 35th {{AAAI Conference}} on {{Artificial Intelligence}}, 2021},
 author = {Yan, Tom and Procaccia, Ariel D.},
 year = {2021},
-month = {May},
-pages = {5751-5759}
+month = may,
+volume = {6},
+pages = {5751--5759},
+publisher = {{Association for the Advancement of Artificial Intelligence}},
+address = {{Virtual conference}},
+doi = {10.1609/aaai.v35i6.16721},
+url = {https://ojs.aaai.org/index.php/AAAI/article/view/16721},
+urldate = {2021-04-23},
+abstract = {The prevalent approach to problems of credit assignment in machine learning \textemdash{} such as feature and data valuation\textemdash{} is to model the problem at hand as a cooperative game and apply the Shapley value. But cooperative game theory offers a rich menu of alternative solution concepts, which famously includes the core and its variants. Our goal is to challenge the machine learning community's current consensus around the Shapley value, and make a case for the core as a viable alternative. To that end, we prove that arbitrarily good approximations to the least core \textemdash{} a core relaxation that is always feasible \textemdash{} can be computed efficiently (but prove an impossibility for a more refined solution concept, the nucleolus). We also perform experiments that corroborate these theoretical results and shed light on settings where the least core may be preferable to the Shapley value.},
+copyright = {Copyright (c) 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.},
+langid = {english}
 }

src/pydvl/utils/utility.py

Lines changed: 0 additions & 4 deletions
@@ -217,10 +217,6 @@ class DataUtilityLearning:
 >>> wrapped_u((1, 2, 3)) # Subsequent calls will be computed using the fit model for DUL
 0.0

-.. rubric:: References
-
-.. footbibliography::
-
 """

 def __init__(

src/pydvl/value/banzhaf/__init__.py

Lines changed: 0 additions & 6 deletions
This file was deleted.

src/pydvl/value/banzhaf/montecarlo.py

Whitespace-only changes.
