Commit 320a579

Merge pull request #650 from aai-institute/fix/dul_notebook
Improvements to DUL and associated notebook
2 parents ac481bc + ec264d6 commit 320a579

36 files changed: +2608 -853 lines changed

.github/workflows/run-notebook-tests-workflow.yaml

Lines changed: 5 additions & 0 deletions
@@ -23,6 +23,11 @@ jobs:
   run-tests:
     runs-on: ubuntu-22.04
     steps:
+      - name: Free Disk Space (Ubuntu)
+        uses: jlumbroso/free-disk-space@main
+        with:
+          large-packages: false
+          docker-images: false
       - uses: actions/checkout@v4
         with:
           fetch-depth: 0

CHANGELOG.md

Lines changed: 7 additions & 1 deletion
@@ -5,6 +5,9 @@
 
 ### Added
 
+- Introduced `UtilityModel` and two implementations `IndicatorUtilityModel`
+  and `DeepSetsUtilityModel` for data utility learning
+  [PR #650](https://github.com/aai-institute/pyDVL/pull/650)
 - Introduced the concept of `ResultUpdater` in order to allow samplers to
   declare the proper strategy to use by valuations
   [PR #641](https://github.com/aai-institute/pyDVL/pull/641)
@@ -56,8 +59,9 @@
 - Fixed `show_warnings=False` not being respected in subprocesses
   [PR #647](https://github.com/aai-institute/pyDVL/pull/647)
 - Fixed several bugs in diverse stopping criteria, including: iteration counts,
-  computing completion and resetting
+  computing completion, resetting, nested composition
   [PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+  [PR #650](https://github.com/aai-institute/pyDVL/pull/650)
 - Fixed all weights of all samplers to ensure that mix-and-matching samplers and
   semi-value methods always works, for all possible combinations
   [PR #641](https://github.com/aai-institute/pyDVL/pull/641)
@@ -88,6 +92,8 @@
   [PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Updated Shapley spotify notebook
   [PR #628](https://github.com/aai-institute/pyDVL/pull/628)
+- Updated Data Utility notebook
+  [PR #650](https://github.com/aai-institute/pyDVL/pull/650)
 - Restructured and generalized `StratifiedSampler` to allow using heuristics,
   thus subsuming Variance-Reduced stratified sampling into a unified framework.
   Implemented the heuristics proposed in that paper

notebooks/data_oob.ipynb

Lines changed: 0 additions & 2 deletions
@@ -42,8 +42,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"%load_ext autoreload\n",
-"\n",
 "import os\n",
 "import random\n",
 "\n",

notebooks/influence_imagenet.ipynb

Lines changed: 0 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -41,24 +41,6 @@
4141
"## Imports and setup"
4242
]
4343
},
44-
{
45-
"cell_type": "code",
46-
"execution_count": 1,
47-
"metadata": {
48-
"editable": true,
49-
"nbsphinx": "hidden",
50-
"slideshow": {
51-
"slide_type": ""
52-
},
53-
"tags": [
54-
"hide"
55-
]
56-
},
57-
"outputs": [],
58-
"source": [
59-
"%load_ext autoreload"
60-
]
61-
},
6244
{
6345
"cell_type": "code",
6446
"execution_count": 2,
@@ -74,7 +56,6 @@
7456
},
7557
"outputs": [],
7658
"source": [
77-
"%autoreload\n",
7859
"%matplotlib inline\n",
7960
"import logging\n",
8061
"import os\n",

notebooks/influence_sentiment_analysis.ipynb

Lines changed: 4 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -45,19 +45,6 @@
4545
"## Setup"
4646
]
4747
},
48-
{
49-
"cell_type": "code",
50-
"execution_count": 1,
51-
"metadata": {
52-
"tags": [
53-
"hide-input"
54-
]
55-
},
56-
"outputs": [],
57-
"source": [
58-
"%load_ext autoreload"
59-
]
60-
},
6148
{
6249
"cell_type": "markdown",
6350
"metadata": {},
@@ -67,22 +54,13 @@
6754
},
6855
{
6956
"cell_type": "code",
70-
"execution_count": 2,
57+
"execution_count": null,
7158
"metadata": {
7259
"tags": [
7360
"hide-output"
7461
]
7562
},
76-
"outputs": [
77-
{
78-
"name": "stderr",
79-
"output_type": "stream",
80-
"text": [
81-
"/home/jakob/Documents/pyDVL/src/pydvl/parallel/config.py:31: FutureWarning: The `ParallelConfig` class was deprecated in v0.9.0 and will be removed in v0.10.0\n",
82-
" warnings.warn(\n"
83-
]
84-
}
85-
],
63+
"outputs": [],
8664
"source": [
8765
"import os\n",
8866
"from copy import deepcopy\n",
@@ -667,18 +645,9 @@
667645
},
668646
{
669647
"cell_type": "code",
670-
"execution_count": 21,
648+
"execution_count": null,
671649
"metadata": {},
672-
"outputs": [
673-
{
674-
"name": "stderr",
675-
"output_type": "stream",
676-
"text": [
677-
"/home/jakob/Documents/pyDVL/venv/lib/python3.10/site-packages/transformers/models/distilbert/modeling_distilbert.py:222: UserWarning: There is a performance drop because we have not yet implemented the batching rule for aten::masked_fill.Tensor. Please file us an issue on GitHub so that we can prioritize its implementation. (Triggered internally at ../aten/src/ATen/functorch/BatchedFallback.cpp:82.)\n",
678-
" scores = scores.masked_fill(\n"
679-
]
680-
}
681-
],
650+
"outputs": [],
682651
"source": [
683652
"ekfac_train_influences = ekfac_influence_model.influences(\n",
684653
" test_input,\n",

notebooks/influence_synthetic.ipynb

Lines changed: 0 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -60,24 +60,6 @@
6060
"## Imports"
6161
]
6262
},
63-
{
64-
"cell_type": "code",
65-
"execution_count": 1,
66-
"id": "ea082430",
67-
"metadata": {
68-
"editable": true,
69-
"slideshow": {
70-
"slide_type": ""
71-
},
72-
"tags": [
73-
"hide-input"
74-
]
75-
},
76-
"outputs": [],
77-
"source": [
78-
"%load_ext autoreload"
79-
]
80-
},
8163
{
8264
"cell_type": "code",
8365
"execution_count": 2,
@@ -91,7 +73,6 @@
9173
},
9274
"outputs": [],
9375
"source": [
94-
"%autoreload\n",
9576
"%matplotlib inline\n",
9677
"\n",
9778
"import os\n",

notebooks/influence_wine.ipynb

Lines changed: 0 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -40,23 +40,6 @@
4040
"## Imports"
4141
]
4242
},
43-
{
44-
"cell_type": "code",
45-
"execution_count": 1,
46-
"id": "cef17bfc",
47-
"metadata": {
48-
"slideshow": {
49-
"slide_type": ""
50-
},
51-
"tags": [
52-
"hide-input"
53-
]
54-
},
55-
"outputs": [],
56-
"source": [
57-
"%load_ext autoreload"
58-
]
59-
},
6043
{
6144
"cell_type": "code",
6245
"execution_count": 2,
@@ -71,7 +54,6 @@
7154
},
7255
"outputs": [],
7356
"source": [
74-
"%autoreload\n",
7557
"%matplotlib inline\n",
7658
"\n",
7759
"import os\n",

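The notebook hunks above repeat one cleanup: cells containing only `%load_ext autoreload` are dropped, `%autoreload` lines are removed from setup cells, and stale outputs and execution counts are reset. A minimal sketch of how such a cleanup could be scripted with the standard library follows. It is not part of this commit; the names `strip_autoreload` and `clean_file` are hypothetical, and unlike the diff it clears the outputs of every code cell rather than selected ones.

```python
import json
from pathlib import Path

# Magics removed by the notebook hunks in this commit.
AUTORELOAD_MAGICS = ("%load_ext autoreload", "%autoreload")


def strip_autoreload(nb: dict) -> dict:
    """Return a copy of a notebook dict without autoreload magics.

    Hypothetical helper (not from pyDVL). Code cells left empty after
    removing the magics are dropped. All code-cell outputs and execution
    counts are reset, which is broader than what the diff does.
    """
    cleaned = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            cleaned.append(cell)
            continue
        source = cell.get("source", [])
        if isinstance(source, str):  # the notebook format allows a plain string
            source = source.splitlines(keepends=True)
        kept = [
            line
            for line in source
            if not line.strip().startswith(AUTORELOAD_MAGICS)
        ]
        if len(kept) != len(source):
            # Drop blank lines left at the top of the cell by the removal.
            while kept and not kept[0].strip():
                kept.pop(0)
            if not kept:
                continue  # the cell held nothing but autoreload magics
        cleaned.append(
            {**cell, "source": kept, "outputs": [], "execution_count": None}
        )
    return {**nb, "cells": cleaned}


def clean_file(path: str) -> None:
    """Rewrite one .ipynb file in place (assumes UTF-8 JSON on disk)."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    cleaned = strip_autoreload(nb)
    p.write_text(json.dumps(cleaned, indent=1) + "\n", encoding="utf-8")
```

Calling `clean_file("notebooks/influence_wine.ipynb")` would apply the cleanup to one of the notebooks touched here. The `indent=1` setting is an assumption about the on-disk formatting and may produce unrelated whitespace changes.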