@@ -119,6 +119,60 @@ necessary:
computation, e.g. when the change in estimates is low, or the number of
iterations or time elapsed exceeds some threshold.

### Tensor Support { #tensor-support }

Starting from version 0.10.1, pyDVL supports both NumPy arrays and PyTorch
tensors for data valuation. The implementation follows these key principles:

1. **Type Preservation**: The valuation methods maintain the input data type
   throughout computations, whether you provide NumPy arrays or PyTorch tensors
   when constructing the [Dataset][pydvl.valuation.dataset.Dataset].

2. **Transparent Usage**: The API remains the same regardless of the input type:
   simply provide your data as tensors. The main difference is that the torch
   model must be wrapped in a class compatible with the protocol
   [TorchSupervisedModel][pydvl.valuation.types.TorchSupervisedModel] (a
   minimal sketch of such a wrapper follows this list).

    !!! tip "Wrapping torch models"
        There is an example implementation of
        [TorchSupervisedModel][pydvl.valuation.types.TorchSupervisedModel]
        in `notebooks/support/banzhaf.py`, but you should consider using
        [skorch](https://github.com/skorch-dev/skorch) models instead, which
        are entirely compatible with pyDVL.

3. **Consistent Indexing**: Internally, indices are always managed as NumPy
   arrays for consistency and compatibility, but the actual data operations
   preserve tensor types when provided. In particular, samplers always return
   NumPy arrays of indices, and the [Dataset][pydvl.valuation.dataset.Dataset]
   class uses NumPy arrays for indexing (see the indexing example below).

4. **NumPy Results**: [ValuationResult][pydvl.valuation.result.ValuationResult]
   objects always contain NumPy arrays.

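??? example "Sketch of a torch model wrapper"
    A minimal sketch of the kind of wrapper mentioned in point 2, assuming the
    protocol expects the familiar sklearn-style `fit`, `predict` and `score`
    methods operating on tensors. Check
    [TorchSupervisedModel][pydvl.valuation.types.TorchSupervisedModel] and
    `notebooks/support/banzhaf.py` for the exact requirements; the class and
    hyperparameters below are purely illustrative.

    ```python
    import torch


    class TorchLinearClassifier:
        """Illustrative wrapper exposing fit / predict / score on tensors."""

        def __init__(self, n_features: int, n_classes: int,
                     n_epochs: int = 10, lr: float = 1e-2):
            self.model = torch.nn.Linear(n_features, n_classes)
            self.n_epochs = n_epochs
            self.lr = lr

        def fit(self, x: torch.Tensor, y: torch.Tensor) -> None:
            optimizer = torch.optim.Adam(self.model.parameters(), lr=self.lr)
            loss_fn = torch.nn.CrossEntropyLoss()
            for _ in range(self.n_epochs):
                optimizer.zero_grad()
                loss = loss_fn(self.model(x), y)
                loss.backward()
                optimizer.step()

        def predict(self, x: torch.Tensor) -> torch.Tensor:
            # Hard class predictions, returned as a tensor
            with torch.no_grad():
                return self.model(x).argmax(dim=1)

        def score(self, x: torch.Tensor, y: torch.Tensor) -> float:
            # Plain accuracy, computed on tensors
            return (self.predict(x) == y).float().mean().item()
    ```
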
??? example "Creating and using a tensor dataset"
    ```python
    import torch
    from sklearn.datasets import make_classification
    from skorch import NeuralNetClassifier

    from pydvl.valuation import ModelUtility, SupervisedScorer, TMCShapleyValuation
    from pydvl.valuation.dataset import Dataset

    # n_informative is raised so that make_classification accepts 3 classes
    X, y = make_classification(
        n_samples=100, n_features=20, n_informative=4, n_classes=3
    )
    X_tensor = torch.tensor(X, dtype=torch.float32)
    y_tensor = torch.tensor(y, dtype=torch.long)

    train, test = Dataset.from_arrays(X_tensor, y_tensor, stratify_by_target=True)

    # SomeNNModule is a placeholder for your own torch.nn.Module
    model = NeuralNetClassifier(
        SomeNNModule(),
        max_epochs=10,
        criterion=torch.nn.CrossEntropyLoss,
        optimizer=torch.optim.Adam,
    )
    scorer = SupervisedScorer(model, test, default=0.0, range=(0, 1))
    utility = ModelUtility(model, scorer)
    # The values are computed when fitting: valuation.fit(train)
    valuation = TMCShapleyValuation(utility)
    ```

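??? example "Indices stay NumPy, data stays tensor"
    A small sketch of the indexing behaviour from point 3, assuming the
    [Dataset][pydvl.valuation.dataset.Dataset] exposes its indices through an
    `indices` property and its raw data through `data()` (with `x` and `y`
    attributes); check the Dataset API for the exact accessors.

    ```python
    import numpy as np
    import torch
    from pydvl.valuation.dataset import Dataset

    x = torch.rand(20, 4)
    y = torch.randint(0, 2, (20,))
    train, test = Dataset.from_arrays(x, y)

    # Indices are plain NumPy arrays, regardless of the data type...
    assert isinstance(train.indices, np.ndarray)

    # ...while the data keeps the tensor type it was given.
    data = train.data()
    assert isinstance(data.x, torch.Tensor)
    ```
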
!!! warning "Library-specific requirements"
    Some methods that rely on specific libraries may have type requirements:

    - Methods that use scikit-learn models directly will convert tensors to
      NumPy arrays internally.
    - The [KNNShapleyValuation][pydvl.valuation.methods.knn_shapley.KNNShapleyValuation]
      method requires NumPy arrays.
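
??? example "Converting tensors for NumPy-only methods"
    A minimal sketch: if the data already lives in tensors but one of the
    NumPy-only methods above is needed, it is enough to convert back to NumPy
    when building the dataset. Names and shapes here are illustrative.

    ```python
    import torch
    from pydvl.valuation.dataset import Dataset

    X_tensor = torch.rand(100, 20)
    y_tensor = torch.randint(0, 3, (100,))

    # Convert tensors back to NumPy before constructing the Dataset
    train, test = Dataset.from_arrays(X_tensor.cpu().numpy(), y_tensor.cpu().numpy())
    ```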

### Creating a Dataset

@@ -217,6 +271,8 @@ constructor accepts the same types of arguments as those of
[None][] for the default.

```python
import numpy as np
from pydvl.valuation.scorers import SupervisedScorer

scorer = SupervisedScorer("explained_variance", default=0.0, range=(-np.inf, 1))
```
