There is a second option available if you are evaluating multiple models and need to use the same dataset multiple times: `WikiText103Evaluator.get_test_set_path(local_root)`. This will get the path before you initialize a WikiText evaluator:
```python
from sotabencheval.language_modelling import WikiText103Evaluator

# the local_root below is just an example location for the dataset files
dataset_path = WikiText103Evaluator.get_test_set_path('./data/nlp/wikitext-103')
```
If you are reproducing a model from a paper, then you can enter the arXiv ID; this will enable direct comparison with the paper's model. If the `arxiv_id` is not available you can use the `paperswithcode.com` id. Below is an example of an evaluator that matches `Transformer XL`:
```python
from sotabencheval.language_modelling import WikiText103Evaluator

evaluator = WikiText103Evaluator(
    model_name='Transformer-XL Large',  # model name as it appears on paperswithcode.com
    paper_arxiv_id='1901.02860',
)
```
The above will directly compare with the result of the paper when run on the server.
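If the paper has no arXiv entry, you can point the evaluator at its paperswithcode.com record instead. A minimal sketch, assuming the `paper_pwc_id` parameter name used elsewhere in sotabencheval (the slug below is hypothetical; copy it from the paper's paperswithcode.com URL):

```python
from sotabencheval.language_modelling import WikiText103Evaluator

evaluator = WikiText103Evaluator(
    model_name='Transformer-XL Large',
    paper_pwc_id='transformer-xl-attentive-language-models',  # hypothetical slug
)
```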
## How Do I Evaluate Predictions?
The evaluator object has an `.add(log_probs, targets)` method to submit predictions by batch or in full.
We expect you to give us the log probability of a batch of target tokens and the `targets` tokens themselves.
The `log_probs` can be either:
- a 0d "tensor" (`np.ndarray`/`torch.tensor`) - summed log probability of all `targets` tokens
- a 2d "tensor" (`np.ndarray`/`torch.tensor`) - log probabilities of each target token, the `log_probs.shape` should match `targets.shape`
- a 3d "tensor" (`np.ndarray`/`torch.tensor`) - distribution of log probabilities for each position in the sequence, we will gather the probabilities of target tokens for you.
It is recommended to use the second or third option, as it allows us to check your perplexity calculations.
If your model uses subword tokenization you don't need to convert subwords to full words. You are free to report the probability of each subword: we will adjust the perplexity normalization accordingly. Just make sure to set `subword_tokenization=True` in your evaluator.
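For instance, a minimal sketch of a subword-aware evaluator, assuming the flag is passed to the constructor (the model name is hypothetical):

```python
from sotabencheval.language_modelling import WikiText103Evaluator

evaluator = WikiText103Evaluator(
    model_name='My BPE Language Model',  # hypothetical model name
    subword_tokenization=True,           # log_probs are reported per subword token
)
```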
Here is an example of how to report results (using PyTorch):
```python
import torch

# assuming `model` returns per-position log-probabilities with shape
# (batch, seq_len, vocab_size) and `targets` has shape (batch, seq_len)
with torch.no_grad():
    log_probs = model(input_ids)
    evaluator.add(log_probs, targets)  # 3d log_probs: target probabilities are gathered for us
```
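If you would rather submit a single summed value (the 0d option), one way to produce it from the same 3d distribution, as a sketch:

```python
# gather the log probability assigned to each target token, then sum to a 0d tensor
summed_log_prob = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).sum()
evaluator.add(summed_log_prob, targets)
```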
Below we show an implementation for a model from `huggingface/transformers`. This incorporates all the features explained above: (a) using the server data, (b) using the WikiText-103 Evaluator, and (c) caching the evaluation logic:
```python
import torch

# ... model setup and the evaluation loop go here
# (the full script is in the Colab notebook linked below) ...

evaluator.save()
evaluator.print_results()
```
You can run this example on [Google Colab](https://colab.research.google.com/drive/1Qcp1_Fgo_aMtSgf_PV1gFw1DT6hEv7fW).
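The caching logic at the heart of such a script is small. A sketch of the evaluation loop, assuming a `batches` iterable of `(input_ids, targets)` pairs, a model returning per-position log-probabilities, and the `cache_exists` attribute that sotabencheval evaluators expose:

```python
for input_ids, targets in batches:
    with torch.no_grad():
        log_probs = model(input_ids)
    evaluator.add(log_probs, targets)
    # after the first batch is submitted, the evaluator can check whether an
    # identical run has been cached on the server; if so, stop evaluating early
    if evaluator.cache_exists:
        break

evaluator.save()
evaluator.print_results()
```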
## Need More Help?
Head on over to the [Natural Language Processing](https://forum.sotabench.com/c/natural-language-processing) section of the sotabench forums if you have any questions or difficulties.