
Commit 40f93a0

FramartinBernardZach
authored and committed
Fix perplexity computation in perplexity.md (huggingface#34387)
fix average NLL in perplexity.md
1 parent dca0104 commit 40f93a0
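For context, here is a sketch of what "fix average NLL" appears to mean, judging only from the diff in this commit. The symbols are introduced here for illustration and do not appear in the source: $K$ is the number of windows, $n_i$ is the number of tokens scored in window $i$, and $\bar{\ell}_i$ is that window's mean loss (`outputs.loss`).

```latex
% Before: unweighted mean of the per-window mean losses
\mathrm{PPL}_{\text{old}} = \exp\!\left( \frac{1}{K} \sum_{i=1}^{K} \bar{\ell}_i \right)

% After: mean loss per scored token, weighting each window by n_i
\mathrm{PPL}_{\text{new}} = \exp\!\left( \frac{\sum_{i=1}^{K} n_i \, \bar{\ell}_i}{\sum_{i=1}^{K} n_i} \right)
```

The two expressions coincide when every window scores the same number of tokens. They differ when the $n_i$ vary, for example when the last window is shorter than the others.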

1 file changed, +11 -4 lines changed


docs/source/en/perplexity.md

Lines changed: 11 additions & 4 deletions
@@ -107,7 +107,8 @@ max_length = model.config.n_positions
 stride = 512
 seq_len = encodings.input_ids.size(1)

-nlls = []
+nll_sum = 0.0
+n_tokens = 0
 prev_end_loc = 0
 for begin_loc in tqdm(range(0, seq_len, stride)):
     end_loc = min(begin_loc + max_length, seq_len)
@@ -124,13 +125,19 @@ for begin_loc in tqdm(range(0, seq_len, stride)):
         # to the left by 1.
         neg_log_likelihood = outputs.loss

-    nlls.append(neg_log_likelihood)
+    # Accumulate the total negative log-likelihood and the total number of tokens
+    num_valid_tokens = (target_ids != -100).sum().item()  # number of valid tokens in target_ids
+    batch_size = target_ids.size(0)
+    num_loss_tokens = num_valid_tokens - batch_size  # subtract batch_size due to internal label shift
+    nll_sum += neg_log_likelihood * num_loss_tokens
+    n_tokens += num_loss_tokens

     prev_end_loc = end_loc
     if end_loc == seq_len:
         break

-ppl = torch.exp(torch.stack(nlls).mean())
+avg_nll = nll_sum / n_tokens  # average negative log-likelihood per token
+ppl = torch.exp(avg_nll)
 ```

 Running this with the stride length equal to the max input length is equivalent to the suboptimal, non-sliding-window
@@ -139,5 +146,5 @@ and the better the reported perplexity will typically be.

 When we run the above with `stride = 1024`, i.e. no overlap, the resulting PPL is `19.44`, which is about the same
 as the `19.93` reported in the GPT-2 paper. By using `stride = 512` and thereby employing our striding window
-strategy, this jumps down to `16.45`. This is not only a more favorable score, but is calculated in a way that is
+strategy, this jumps down to `16.44`. This is not only a more favorable score, but is calculated in a way that is
 closer to the true autoregressive decomposition of a sequence likelihood.
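To illustrate why the averaging change can move the reported number, the following is a minimal, torch-free sketch rather than the documentation's actual code. The helper names (`count_loss_tokens`, `mean_of_window_means`, `token_weighted_mean`) and the window values are hypothetical, chosen only to contrast the removed `torch.stack(nlls).mean()` with the added `nll_sum / n_tokens`.

```python
import math

IGNORE_INDEX = -100  # label value that the loss skips, as in the diff above


def count_loss_tokens(target_ids):
    """Mirror the commit's count: labels != -100, minus one per row for the label shift."""
    num_valid = sum(1 for row in target_ids for t in row if t != IGNORE_INDEX)
    batch_size = len(target_ids)
    return num_valid - batch_size


def mean_of_window_means(windows):
    """Old behaviour: unweighted mean of the per-window average NLLs."""
    return sum(nll for nll, _ in windows) / len(windows)


def token_weighted_mean(windows):
    """New behaviour: weight each window's average NLL by its scored-token count."""
    nll_sum = sum(nll * n for nll, n in windows)
    n_tokens = sum(n for _, n in windows)
    return nll_sum / n_tokens


# Hypothetical (average NLL, scored-token count) pairs; the last window is short.
windows = [(3.0, 1023), (2.8, 511), (2.8, 511), (4.0, 100)]

old_ppl = math.exp(mean_of_window_means(windows))
new_ppl = math.exp(token_weighted_mean(windows))
print(f"mean of window means: PPL = {old_ppl:.2f}")
print(f"token-weighted mean:  PPL = {new_ppl:.2f}")
```

In this made-up example the short final window has a high loss, so giving it the same weight as a full window inflates the unweighted estimate. Weighting by token count reduces its influence to its actual share of the scored tokens.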

0 commit comments
