Open
Description
Hi, thank you for sharing the great code base!!
I have a question about the loss calculation. Why is the loss averaged across all segments during training, while only the loss from the last segment is used as the evaluation loss (and perplexity)? Averaging during training makes sense to me, but shouldn't the evaluation loss also be averaged over all segments, so that the comparison with methods that do not segment the data is fair?
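For concreteness, the two ways of reporting evaluation loss can be sketched as follows. This is only an illustration and is not taken from the repository: the function names, the per-segment losses, and the token counts are all hypothetical, and each segment loss is assumed to be a mean negative log-likelihood per token.

```python
import math


def last_segment_loss(segment_losses):
    """Return only the final segment's mean per-token loss."""
    return segment_losses[-1]


def token_weighted_loss(segment_losses, segment_token_counts):
    """Mean per-token loss over all segments, weighted by token count."""
    total_tokens = sum(segment_token_counts)
    total_nll = sum(
        loss * count
        for loss, count in zip(segment_losses, segment_token_counts)
    )
    return total_nll / total_tokens


def perplexity(mean_loss):
    """Perplexity is the exponential of the mean per-token loss."""
    return math.exp(mean_loss)


# Hypothetical numbers for one document split into three segments.
segment_losses = [3.0, 2.5, 2.0]
segment_token_counts = [512, 512, 256]

last_only = last_segment_loss(segment_losses)
all_segments = token_weighted_loss(segment_losses, segment_token_counts)

print("last segment only:", last_only, perplexity(last_only))
print("all segments:", all_segments, perplexity(all_segments))
```

With these made-up numbers the last-segment figure is lower than the all-segment average, so the two reporting choices give different perplexities. Whether the same gap appears with the real model is exactly what the question is asking.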