From #288
The problem:
- One-hot-encoded labels used with CategoricalCrossEntropy are more memory-heavy than integer labels used with SparseCategoricalCrossEntropy (each one-hot label stores vocab_size values where an integer label stores one).
The solution:
Replace CategoricalCrossEntropy with SparseCategoricalCrossEntropy.
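A minimal sketch of the equivalence, using the tf.keras names (which spell the losses CategoricalCrossentropy/SparseCategoricalCrossentropy; the project's own wrappers may differ): the sparse loss on integer labels produces the same value as the dense loss on the corresponding one-hot labels, while the labels tensor shrinks from [tokens, vocab_size] to [tokens].

```python
import numpy as np
import tensorflow as tf

vocab_size = 8
logits = tf.random.normal([4, vocab_size])           # per-token logits
int_labels = tf.constant([1, 3, 0, 7])               # one integer per token
one_hot_labels = tf.one_hot(int_labels, vocab_size)  # vocab_size floats per token

dense = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
sparse = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Same loss value, but the sparse path never materializes the one-hot tensor.
np.testing.assert_allclose(
    dense(one_hot_labels, logits).numpy(),
    sparse(int_labels, logits).numpy(),
    rtol=1e-5,
)
```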
Tasks (illustrative sketches for several of these steps follow the list):
- Refactor prepare_data to return a single integer label for each text token.
- Refactor the generate loop to use logits, not probs.
- Refactor Stage 1-a to use the new label format and SparseCategoricalCrossEntropy.
- Refactor the Dataset object for Stage 1-b training to feed batches of integer labels, not one-hot-encoded labels.
- Verify the distributions in the final outputs are the same (e.g. numbers before top_... sampling, penalties, and equivalent results after ...).
- Make a sparse version of the Perplexity metric.
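For the prepare_data and Stage 1-b Dataset tasks, a hypothetical sketch (the function name, the next-token windowing, and the toy token ids are assumptions, not the project's actual code): labels become plain integer token ids, and the Dataset batches them with no one-hot step anywhere in the pipeline.

```python
import tensorflow as tf

def prepare_data(token_ids: tf.Tensor):
    """Next-token pairs: inputs are tokens[:-1], labels are the integer ids tokens[1:]."""
    return token_ids[:-1], token_ids[1:]

# Stage 1-b style Dataset: batches of integer labels, no one-hot encoding.
corpus = tf.constant([[5, 2, 9, 1, 4, 0], [3, 3, 8, 6, 2, 1]])
ds = (
    tf.data.Dataset.from_tensor_slices(corpus)
    .map(prepare_data)
    .batch(2)
)
for inputs, labels in ds:
    print(inputs.shape, labels.shape)  # (2, 5) (2, 5); labels stay integer-typed
```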
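For the generate-loop task, a sketch of sampling directly from logits (`model` is a placeholder): tf.random.categorical accepts unnormalized logits, so the softmax/probs tensor can be dropped entirely.

```python
import tensorflow as tf

def generate_step(model, token_ids: tf.Tensor) -> tf.Tensor:
    """Append one sampled token; `model` maps [1, seq] ids to [1, seq, vocab] logits."""
    logits = model(token_ids[None, :])[0, -1]  # [vocab] logits at the last position
    # tf.random.categorical samples from unnormalized logits: no softmax needed.
    next_id = tf.random.categorical(logits[None, :], num_samples=1)[0, 0]
    return tf.concat([token_ids, tf.cast(next_id[None], token_ids.dtype)], axis=0)
```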
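For the verification task, one check that holds mathematically and can anchor the test (k and the shapes here are illustrative): applying top-k to the logits and then softmaxing gives the same distribution as softmaxing first, keeping the same k entries, and renormalizing.

```python
import numpy as np
import tensorflow as tf

logits = tf.random.normal([1, 100])
k = 10

# Path A: filter logits, then softmax over the survivors.
top_vals, top_idx = tf.math.top_k(logits, k=k)
probs_a = tf.nn.softmax(top_vals, axis=-1)

# Path B: softmax first, then keep the same k entries and renormalize.
probs = tf.nn.softmax(logits, axis=-1)
kept = tf.gather(probs, top_idx, batch_dims=1)
probs_b = kept / tf.reduce_sum(kept, axis=-1, keepdims=True)

np.testing.assert_allclose(probs_a.numpy(), probs_b.numpy(), rtol=1e-5)
```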
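For the sparse Perplexity metric, a sketch assuming the existing metric computes exp(mean cross-entropy); the class name and constructor arguments are illustrative, not an existing API. The only substantive change from a dense version is swapping in the sparse loss so integer labels can be fed directly.

```python
import tensorflow as tf

class SparsePerplexity(tf.keras.metrics.Metric):
    """Perplexity over integer labels: exp of the mean per-token cross-entropy."""

    def __init__(self, from_logits=True, name="sparse_perplexity", **kwargs):
        super().__init__(name=name, **kwargs)
        self._xent = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=from_logits,
            reduction=tf.keras.losses.Reduction.NONE,
        )
        self._total = self.add_weight(name="total", initializer="zeros")
        self._count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # y_true: integer ids [batch, seq]; y_pred: logits [batch, seq, vocab].
        # sample_weight handling omitted for brevity in this sketch.
        per_token = self._xent(y_true, y_pred)  # [batch, seq] per-token NLL
        self._total.assign_add(tf.reduce_sum(per_token))
        self._count.assign_add(tf.cast(tf.size(per_token), tf.float32))

    def result(self):
        return tf.exp(self._total / self._count)

    def reset_state(self):
        self._total.assign(0.0)
        self._count.assign(0.0)
```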