From #288
The problem:
- One-hot-encoded labels used with CategoricalCrossEntropy are more memory-heavy than integer labels used with SparseCategoricalCrossEntropy (each one-hot label stores vocab_size values where an integer label stores one).
The solution:
Replace CategoricalCrossEntropy with SparseCategoricalCrossEntropy.
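A minimal sketch of the equivalence, using the tf.keras names (which spell the losses CategoricalCrossentropy/SparseCategoricalCrossentropy; the project's own wrappers may differ): the sparse loss on integer labels produces the same value as the dense loss on the corresponding one-hot labels, while the labels tensor shrinks from [tokens, vocab_size] to [tokens].

```python
import numpy as np
import tensorflow as tf

vocab_size = 8
logits = tf.random.normal([4, vocab_size])           # per-token logits
int_labels = tf.constant([1, 3, 0, 7])               # one integer per token
one_hot_labels = tf.one_hot(int_labels, vocab_size)  # vocab_size floats per token

dense = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
sparse = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Same loss value, but the sparse path never materializes the one-hot tensor.
np.testing.assert_allclose(
    dense(one_hot_labels, logits).numpy(),
    sparse(int_labels, logits).numpy(),
    rtol=1e-5,
)
```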
Tasks (illustrative sketches for several of these steps follow the list):
- Refactor prepare_data to return a single integer label for each text token.
- Refactor the generate loop to use logits, not probs.
- Refactor Stage 1-a to use the new label format and SparseCategoricalCrossEntropy.
- Refactor the Dataset object for Stage 1-b training to feed batches of integer labels, not one-hot-encoded labels.
- Verify the distributions in the final outputs are the same (e.g. numbers before top_... sampling, penalties, and equivalent results after ...).
- Make a sparse version of the Perplexity metric.
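For the prepare_data and Stage 1-b Dataset tasks, a hypothetical sketch (the function name, the next-token windowing, and the toy token ids are assumptions, not the project's actual code): labels become plain integer token ids, and the Dataset batches them with no one-hot step anywhere in the pipeline.

```python
import tensorflow as tf

def prepare_data(token_ids: tf.Tensor):
    """Next-token pairs: inputs are tokens[:-1], labels are the integer ids tokens[1:]."""
    return token_ids[:-1], token_ids[1:]

# Stage 1-b style Dataset: batches of integer labels, no one-hot encoding.
corpus = tf.constant([[5, 2, 9, 1, 4, 0], [3, 3, 8, 6, 2, 1]])
ds = (
    tf.data.Dataset.from_tensor_slices(corpus)
    .map(prepare_data)
    .batch(2)
)
for inputs, labels in ds:
    print(inputs.shape, labels.shape)  # (2, 5) (2, 5); labels stay integer-typed
```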
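For the generate-loop task, a sketch of sampling directly from logits (`model` is a placeholder): tf.random.categorical accepts unnormalized logits, so the softmax/probs tensor can be dropped entirely.

```python
import tensorflow as tf

def generate_step(model, token_ids: tf.Tensor) -> tf.Tensor:
    """Append one sampled token; `model` maps [1, seq] ids to [1, seq, vocab] logits."""
    logits = model(token_ids[None, :])[0, -1]  # [vocab] logits at the last position
    # tf.random.categorical samples from unnormalized logits: no softmax needed.
    next_id = tf.random.categorical(logits[None, :], num_samples=1)[0, 0]
    return tf.concat([token_ids, tf.cast(next_id[None], token_ids.dtype)], axis=0)
```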
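For the verification task, one check that holds mathematically and can anchor the test (k and the shapes here are illustrative): applying top-k to the logits and then softmaxing gives the same distribution as softmaxing first, keeping the same k entries, and renormalizing.

```python
import numpy as np
import tensorflow as tf

logits = tf.random.normal([1, 100])
k = 10

# Path A: filter logits, then softmax over the survivors.
top_vals, top_idx = tf.math.top_k(logits, k=k)
probs_a = tf.nn.softmax(top_vals, axis=-1)

# Path B: softmax first, then keep the same k entries and renormalize.
probs = tf.nn.softmax(logits, axis=-1)
kept = tf.gather(probs, top_idx, batch_dims=1)
probs_b = kept / tf.reduce_sum(kept, axis=-1, keepdims=True)

np.testing.assert_allclose(probs_a.numpy(), probs_b.numpy(), rtol=1e-5)
```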
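For the sparse Perplexity metric, a sketch assuming the existing metric computes exp(mean cross-entropy); the class name and constructor arguments are illustrative, not an existing API. The only substantive change from a dense version is swapping in the sparse loss so integer labels can be fed directly.

```python
import tensorflow as tf

class SparsePerplexity(tf.keras.metrics.Metric):
    """Perplexity over integer labels: exp of the mean per-token cross-entropy."""

    def __init__(self, from_logits=True, name="sparse_perplexity", **kwargs):
        super().__init__(name=name, **kwargs)
        self._xent = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=from_logits,
            reduction=tf.keras.losses.Reduction.NONE,
        )
        self._total = self.add_weight(name="total", initializer="zeros")
        self._count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # y_true: integer ids [batch, seq]; y_pred: logits [batch, seq, vocab].
        # sample_weight handling omitted for brevity in this sketch.
        per_token = self._xent(y_true, y_pred)  # [batch, seq] per-token NLL
        self._total.assign_add(tf.reduce_sum(per_token))
        self._count.assign_add(tf.cast(tf.size(per_token), tf.float32))

    def result(self):
        return tf.exp(self._total / self._count)

    def reset_state(self):
        self._total.assign(0.0)
        self._count.assign(0.0)
```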