Open
Description
Now that Llama 3.2 was added in this PR, we are missing the following improvements:
- Async checks (check stop criterion for stop token in background thread)
- Tensor cache should be fixed size (slice assign, not concat)
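To illustrate the fixed-size cache item, here is a minimal sketch in plain Rust rather than Burn tensors. `FixedCache` and its methods are hypothetical names, not part of the Llama implementation or the Burn API. The buffer is allocated once for `max_seq_len` tokens and each step copies new values into the next free slice, so nothing is reallocated the way a concat-based cache would be.

```rust
/// Hypothetical fixed-capacity cache for per-token vectors (e.g. keys or values).
/// Assumption: plain `Vec<f32>` storage stands in for a backend tensor.
pub struct FixedCache {
    data: Vec<f32>, // preallocated once: max_seq_len * dim values
    dim: usize,
    max_seq_len: usize,
    len: usize, // number of tokens currently stored
}

impl FixedCache {
    pub fn new(max_seq_len: usize, dim: usize) -> Self {
        Self {
            data: vec![0.0; max_seq_len * dim],
            dim,
            max_seq_len,
            len: 0,
        }
    }

    /// Copies `tokens` (flattened, a multiple of `dim` values) into the next
    /// free slice of the buffer. This is the "slice assign" step.
    pub fn append(&mut self, tokens: &[f32]) -> Result<(), String> {
        if self.dim == 0 || tokens.len() % self.dim != 0 {
            return Err("input length is not a multiple of dim".to_string());
        }
        let n = tokens.len() / self.dim;
        if self.len + n > self.max_seq_len {
            return Err("cache capacity exceeded".to_string());
        }
        let start = self.len * self.dim;
        self.data[start..start + tokens.len()].copy_from_slice(tokens);
        self.len += n;
        Ok(())
    }

    /// View of the filled part of the cache; no copy is made.
    pub fn view(&self) -> &[f32] {
        &self.data[..self.len * self.dim]
    }

    /// Number of tokens currently stored.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Forgets the stored tokens while keeping the allocation.
    pub fn reset(&mut self) {
        self.len = 0;
    }
}
```

With a tensor backend the same idea would presumably map to a preallocated tensor plus a slice-assign op and a cursor, but that mapping is an assumption and not shown here.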
We are missing some ops for Top-P sampling, but we can have the first release with greedy sampling (argmax).
For Top-P sampling, we need:
- sorting (`tensor.sort_descending_with_indices` runs the default impl on CPU)
- cumsum (Burn PR missing cubecl kernel)
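For reference, the sketch below shows where sorting and cumsum enter Top-P sampling, next to the greedy (argmax) fallback. It is a CPU-only illustration over a plain slice of probabilities, not the Burn implementation. The function names are hypothetical, `probs` is assumed to be already normalized (softmax output), and the uniform random draw `u` in [0, 1) is passed in by the caller so the function stays deterministic.

```rust
/// Greedy sampling: index of the largest probability (argmax).
pub fn sample_greedy(probs: &[f32]) -> Option<usize> {
    probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Top-P (nucleus) sampling over `probs`.
/// Assumptions: `probs` sums to roughly 1, `top_p` is in (0, 1],
/// and `u` is a uniform random number in [0, 1) supplied by the caller.
pub fn sample_top_p(probs: &[f32], top_p: f32, u: f32) -> Option<usize> {
    if probs.is_empty() {
        return None;
    }

    // Step 1: sort token indices by probability, descending.
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    // Step 2: cumulative sum over the sorted probabilities; keep the
    // smallest prefix whose mass reaches `top_p`.
    let mut cumsum = 0.0f32;
    let mut nucleus_len = order.len();
    for (rank, &idx) in order.iter().enumerate() {
        cumsum += probs[idx];
        if cumsum >= top_p {
            nucleus_len = rank + 1;
            break;
        }
    }
    let nucleus = &order[..nucleus_len];

    // Step 3: sample within the nucleus, scaling `u` by the nucleus mass
    // instead of renormalizing every probability.
    let mass: f32 = nucleus.iter().map(|&i| probs[i]).sum();
    let target = u * mass;
    let mut acc = 0.0f32;
    for &idx in nucleus {
        acc += probs[idx];
        if target < acc {
            return Some(idx);
        }
    }
    // Fallback for rounding at the upper edge.
    nucleus.last().copied()
}
```

For example, with probabilities `[0.1, 0.5, 0.15, 0.25]` and `top_p = 0.7`, the nucleus contains only tokens 1 and 3, so the other two tokens can never be sampled.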