This repository was archived by the owner on Sep 10, 2025. It is now read-only.

[Distributed] Implement universal batch_decode & decode_in_flight for llama2 & llama3, with deterministic or multinomial (topk) decoding (handle both sentencepiece (llama2) and tiktoken (llama3)) #1234

Merged
lessw2020 merged 11 commits into main from lessw2020/demo_metrics on Oct 2, 2024

Commits

Commits on Sep 28, 2024

Commits on Sep 30, 2024