Hello! Thank you for sharing the clean and well-documented implementation of nucleus sampling. While reading through the explanation, I noticed two small errors in the written description that could be slightly confusing for readers.
In the section that begins:
"That is, we pick the highest probable tokens until the sum of their probabilities is less that $p$."
There are two issues:
- Logical oversight: The condition should be "is not less than $p$" (or equivalently, "is at least $p$"), since we want the smallest set of tokens whose cumulative probability reaches or exceeds $p$.
- Typo: "less that" should be "less than".
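
To make the first point concrete, here is a minimal pure-Python sketch of the selection rule as I understand it. The function name `nucleus` and the example probabilities are my own illustration, not taken from your implementation. The loop stops only once the cumulative probability is at least $p$:

```python
def nucleus(probs, p):
    """Return indices of the smallest set of highest-probability tokens
    whose cumulative probability is at least p.

    `nucleus` is a hypothetical helper for illustration only.
    """
    # Sort token indices by probability, highest first
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += probs[i]
        # Stop as soon as the cumulative mass reaches or exceeds p
        if total >= p:
            break
    return chosen


probs = [0.5, 0.3, 0.15, 0.05]  # made-up distribution over four tokens
print(nucleus(probs, 0.7))
```

With $p = 0.7$, the first token alone contributes 0.5, which is still less than $p$, so the second token must be included to bring the sum to 0.8. If selection stopped while the sum was "less than $p$", the chosen set would not cover the required probability mass.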