Why do we get different logits when the number of tokens in the batch is different? #9837
Unanswered · spacecat2002 asked this question in Q&A · Replies: 0 comments
When the batch has only one token:
When the batch has more than one token:
I ran into this question while trying to implement speculative decoding myself. The target model should verify all the draft tokens in the batch in parallel, but the logits it produces for the first token in the batch differ from the logits I get when verifying the tokens one by one. I don't know where the problem is. Can anyone help me? Thanks a lot!
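For reference, with a correct causal mask the two passes should agree mathematically: the first token's logits depend only on the first token, so processing it alone or inside a larger batch is the same computation. Below is a minimal NumPy sketch of this (a hypothetical toy single-layer causal-attention "model", not the actual model from this issue) comparing the first token's logits from a batched pass against a single-token pass. In a real inference engine, small differences can still appear because different batch sizes can select different kernels and change floating-point accumulation order.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, T = 16, 8, 4  # hypothetical vocab size, hidden dim, batch length

emb = rng.standard_normal((V, D)).astype(np.float32)    # token embeddings
w_out = rng.standard_normal((D, V)).astype(np.float32)  # output projection

def logits(tokens):
    """Toy causal self-attention: each position attends only to itself
    and earlier positions, then projects to vocabulary logits."""
    x = emb[tokens]                              # (T, D)
    scores = x @ x.T / np.sqrt(D)                # (T, T) attention scores
    mask = np.triu(np.ones_like(scores), 1)      # 1s strictly above diagonal
    scores = np.where(mask.astype(bool), -np.inf, scores)  # causal mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)           # row-wise softmax
    return (w @ x) @ w_out                       # (T, V) logits

toks = rng.integers(0, V, size=T)
batched = logits(toks)      # all T tokens verified in one batch
single = logits(toks[:1])   # only the first token

# With correct causal masking, row 0 of the batched logits should match
# the single-token logits up to floating-point error.
print(np.max(np.abs(batched[0] - single[0])))
```

If the first token's logits diverge by more than floating-point noise, the usual suspects are an incorrect attention mask in the batched path, positions/KV-cache state differing between the two runs, or batch-size-dependent kernels.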