Integrate TQ2_0 into Vulkan #33
base: temp-latest
Conversation
Signed-off-by: Marcus Edel <[email protected]>
It adds support for TQ2_0, which uses the value set {-1, 0, 1, 2}, rather than TQ1_0's {-1, 0, 1}.
So if we quantize BitNet using TQ2_0, the value 2 would never actually be used, meaning we'd be using about 25% more memory when storing in this format? Why not use TQ1_0? It seems to be better aligned with BitNet quantization.
@olyasir that's a good point you bring up regarding TQ2_0 vs TQ1_0. You're right that it uses more memory, and if you want us to implement it in TQ1_0, we have enough time in the SLM project to do it. I just wanted to clarify the difference between the two types: Model size:
Inference speed:
Note: These numbers are for CPU because the original PR doesn't implement TQ2_0 support in any GPU backend. @olyasir To reiterate, we can implement TQ1_0 support and it shouldn't take too long, but I wanted to show the trade-offs first so you can decide whether you think it's worth it. What do you think? Should we implement it?
Signed-off-by: Marcus Edel <[email protected]>
Rebased #22 on temp-latest.