Conversation
@BlinkDL Do I have to quantize all the weight matrices?
@3outeille Yes, do it for all weight matrices (ignore the time_xxx parameters).
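A minimal sketch of what that selection rule could look like; the helper name and the `ndim == 2` heuristic are my own illustrative assumptions, not code from this PR:

```python
import torch

def quantizable_params(model: torch.nn.Module):
    """Yield (name, tensor) for 2-D weight matrices, skipping RWKV's
    time_* parameters (time_mix, time_decay, time_first, ...)."""
    for name, param in model.named_parameters():
        if "time_" in name:   # per the advice above: ignore time_xxx
            continue
        if param.ndim == 2:   # only matrix weights get quantized
            yield name, param
```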
@BlinkDL Do you happen to have a reference perplexity measure (or any other metric) I can use as a baseline?
Use the LAMBADA ppl in https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark.py
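For reference, a hedged sketch of how a LAMBADA perplexity baseline could be computed with the HuggingFace stack. Note that BlinkDL's benchmark.py scores prediction of the final word, while this simplification measures whole-sequence perplexity; the model id is just an example checkpoint:

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("RWKV/rwkv-4-169m-pile")
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile").eval()

nll, count = 0.0, 0
for ex in load_dataset("lambada", split="test").select(range(200)):
    ids = tok(ex["text"], return_tensors="pt").input_ids
    with torch.no_grad():
        # Shifted cross-entropy over the sequence = average NLL per token.
        loss = model(ids, labels=ids).loss
    nll += loss.item() * (ids.shape[1] - 1)
    count += ids.shape[1] - 1

print("ppl:", math.exp(nll / count))
```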
Question: would we expect a huge improvement in perplexity if we did quantization-aware training?
@meditans QAT would probably yield a big improvement, but it implies re-training your model, whereas GPTQ uses a post-training quantization strategy (no re-training involved).
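To make the distinction concrete, here is a toy sketch (my own, not from this PR) of round-to-nearest 4-bit quantization used two ways: applied once after training (the PTQ setting GPTQ operates in, though GPTQ itself is smarter and minimizes layer output error), and fake-quantized inside a training step with a straight-through estimator, as QAT would do:

```python
import torch

def quantize_4bit(w: torch.Tensor) -> torch.Tensor:
    """Round-to-nearest symmetric 4-bit quantization with a per-tensor scale."""
    scale = w.abs().max() / 7                      # int4 range [-8, 7]
    return (w / scale).round().clamp(-8, 7) * scale

# Post-training quantization: quantize the trained weights, no gradients flow.
w = torch.randn(512, 512)
w_ptq = quantize_4bit(w)

# Quantization-aware training: fake-quantize in the forward pass; the
# straight-through estimator lets gradients update the fp weights underneath.
w = torch.randn(512, 512, requires_grad=True)
w_fake = w + (quantize_4bit(w) - w).detach()       # forward quantized, backward identity
loss = (w_fake @ torch.randn(512, 1)).pow(2).mean()
loss.backward()                                     # w.grad exists: training adapts to quantization
```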
(force-pushed from f4584b4 to 76d937b, Tue May 2 18:17:57 2023 +0000)
How's it going :) Are you on Discord?
Yep, I sent a message in the quantization channel on Discord.
Hi. Is it available now? |
@Evilran Hi, making it work with ChatRWKV is too much of a hassle because it requires changing the RWKV class too much, so the PR will not be accepted. However, I made it work with the HuggingFace version of RWKV if you want: https://github.com/3outeille/GPTQ-for-RWKV
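I have not verified GPTQ-for-RWKV's exact entry point, but the reason the HuggingFace port is easier to patch can be seen directly: its attention/FFN projections are plain `nn.Linear` modules, which is exactly what a per-layer GPTQ pass operates on. The model id below is just an example checkpoint:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")

# Collect the nn.Linear modules a GPTQ pass would quantize layer by layer.
linear_layers = {name: m for name, m in model.named_modules()
                 if isinstance(m, nn.Linear)}
print(f"{len(linear_layers)} Linear layers eligible for GPTQ, e.g.:")
print(list(linear_layers)[:5])
```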
This is a work in progress and serves as the main thread for any questions related to this topic.