Llama4 RoPE fix #12889
@ngxson :)
Nice, thanks!
@ggerganov FYI, I wrote a small script here: https://gist.github.com/ngxson/6dec015080121d239caa668332fba3f8 It calculates the Below is the value: on the left, old
For now, I have no idea how to set this after loading the GGUF (as discussed via DM), but feel free to make suggestions!
Thanks. AFAIU the upstream models have been updated with a new RoPE config, which technically requires re-converting existing GGUF models. I don't think there is an elegant way to avoid this conversion and make old GGUFs work seamlessly with the new rope factors. It seems it would always be some non-trivial hack that remains in the codebase forever. So I think it is better to just recommend conversion/re-download of the models. Can we put a notice in the README under hot topics?
Yes, sounds OK to me. If it's too hacky, then let's not do it. I think most people will use quantizations from @unslothai or @bartowski1182 anyway (which were and will be updated very quickly), so we probably don't need to add a notice.
Yeah I'll let it sit another week to make sure there's nothing else breaking and throw the reconverted model up |
Llama4 Scout config.json changed RoPE scaling, so we need to remove the assert since it breaks on Llama 4
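For readers unfamiliar with the "rope factors" mentioned in the thread: below is a minimal sketch of how per-frequency factors are commonly derived from a Llama 3-style `rope_scaling` config. It is not the gist linked above and not the code from this PR. The function name `llama3_rope_factors` is hypothetical, and the default parameter values are illustrative Llama 3.1-style numbers, not necessarily the values in the updated Llama 4 Scout `config.json`.

```python
import math


def llama3_rope_factors(
    dim=128,                 # rotary dimension per head (illustrative assumption)
    base=500000.0,           # rope theta (illustrative assumption)
    factor=8.0,              # rope_scaling["factor"] (illustrative assumption)
    low_freq_factor=1.0,     # rope_scaling["low_freq_factor"] (illustrative assumption)
    high_freq_factor=4.0,    # rope_scaling["high_freq_factor"] (illustrative assumption)
    old_context_len=8192,    # rope_scaling["original_max_position_embeddings"] (illustrative assumption)
):
    """Return one scaling factor per rotary frequency (dim // 2 values)."""
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor

    factors = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency components are left unscaled.
            factors.append(1.0)
        elif wavelen > low_freq_wavelen:
            # Low-frequency components are scaled by the full factor.
            factors.append(factor)
        else:
            # In between, interpolate smoothly between the two regimes.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            factors.append(1.0 / ((1.0 - smooth) / factor + smooth))
    return factors


if __name__ == "__main__":
    for idx, value in enumerate(llama3_rope_factors()):
        print(idx, round(value, 4))
```

Because the factors depend directly on the `rope_scaling` values, a change to those values upstream changes the resulting factors, which is consistent with the recommendation above to re-convert or re-download existing GGUF files.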