NaN gradients and zero loss when LoRA fine-tuning the qwen3-32b-awq model with llamafactory: problem and solution #9125
gysabc started this conversation in Show and tell
Hey, how did you set up your AWQ quantization environment? I'm on an RTX 5090 with CUDA 13 and an offline install of PyTorch 2.9, and when I tried installing the corresponding AWQ package I ran into environment conflicts. Could you share the details of your AWQ environment configuration?
Data: an internal dataset.

Symptom: at some point in the training log, grad_norm suddenly becomes nan; every log entry after that shows loss = 0 and grad_norm = nan.

Training parameter settings:

Problem analysis:

Problem solution:
Before:

```python
x = x.half()
```

After:

```python
x = torch.clamp(x, min=torch.finfo(torch.float16).min, max=torch.finfo(torch.float16).max).half()
```
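Why the clamp helps: any fp32 value whose magnitude exceeds the largest finite float16 value (65504) overflows to inf when cast to half precision, and one inf activation is enough to propagate NaN into the loss and gradients. A minimal sketch of the same principle using NumPy (my illustration, not code from LLaMA-Factory; `torch.clamp` on a tensor behaves analogously):

```python
import numpy as np

x = np.array([1.0, 70000.0, -1e6], dtype=np.float32)

# Naive cast: values beyond +/-65504 overflow to +/-inf in float16,
# which is how a single large activation can poison the whole run.
naive = x.astype(np.float16)
# naive -> [1., inf, -inf]

# Clamp to the finite float16 range first, then cast: every value
# stays finite, so downstream loss/gradient math stays finite too.
fp16_max = np.finfo(np.float16).max  # 65504.0, exactly representable
clamped = np.clip(x, -fp16_max, fp16_max).astype(np.float16)
# clamped -> [1., 65504., -65504.]
```

The same guard is what the one-line fix above applies with `torch.finfo(torch.float16).min/max` before calling `.half()`.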