I see some discrepancies between the model training script and the training details in the paper. It would be really helpful if someone from the team could clarify these:
- The script uses `"qkv_proj,o_proj,gate_up_proj,down_proj,k_proj,q_proj,out_proj,v_proj"` as the target LoRA modules, but Qwen2 does not have `qkv_proj` or `gate_up_proj` modules; it has separate `q_proj`, `k_proj`, `v_proj`, `gate_proj`, and `up_proj` modules instead. Is this a typo? Which modules exactly were trained with LoRA? I wish to reproduce the result and I am running into issues with this (see the sketch after this list).
- The script sets the LoRA scaling $\alpha$ to 64 (the default), but the paper reports it as 32.
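
For reference, here is a minimal sketch of the configuration I am currently trying, using the HF `peft` library. The `target_modules` list is my guess at what the script intended, mapped onto the module names Qwen2 actually exposes; the rank `r`, the checkpoint name, and `lora_alpha=32` (the paper's value) are my assumptions, not values confirmed by the team:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B")  # placeholder checkpoint

lora_config = LoraConfig(
    r=16,            # rank: placeholder, not stated in the script or paper
    lora_alpha=32,   # paper value; the script default would be 64
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections in Qwen2
        "gate_proj", "up_proj", "down_proj",     # MLP projections in Qwen2
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

If the intended modules or $\alpha$ differ from the above, please let me know which values to use.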