CUDA out-of-memory error during multi-GPU fine-tuning #1473
Unanswered · Chenhong-Zhang asked this question in Q&A
Replies: 2 comments
-
Same problem here: LoRA training also hits CUDA OOM, and the error message claims 171 TB of memory is required, so something must be misconfigured somewhere. Fine-tuning works with swift, but I have not yet found a way to merge the adapter into a new model that can then be used for inference.
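On the merging question this replier raises: folding a trained LoRA adapter back into the base weights is just `W' = W + (α/r)·B·A` per adapted layer, after which the model is a plain checkpoint with no adapter and no extra inference cost. A minimal numpy sketch of that identity (illustrative values only; in practice, libraries such as Hugging Face `peft` expose this as `PeftModel.merge_and_unload()`):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 16, 4, 8           # hidden size, LoRA rank, LoRA alpha (illustrative)
W = rng.standard_normal((d, d))  # frozen base weight
A = rng.standard_normal((r, d))  # LoRA down-projection
# In real LoRA, B starts at zero and is learned; random here so the merge is visible.
B = rng.standard_normal((d, r))  # LoRA up-projection

x = rng.standard_normal(d)

# Forward pass with the adapter kept separate:
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Merging folds the adapter into a single weight matrix:
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_adapter, y_merged)
```

The merged matrix produces identical outputs to the base-plus-adapter forward pass, which is why a merged model can be saved and served like any ordinary checkpoint.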
-
Switching to P-Tuning for fine-tuning, and additionally enabling model quantization in the main function, made fine-tuning work.
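The quantization half of this fix works because storing weights as low-bit integers plus a scale factor shrinks their memory footprint severalfold. A minimal numpy sketch of symmetric per-tensor int8 quantization (illustrative only, not the ChatGLM repo's own quantization code):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)  # an fp32 weight matrix

# Symmetric per-tensor int8 quantization: keep int8 codes plus one fp32 scale.
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

w_hat = q.astype(np.float32) * scale  # dequantized approximation
err = np.abs(w - w_hat).max()         # rounding error, bounded by scale / 2

print(f"fp32 bytes: {w.nbytes}, int8 bytes: {q.nbytes}")  # 4x smaller
print(f"max abs error: {err:.4f}")
```

Int8 storage is 4x smaller than fp32 (2x smaller than fp16), and 4-bit schemes halve that again, which is often the difference between fitting and OOM on 10 GB cards.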
Original question from Chenhong-Zhang:
I trained on 8× RTX 3080 GPUs, each with 10 GB of VRAM, using the following fine-tuning command:
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune_hf.py /home/usr/IRE/data /home/usr/IRE/chatglm3-6b configs/lora.yaml configs/ds_zero_2.json
It fails with a CUDA out-of-memory error, and the traceback shows the failure occurs while moving the model to the device.
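One plausible reading of an OOM at the model-to-device step: ZeRO stage 2 shards optimizer states and gradients across ranks, but every rank still holds a full copy of the model weights, and a ~6B-parameter model in fp16 simply does not fit in 10 GB. A back-of-the-envelope check (the parameter count is an assumption, roughly 6.2e9 for chatglm3-6b; adjust for the real model):

```python
# Rough per-GPU memory for holding chatglm3-6b weights (assumed ~6.2e9 params).
params = 6.2e9
bytes_fp16 = 2

full_weights_gb = params * bytes_fp16 / 1024**3

# Under ZeRO stage 2, each rank keeps a FULL fp16 weight copy,
# so every GPU needs at least this much before any gradients or activations:
print(f"fp16 weights per GPU (ZeRO-2): {full_weights_gb:.1f} GiB")  # > 10 GiB

# ZeRO stage 3 also shards the weights themselves; across 8 ranks the
# per-GPU weight footprint would drop to roughly:
print(f"fp16 weight shard per GPU (ZeRO-3, 8 ranks): {full_weights_gb / 8:.1f} GiB")
```

That puts the fp16 weights alone at roughly 11.5 GiB per card, above the 10 GB available, which matches the failure happening during `.to(device)` before training even starts; the usual remedies are ZeRO stage 3, weight quantization, or CPU offload.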