Replies: 1 comment
- Haven't tried a V100. Besides, this normally only needs 10-odd GB.
- I'm using a V100 GPU and hit a memory fragmentation problem while training on Alibaba Cloud's PAI platform. I tried setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True on the command line, but it didn't help. Hoping someone can point me in the right direction, thanks!
I ran python finetune_hf.py data/fix/ /mnt/workspace/chatglm3-6b configs/lora.yaml.
Error log:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 15.78 GiB of which 1.75 MiB is free. Process 5496 has 15.78 GiB memory in use. Of the allocated memory 14.78 GiB is allocated by PyTorch, and 95.40 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
0%| | 0/1000 [00:02<?, ?it/s]
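A sketch of one common pitfall with this variable (an editorial note, not from the original posts): PYTORCH_CUDA_ALLOC_CONF is read when PyTorch initializes its CUDA allocator, so it must already be in the process environment before `torch` makes any CUDA call. Typing `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` as a standalone line in some shells sets it only for that line, not for a later `python` invocation. Note also that the log above shows only ~95 MiB reserved-but-unallocated, which suggests the 16 GB card is genuinely full rather than fragmented, so reducing batch size or sequence length may be needed regardless.

```python
import os

# Set the allocator config BEFORE importing torch (or anything that
# imports torch), so the CUDA caching allocator picks it up on init.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Equivalent from the shell: prefix the variable to the launch command
# so the Python process inherits it (paths here are the poster's own):
#   PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
#       python finetune_hf.py data/fix/ /mnt/workspace/chatglm3-6b configs/lora.yaml
```

If the variable is set correctly and the OOM persists, the remaining levers are the usual ones: smaller micro-batch size, shorter max sequence length, or gradient checkpointing in the training config.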