Replies: 1 comment
- Haven't tried a V100. Besides, this normally only needs 10-odd GB.
- I'm using a V100 GPU and hit a memory fragmentation problem while training on Alibaba Cloud's PAI platform. I tried setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True on the command line, but it didn't help. Hoping someone can point me in the right direction, thanks!
I ran python finetune_hf.py data/fix/ /mnt/workspace/chatglm3-6b configs/lora.yaml.
Error log:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 15.78 GiB of which 1.75 MiB is free. Process 5496 has 15.78 GiB memory in use. Of the allocated memory 14.78 GiB is allocated by PyTorch, and 95.40 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
0%| | 0/1000 [00:02<?, ?it/s]
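A sketch of one common pitfall with this variable (an editorial note, not from the original posts): PYTORCH_CUDA_ALLOC_CONF is read when PyTorch initializes its CUDA allocator, so it must already be in the process environment before `torch` makes any CUDA call. Typing `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` as a standalone line in some shells sets it only for that line, not for a later `python` invocation. Note also that the log above shows only ~95 MiB reserved-but-unallocated, which suggests the 16 GB card is genuinely full rather than fragmented, so reducing batch size or sequence length may be needed regardless.

```python
import os

# Set the allocator config BEFORE importing torch (or anything that
# imports torch), so the CUDA caching allocator picks it up on init.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Equivalent from the shell: prefix the variable to the launch command
# so the Python process inherits it (paths here are the poster's own):
#   PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
#       python finetune_hf.py data/fix/ /mnt/workspace/chatglm3-6b configs/lora.yaml
```

If the variable is set correctly and the OOM persists, the remaining levers are the usual ones: smaller micro-batch size, shorter max sequence length, or gradient checkpointing in the training config.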