DeepSpeed: gpt-neo-2.7B not trainable with RTX 3090 and 64GB RAM? #13587
Unanswered
neil-tan
asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment
-
To handle big models like this, I recommend using DeepSpeed to reduce the high RAM usage.
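A minimal sketch of such a DeepSpeed config, assuming ZeRO stage 3 with both parameter and optimizer off-loading to CPU (the keys follow DeepSpeed's JSON config schema; the batch-size and accumulation values are placeholders to tune for your own hardware):

```python
# Sketch of a DeepSpeed ZeRO stage-3 config with CPU off-loading.
# Pass this dict (or the equivalent JSON file) to your DeepSpeed /
# Lightning integration; values here are illustrative, not tuned.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Off-load model parameters to CPU RAM
        "offload_param": {"device": "cpu", "pin_memory": True},
        # Off-load Adam optimizer state to CPU RAM
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,   # placeholder
    "gradient_accumulation_steps": 8,      # placeholder
}
```

Note that CPU off-loading trades GPU memory for system RAM, so with only 64GB it can still be tight for a 2.7B model; `nvme` is the fallback `device` when CPU RAM runs out.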
-
Hello there,
I'm having trouble fine-tuning the HF pre-trained transformer EleutherAI/gpt-neo-2.7B on my RTX 3090 with 64GB RAM. With stage=3 off-loading, the RAM usage seems excessively high. Is it normal to run out of 64GB of system memory with the Adam optimizer for a 2.7B-parameter model?
Error (short)
cpu off-loading:
[2022-07-10 10:53:21,611] [INFO] [utils.py:829:see_memory_usage] MA 10.75 GB Max_MA 10.75 GB CA 10.76 GB Max_CA 17 GB
[2022-07-10 10:53:21,612] [INFO] [utils.py:837:see_memory_usage] CPU Virtual Memory: used = 59.96 GB, percent = 95.6%
[2022-07-10 10:53:22,068] [INFO] [utils.py:828:see_memory_usage] before backward
[2022-07-10 10:53:22,069] [INFO] [utils.py:829:see_memory_usage] MA 11.93 GB Max_MA 12.39 GB CA 12.43 GB Max_CA 12 GB
[2022-07-10 10:53:22,069] [INFO] [utils.py:837:see_memory_usage] CPU Virtual Memory: used = 59.98 GB, percent = 95.7%
[2022-07-10 10:53:22,177] [INFO] [utils.py:828:see_memory_usage] before optimizer
[2022-07-10 10:53:22,178] [INFO] [utils.py:829:see_memory_usage] MA 11.91 GB Max_MA 11.93 GB CA 12.43 GB Max_CA 12 GB
[2022-07-10 10:53:22,178] [INFO] [utils.py:837:see_memory_usage] CPU Virtual Memory: used = 59.98 GB, percent = 95.7%
Killed
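For context, a rough back-of-envelope estimate of the CPU RAM that ZeRO stage-3 CPU off-loading needs just for the fp32 master weights and Adam state (a sketch assuming fp16 training with all optimizer state and fp32 gradients staged on CPU; the exact buffers DeepSpeed allocates may differ):

```python
# Rough CPU-RAM estimate for Adam with ZeRO-3 CPU off-loading.
# Assumptions: fp16 model, fp32 master copy + Adam state on CPU.
params = 2.7e9  # gpt-neo-2.7B parameter count

bytes_per_param = (
    4    # fp32 master copy of the parameters
    + 4  # Adam first moment (momentum), fp32
    + 4  # Adam second moment (variance), fp32
    + 4  # fp32 gradients staged on CPU for the optimizer step
)

cpu_gb = params * bytes_per_param / 1024**3
print(f"~{cpu_gb:.0f} GB of CPU RAM for optimizer state alone")  # ~40 GB
```

So ~40 GB goes to optimizer state before counting the OS, the dataloader, pinned transfer buffers, and fragmentation, which is why 64GB of system RAM can plausibly be exhausted even though the model itself is "only" 2.7B parameters.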
Complete Logs
cpu off-loading
nvme off-loading
System & Environment
RTX 3090, CUDA 11.6
RAM 64GB installed
Torch 1.12.0
PyTorch Lightning 1.6.4
DeepSpeed 0.6.5
Python 3.8.6
pip freeze
Code