ddp_sharded is not saving memory as expected #7624
Unanswered
jinggaizi asked this question in DDP / multi-GPU / multi-node

I just ran the code from the README and compared

trainer = pl.Trainer(gpus=8, accelerator="ddp")
trainer = pl.Trainer(gpus=8, accelerator="ddp", plugins='ddp_sharded')

but the memory used is the same. I'm running torch 1.7.1 on a Titan Xp. Is there a hardware requirement for this (such as a V100), or is it only effective for models with a huge number of parameters?

cc: @SeanNaren
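To make the comparison concrete, it can help to log the peak CUDA memory of each run instead of watching nvidia-smi. Below is a minimal sketch of such a callback; the class name and print format are illustrative, not part of the thread:

```python
import torch
import pytorch_lightning as pl


class PeakMemoryLogger(pl.Callback):
    """Print the peak CUDA memory allocated on each rank when training ends."""

    def on_train_end(self, trainer, pl_module):
        if torch.cuda.is_available():
            peak_mib = torch.cuda.max_memory_allocated(pl_module.device) / 2**20
            print(f"rank {trainer.global_rank}: peak allocated {peak_mib:.0f} MiB")


# Run once per configuration and compare the printed peaks:
trainer = pl.Trainer(gpus=8, accelerator="ddp", callbacks=[PeakMemoryLogger()])
# trainer = pl.Trainer(gpus=8, accelerator="ddp", plugins="ddp_sharded",
#                      callbacks=[PeakMemoryLogger()])
```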
Replies: 2 comments
-
Could you give more information about what you're running? There is definitely an improvement in memory at larger scales; we've seen improvements with 250M+ params, as shown here: https://share.streamlit.io/seannaren/mingpt/streamlit/app.py
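For rough intuition (my numbers, not from the thread): sharded training mainly partitions the optimizer state (and optionally gradients) across ranks, so the achievable saving grows with parameter count rather than with activation memory. A back-of-envelope estimate, assuming fp32 Adam keeps about 8 bytes of optimizer state per parameter:

```python
def sharded_state_saving_gib(num_params: float, world_size: int,
                             state_bytes_per_param: int = 8) -> float:
    """Approximate per-GPU memory saved by sharding the optimizer state.

    Assumes fp32 Adam (~8 bytes of exp_avg + exp_avg_sq per parameter);
    parameters, gradients, and activations are left out of the estimate.
    """
    full_state = num_params * state_bytes_per_param
    return (full_state - full_state / world_size) / 2**30


print(sharded_state_saving_gib(30e6, 8))    # ~0.2 GiB for a 30M-param model
print(sharded_state_saving_gib(250e6, 8))   # ~1.6 GiB for a 250M-param model
```

At 30M parameters the theoretical saving is a couple of hundred MiB on 8 GPUs, which can easily vanish behind activation memory and allocator rounding; at 250M+ it is on the order of gigabytes, which matches the scales quoted above.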
-
Thanks for your quick reply. I'm training an ASR model with only about 30M parameters; I will try a larger model.
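As a quick sanity check (an illustrative helper, not from the thread), the parameter count of the LightningModule can be confirmed directly:

```python
import torch.nn as nn


def count_params_millions(model: nn.Module) -> float:
    """Return the number of trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```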