Is there any way to push the model itself to the CPU? I keep running OOM #2274
Closed
throwitthefuckaway started this conversation in General
Replies: 1 comment 3 replies
Hey! Unfortunately, the codebase is mainly centered around training on GPUs; only tokenization and merging adapters are possible on the CPU. Offloading (via DeepSpeed/FSDP) may be one option, but I've not seen it used with only a single GPU. Would it be possible to temporarily rent a cloud GPU to do the training?
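For reference, a minimal sketch of what ZeRO-style CPU offloading can look like with the Hugging Face Trainer (the keys follow DeepSpeed's documented ZeRO config schema; this is not an axolotl-specific setting, and whether it fits in 2 GB of VRAM is an assumption, not something the thread confirms):

```python
from transformers import TrainingArguments

# Hypothetical ZeRO-3 config that offloads parameters and optimizer state to
# CPU RAM, so the GPU only holds the shards it is currently computing with.
# "auto" lets the Hugging Face Trainer fill the value from its own arguments.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON file
)
```

Even with full offload, the forward/backward passes still run on the GPU, so activation memory can still exceed a very small card.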
3 replies
I have a pretty bad setup for LLM training: 16 GB of RAM and a 2 GB VRAM Nvidia GPU on a laptop, so my only option is to run the training process on the CPU, though I can't seem to make it work. The LoRA adapter goes to the CPU just fine, I think, but the model goes to the GPU and CUDA OOMs after just a few seconds.
So, does anyone know what I can do? I already tried setting gpu_memory_limit to a low value, setting device_map to "cpu", and setting CUDA_VISIBLE_DEVICES="", but none of these worked.
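One detail worth checking (an assumption about the setup here, not something the thread confirms): CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is initialized, so assigning it from inside a script after torch has already been imported elsewhere does nothing. A minimal sketch of an ordering that does hide the GPU:

```python
import os

# Must be set *before* torch (or anything that imports torch) initializes CUDA;
# exporting CUDA_VISIBLE_DEVICES="" in the shell before launching the trainer
# has the same effect.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

print(torch.cuda.is_available())  # expected: False, so nothing can land on the GPU
```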