Is there any way to push the model itself to the CPU? I keep running OOM #2274
Closed
throwitthefuckaway started this conversation in General
Replies: 1 comment 3 replies
Hey! Unfortunately, the codebase is mainly centered around training on GPUs; only tokenization and merging adapters are possible on the CPU. Offloading (via DeepSpeed/FSDP) may be one option, but I've not seen it used with only a single GPU. Would it be possible to temporarily rent a cloud GPU to do the training?
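For reference, a minimal sketch of what ZeRO-style CPU offloading can look like with the Hugging Face Trainer (the keys follow DeepSpeed's documented ZeRO config schema; this is not an axolotl-specific setting, and whether it fits in 2 GB of VRAM is an assumption, not something the thread confirms):

```python
from transformers import TrainingArguments

# Hypothetical ZeRO-3 config that offloads parameters and optimizer state to
# CPU RAM, so the GPU only holds the shards it is currently computing with.
# "auto" lets the Hugging Face Trainer fill the value from its own arguments.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON file
)
```

Even with full offload, the forward/backward passes still run on the GPU, so activation memory can still exceed a very small card.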
3 replies
I have a pretty bad setup for LLM training: 16 GB of RAM and a 2 GB VRAM Nvidia GPU on a laptop, so my only option is to run the training process on the CPU, though I can't seem to make it work. The LoRA adapter goes to the CPU just fine, I think, but the model goes to the GPU and CUDA OOMs after just a few seconds.
So, does anyone know what I can do? I already tried setting gpu_memory_limit to a low value, setting device_map to "cpu", and setting CUDA_VISIBLE_DEVICES="", but none of these worked.
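One detail worth checking (an assumption about the setup here, not something the thread confirms): CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is initialized, so assigning it from inside a script after torch has already been imported elsewhere does nothing. A minimal sketch of an ordering that does hide the GPU:

```python
import os

# Must be set *before* torch (or anything that imports torch) initializes CUDA;
# exporting CUDA_VISIBLE_DEVICES="" in the shell before launching the trainer
# has the same effect.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

print(torch.cuda.is_available())  # expected: False, so nothing can land on the GPU
```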