Hi,
I noticed very high host (CPU) RAM usage when running the GSM8K evaluation with the official scripts:
- Single process (1 GPU): ~160GB RAM
- 4 GPUs: ~600GB RAM (about 150GB per GPU, so it scales roughly linearly)
Is this expected behavior, or might it indicate a memory leak in the dataloader/diffusion loop?
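
In case it helps narrow this down, here is a minimal sketch of how per-process RSS could be logged during the run to tell a steady footprint from one that keeps growing. The helper names (`rss_bytes`, `looks_like_leak`) and the growth threshold are my own assumptions and are not part of the official scripts:

```python
import os
import resource
import sys


def rss_bytes() -> int:
    """Resident set size of this process in bytes (current on Linux, peak elsewhere)."""
    try:
        # Linux: the second field of /proc/self/statm is the resident page count.
        with open("/proc/self/statm") as f:
            pages = int(f.read().split()[1])
        return pages * os.sysconf("SC_PAGE_SIZE")
    except (FileNotFoundError, IndexError, ValueError):
        # Fallback: peak RSS. ru_maxrss is in KiB on Linux and in bytes on macOS.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak if sys.platform == "darwin" else peak * 1024


def looks_like_leak(samples, min_growth_bytes):
    """True if RSS grew by more than min_growth_bytes from the first to the last sample."""
    return len(samples) >= 2 and samples[-1] - samples[0] > min_growth_bytes


if __name__ == "__main__":
    # Hypothetical usage: append rss_bytes() to a list every N batches of the eval loop.
    samples = [rss_bytes()]
    print(f"RSS: {samples[-1] / 2**30:.2f} GiB")
```

If RSS is flat after model loading, the usage is probably a fixed per-process cost; if it climbs with the number of batches, that would point toward a leak.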
Any advice would be greatly appreciated!