Replies: 1 comment
Apparently it's actually normal for the CPU to be pegged while waiting for the GPU: the wait is a busy loop polling for completion. See https://forums.developer.nvidia.com/t/cpu-usage-while-waiting-for-kernel/11272/2. Still no idea what's going on with the Colab TPU, but I suppose that's a separate question now.
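A minimal sketch of how to see that busy-wait in action (assumes a CUDA-enabled JAX install; the chain of matmuls is just arbitrary stand-in GPU work, not anything from the code discussed here):

```python
# While the host blocks on a GPU result, one CPU core sits near 100%
# (visible in `top`), because the wait is a spin loop polling the device
# rather than a sleep.
import time
import jax
import jax.numpy as jnp

x = jax.random.normal(jax.random.PRNGKey(0), (8192, 8192))

@jax.jit
def stand_in_gpu_work(a):
    # Chain of large matmuls, renormalized each step so values stay finite.
    for _ in range(30):
        a = a @ a
        a = a / jnp.max(jnp.abs(a))
    return a

start = time.perf_counter()
out = stand_in_gpu_work(x)        # returns almost immediately (async dispatch)
out.block_until_ready()           # host spins here until the GPU finishes
print(f"{time.perf_counter() - start:.2f}s of GPU work (plus one-time JIT compile); "
      "watch `top` during the wait to see a pegged core")
```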
I've written a little byte-level language model using JAX & Flax, and for some reason, when training it, it pegs a CPU core, even though nearly all the work should be happening on my GPU. And `nvidia-smi` reports GPU utilization is consistently at 100%, so I'm confused. Meanwhile, with a TPU core on Colab, the TPU idle time is around 30% and matrix unit utilization is around 5.5%. Not sure if that's the same issue - my desktop has a very powerful CPU and a pretty decent GPU, while on Colab the CPU is pretty underpowered and the TPU is very powerful, so it'd make sense for the TPU to be starved if there were some inefficiency in feeding it.

Here's my training loop:
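(The exact code is in the linked repo at revision 6802b02; the sketch below is only a reconstruction of its rough shape, with the model, loss, optimizer, and loader arguments filled in as placeholders.)

```python
# NOT the code from the linked repo (see revision 6802b02 for the real loop);
# only a sketch of its shape. `Enwik9Loader` and `fast_train_step` are the
# names used in this post; the tiny one-layer "model", loss, optimizer, and
# loader arguments below are made-up placeholders.
import jax
import jax.numpy as jnp
import optax

def loss_fn(params, batch):
    # Placeholder next-byte cross-entropy.
    inputs = jax.nn.one_hot(batch[:, :-1], 256)          # (B, T-1, 256)
    logits = inputs @ params["w"]                        # (B, T-1, 256)
    targets = jax.nn.one_hot(batch[:, 1:], 256)
    return optax.softmax_cross_entropy(logits, targets).mean()

optimizer = optax.adam(1e-3)

@jax.jit
def fast_train_step(params, opt_state, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

params = {"w": jnp.zeros((256, 256))}                    # placeholder parameters
opt_state = optimizer.init(params)

# Enwik9Loader comes from the repo and yields NumPy batches of byte values;
# its constructor arguments here are guesses.
for step, batch in enumerate(Enwik9Loader(batch_size=64, seq_len=512)):
    params, opt_state, loss = fast_train_step(params, opt_state, batch)
    if step % 100 == 0:
        print(step, float(loss))                         # float() forces a device sync
```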
`Enwik9Loader` is an iterable of NumPy arrays, all of which are views into one master array. I've put the full code up here if you'd like to take a look; revision 6802b02 corresponds to the snippet above. `line_profiler` shows that 89.6% of the time is spent on the line that calls `fast_train_step` in the loop, and 7.4% is spent when the function is first called to JIT it. The remainder is setting up the iterable and various tiny things.

So what's going on? Is my code not dispatching fast enough to saturate the GPU, and if so, why does `nvidia-smi` say the GPU is saturated? Is there some spinlock or something that makes the host CPU utilization a meaningless metric?
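For what it's worth, one way to separate Python-side dispatch time from GPU execution time would be something like the sketch below (the helper name and the `fast_train_step` signature here are illustrative, not taken from the repo). JAX dispatches asynchronously, so the call itself returns once the work is enqueued, and `block_until_ready()` then spin-waits for the device to finish.

```python
import time
import jax

def time_one_step(params, opt_state, batch):
    # Assumed signature; adapt to whatever fast_train_step actually takes.
    # Note the first call also includes one-time JIT compilation.
    t0 = time.perf_counter()
    out = fast_train_step(params, opt_state, batch)
    t_dispatch = time.perf_counter() - t0     # time to enqueue the work only
    jax.block_until_ready(out)                # spin-wait for the GPU to finish
    t_total = time.perf_counter() - t0        # enqueue + GPU execution
    print(f"dispatch: {t_dispatch * 1e3:.2f} ms, total: {t_total * 1e3:.2f} ms")
    # If dispatch is a tiny fraction of the total, the host is keeping the GPU
    # fed, and the pegged core is just the busy-wait on the result.
```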