Training TextCat on GPU never trains and crashes VM #9668
I am having an issue getting TextCat training to run on the GPU: training never actually starts, and eventually the VM crashes. Has anyone else experienced issues like this? I had some difficulties getting my Python environment configured correctly, which I'm thinking may be the cause, but the fact that training gets this far makes me think it could be something else. I have CUDA 11 showing in nvidia-smi, but the GPU doesn't appear to be used during training.
Replies: 1 comment
What is your actual train command? If the GPU isn't being used it could be trying to use a Transformer on CPU, which is possible but extremely slow. That would be consistent with your nvidia-smi output.
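For reference, GPU use has to be requested explicitly when training; a typical invocation looks like the following (the config and output paths are placeholders, and the exact CUDA extra to install depends on your environment):

```shell
# Train on the first GPU; omitting --gpu-id (or passing -1) runs on CPU.
python -m spacy train config.cfg --output ./output --gpu-id 0

# Quick sanity check that spaCy can actually activate the GPU.
# Requires a matching cupy build, e.g. via `pip install spacy[cuda110]`
# for CUDA 11.0 (assumption about your setup).
python -c "import spacy; print(spacy.prefer_gpu())"
```

If `spacy.prefer_gpu()` prints `False`, training will silently fall back to CPU even with `--gpu-id` set, which would match the idle GPU in your nvidia-smi output.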
Like you suspect, the whole training set is loaded into memory by default. If that's an issue you can use a custom loader to stream the corpus instead, see here.
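The core idea of a streaming loader is to yield examples lazily from disk instead of materializing the whole corpus. In spaCy you would wrap this in a reader registered via `@spacy.registry.readers` that yields `Example` objects; the stdlib-only sketch below just illustrates the lazy-generator pattern, and the JSONL format with `"text"`/`"cats"` keys is an assumption:

```python
import json
from typing import Dict, Iterator, Tuple


def stream_textcat_examples(path: str) -> Iterator[Tuple[str, Dict]]:
    """Yield (text, annotations) pairs one record at a time.

    Because this is a generator, only one line of the corpus file is
    held in memory at any moment, rather than the full training set.
    """
    with open(path, encoding="utf8") as f:
        for line in f:
            record = json.loads(line)
            # Hypothetical schema: {"text": ..., "cats": {label: score}}
            yield record["text"], {"cats": record["cats"]}
```

A registered spaCy reader would iterate over pairs like these and construct `Example.from_dict(nlp.make_doc(text), annotations)` inside its own generator, so the same constant-memory behavior carries over to training.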
Also, just to rule out other issues, you might try training with a non-transformer base model.
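One concrete way to get a non-transformer baseline is to let spaCy generate a CPU-oriented config (file names here are placeholders):

```shell
# Generate a config using the CPU-efficient, non-transformer textcat model
python -m spacy init config config_cpu.cfg --pipeline textcat --optimize efficiency

# Train with it; if this completes without exhausting memory,
# the transformer component is the likely culprit
python -m spacy train config_cpu.cfg --output ./output_cpu
```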
It sounds like you are having out of memory issues, though I wouldn't normally expect that to crash the VM.