I would like to ask: does MOTR train more effectively with multi-GPU data-parallel training, or on a single powerful GPU?

I have four RTX 3090s and one A100. Training on the single A100 is very slow, and I'm not sure why. Is it because the batch size is 1? Could I speed up training by running it in parallel across multiple cards?
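For context, this is roughly what I mean by multi-GPU parallel training. It is a minimal PyTorch `DistributedDataParallel` sketch, not MOTR's actual training script: the `Linear` model, the random dataset, and the hyperparameters are placeholders I made up to illustrate the setup. With batch size 1 per process, each GPU would still process its own sample concurrently, so the effective batch size scales with the number of GPUs.

```python
# Minimal DDP sketch (placeholder model/data, NOT MOTR's entry point).
# Launch with, e.g.: torchrun --nproc_per_node=4 train_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR/PORT per process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for MOTR.
    model = torch.nn.Linear(256, 256).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # DistributedSampler shards the dataset so each GPU sees a distinct slice.
    dataset = TensorDataset(torch.randn(64, 256))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=1, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
    for epoch in range(1):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for (x,) in loader:
            x = x.cuda(local_rank, non_blocking=True)
            loss = model(x).sum()  # dummy loss for illustration
            optimizer.zero_grad()
            loss.backward()  # DDP all-reduces gradients across GPUs here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Is this the right way to parallelize MOTR across the four 3090s, or is there a reason the repo's training is meant to run with batch size 1 on one card?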