After switching from the softmax loss to the sigmoid loss, I saw no significant reduction in GPU memory usage (at most a slight one).
Could you please share some approximate numbers so that I can check whether I did something wrong? The paper does not state the percentage reduction in memory.