I'm trying to use the data_parallel option inside torchrec (I'm forcing the sharding type to be data_parallel), but I'm getting the warning described in the title.
I tried to look up the source code around this warning, but the surrounding context didn't help me understand what kind of caching is being disabled here. What I'm experiencing is that performance got cut in half.
My idea was to use data_parallel for the smaller embedding bags, so that they are replicated on all GPUs (I'm running on GPU datacenter hardware), and lookups against these embedding bags could then be handled by any of the GPUs. I expected some performance boost in this case, but instead performance dropped to about 50%.
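For reference, this is roughly how I'm forcing the sharding type, via planner constraints (a simplified sketch; the table name, world size, and surrounding setup are just placeholders, not my exact config):

```python
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

# Constrain the small tables to DATA_PARALLEL so the planner replicates
# them on every rank instead of sharding them.
constraints = {
    "small_table": ParameterConstraints(
        sharding_types=[ShardingType.DATA_PARALLEL.value],
    ),
}

planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=8, compute_device="cuda"),
    constraints=constraints,
)
# The resulting plan from planner.plan(...) is then passed to
# DistributedModelParallel as usual.
```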
Can someone explain why this happens?