Clarify WARNING:torchrec.distributed.utils:Sharding Type is data_parallel, caching params will be ignored #3505

@gyulaz-htec

Description

I'm trying to use the data_parallel option inside torchrec (I'm forcing the sharding type to be data_parallel), but I'm getting the warning described in the title.
I tried to look up the warning in the source code, but the surrounding context didn't help me understand what kind of caching is being disabled here. What I'm experiencing is that performance got cut in half.
My idea was to use data_parallel for the smaller embedding bags so that they are replicated on all GPUs (I'm running on GPU datacenter hardware), and requests toward these embedding bags could then be handled by any of the GPUs. I expected a performance boost from this, but instead throughput dropped to 50%.
Can someone explain to me why this happens?
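For context, this is roughly how I force the sharding type through the planner constraints. A minimal sketch: the table name "small_table_0" and world_size=8 are placeholders for illustration, not my actual setup.

```python
import torch.distributed as dist

from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

# Restrict the planner to data_parallel for the small tables.
# "small_table_0" and world_size=8 are placeholder values.
constraints = {
    "small_table_0": ParameterConstraints(
        sharding_types=[ShardingType.DATA_PARALLEL.value],
    ),
}

planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=8, compute_device="cuda"),
    constraints=constraints,
)

# With the sharded model and sharders in hand, the plan would be produced via e.g.:
# plan = planner.collective_plan(model, sharders, dist.GroupMember.WORLD)
```

The warning in the title is emitted for every table that ends up with this sharding type.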
