
Topological alignment between GPUs and NICs in DRA (exposing PCI device topology as a device attribute?) #213

@everpeace


I understand DRA will finally be promoted to Beta in v1.32 🎉. Thank you very much to all the contributors for your hard work on standardizing flexible device scheduling and implementing NVIDIA's dra-driver.

Do you have plans to expose intra-node topology as a device attribute? In particular, I mean the distances between GPU<->GPU and GPU<->NIC or HCA (I imagine information equivalent to nvidia-smi topo -m). Alternatively, do you plan to provide an extension point for adding user-defined device attributes in this dra-driver?
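To make the request concrete, here is a minimal Python sketch of how a driver might classify the connection between a GPU and a NIC from their PCI hierarchy, using levels loosely modeled on the nvidia-smi topo -m legend (X, PIX, PXB, PHB, NODE, SYS). The function names, the example sysfs paths, and the two-hop PIX/PXB cutoff are assumptions for illustration. This is a simplified heuristic, not NVIDIA's algorithm, and it ignores NVLink.

```python
from typing import List


def pci_chain(sysfs_path: str) -> List[str]:
    """Split a resolved sysfs path into [root complex, bridges..., device]."""
    parts = [p for p in sysfs_path.split("/") if p]
    for i, part in enumerate(parts):
        # The root-complex (host bridge) component looks like "pci0000:00".
        if part.startswith("pci"):
            return parts[i:]
    raise ValueError(f"not a PCI sysfs path: {sysfs_path}")


def topo_level(path_a: str, path_b: str, numa_a: int, numa_b: int) -> str:
    """Hypothetical heuristic loosely modeled on the nvidia-smi topo -m legend."""
    a, b = pci_chain(path_a), pci_chain(path_b)
    if a == b:
        return "X"  # same device
    if a[0] != b[0]:
        # Different host bridges: same NUMA node -> NODE, different -> SYS.
        return "NODE" if numa_a == numa_b else "SYS"
    # Count the leading components both hierarchies share.
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    if shared == 1:
        return "PHB"  # the paths only meet at the host bridge
    # Hops from the deeper device up to the deepest shared bridge.
    hops = max(len(a), len(b)) - shared
    # Assumption: <= 2 hops means both devices sit behind the same switch.
    return "PIX" if hops <= 2 else "PXB"


if __name__ == "__main__":
    gpu = "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:08.0/0000:03:00.0"
    nic = "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:10.0/0000:04:00.0"
    print(topo_level(gpu, nic, 0, 0))  # PIX: both behind downstream ports of one switch
```

On Linux, the resolved path could come from os.path.realpath("/sys/bus/pci/devices/<BDF>"), and the NUMA node from the numa_node file in that directory. Because DRA attributes are published per device rather than per device pair, a driver might instead publish something like the root-complex or switch identifier as an attribute and let claims match on it; the sketch only shows where such a value could come from.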

I imagine the following use cases for optimizing training performance:

Thanks in advance.

Metadata

Assignees: No one assigned
Labels: feature (issue/PR that proposes a new feature or functionality)
Type: No type
Status: Closed
Relationships: None yet
Development: No branches or pull requests
