
dispatch_wait_recv_cost_stats accumulates wait time from every warp; a max across warps would be more helpful #473

@goelayu

atomicAdd(reinterpret_cast<unsigned long long*>(dispatch_wait_recv_cost_stats + src_rank), wait_recv_cost);

This op adds the wait time observed by a given warp for a given src_rank. Every warp performs this add, so the stored value is the sum of the per-warp wait times.

I wonder what the utility of this metric is. The warps wait in parallel, so the sum over warps does not correspond to any elapsed time. Are we trying to infer slow ranks through this metric?

The max across warps would be more interesting, since it would indicate the actual wait time for a given src_rank.

We could then take the max of these per-rank maxima across src_ranks to infer the pure network communication latency of the recv kernel.
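To make the difference concrete, here is a minimal sketch of the proposal. It is not the library's actual kernel: `record_wait`, `run_demo`, `overall_max`, `WaitStats`, and the synthetic `per_warp_wait` input are all hypothetical names introduced for illustration, and it assumes a GPU that supports 64-bit `atomicMax`. Each warp reports a wait time per src_rank, once via `atomicAdd` (current behavior) and once via `atomicMax` (proposed).

```cuda
#include <algorithm>
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical container for the two variants of the per-src_rank statistic.
struct WaitStats {
    std::vector<unsigned long long> sum;  // current behavior: sum over warps
    std::vector<unsigned long long> max;  // proposed: max over warps
};

// Hypothetical stand-in for the recv kernel. per_warp_wait is laid out as
// [warp][src_rank] and holds synthetic wait times instead of measured ones.
__global__ void record_wait(const unsigned long long* per_warp_wait, int num_ranks,
                            unsigned long long* sum_stats,
                            unsigned long long* max_stats) {
    const int lane = threadIdx.x % 32;
    const int warp = threadIdx.x / 32;
    if (lane != 0) return;  // one lane per warp reports the warp's wait time
    for (int src_rank = 0; src_rank < num_ranks; ++src_rank) {
        const unsigned long long wait_recv_cost = per_warp_wait[warp * num_ranks + src_rank];
        atomicAdd(sum_stats + src_rank, wait_recv_cost);  // sums parallel waits
        atomicMax(max_stats + src_rank, wait_recv_cost);  // keeps the slowest warp
    }
}

// Launches a single block with one warp per row of `waits`.
// CUDA error checking is omitted for brevity.
WaitStats run_demo(const std::vector<unsigned long long>& waits, int num_warps, int num_ranks) {
    assert(num_warps >= 1 && num_warps <= 32);  // at most 1024 threads per block
    assert(waits.size() == static_cast<size_t>(num_warps) * num_ranks);
    const size_t wait_bytes = waits.size() * sizeof(unsigned long long);
    const size_t stat_bytes = num_ranks * sizeof(unsigned long long);

    unsigned long long *d_wait = nullptr, *d_sum = nullptr, *d_max = nullptr;
    cudaMalloc(&d_wait, wait_bytes);
    cudaMalloc(&d_sum, stat_bytes);
    cudaMalloc(&d_max, stat_bytes);
    cudaMemcpy(d_wait, waits.data(), wait_bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_sum, 0, stat_bytes);
    cudaMemset(d_max, 0, stat_bytes);

    record_wait<<<1, num_warps * 32>>>(d_wait, num_ranks, d_sum, d_max);

    WaitStats out{std::vector<unsigned long long>(num_ranks),
                  std::vector<unsigned long long>(num_ranks)};
    // Blocking copies on the default stream run after the kernel has finished.
    cudaMemcpy(out.sum.data(), d_sum, stat_bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(out.max.data(), d_max, stat_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_wait);
    cudaFree(d_sum);
    cudaFree(d_max);
    return out;
}

// Max of the per-rank maxima: the longest wait on any src_rank.
unsigned long long overall_max(const WaitStats& stats) {
    return *std::max_element(stats.max.begin(), stats.max.end());
}
```

For example, with 4 warps and 2 src_ranks, calling `run_demo({10, 5, 12, 7, 11, 6, 9, 40}, 4, 2)` gives sums of 42 and 58 but maxima of 12 and 40. The sums grow with the number of warps, whereas the maxima track the slowest warp for each src_rank, and `overall_max` picks out the slowest src_rank.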
