Question about the profiling result #994

@kerthcet

Thanks for the great project!

I'm profiling the EPLB algorithm with several tools: one is scalene (I also tried its AI optimization feature, which is pretty cool) and another is py-spy. However, the results are a bit confusing.

The code is basically like:

    indices = weight.float().sort(-1, descending=True).indices.cpu()
    pack_index = torch.full_like(weight, fill_value=-1, dtype=torch.int64, device='cpu')
    rank_in_pack = torch.full_like(pack_index, fill_value=-1)
    for i in range(num_layers):
        pack_weights = [0] * num_packs
        pack_items = [0] * num_packs
        for group in indices[i]:
            pack = min((i for i in range(num_packs) if pack_items[i] < groups_per_pack),
                       key=pack_weights.__getitem__)
            assert pack_items[pack] < groups_per_pack
            pack_index[i, group] = pack
            rank_in_pack[i, group] = pack_items[pack]
            pack_weights[pack] += weight[i, group]
            pack_items[pack] += 1

Most of these lines are tensor ops. I wrote a simulator to run the EPLB algorithm and collected the profiling results.

For scalene, I ran the command: scalene run <python-file>

The result is shown below. According to the diagram, the bottlenecks are the min() call and pack_items[pack] += 1:

Image

For py-spy, the result is a bit different:

Image

As I understand it, min() being one bottleneck makes sense, because this is a greedy algorithm that loops over the whole dataset.
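To make the cost of min() concrete, here is a minimal torch-free sketch of the same greedy step for a single layer. It assumes plain Python floats in place of the `weight` tensor, and the function name `greedy_pack` is only illustrative; it is not the project's or the simulator's actual code.

```python
def greedy_pack(weights, num_packs):
    """Greedy packing for one layer: heaviest item first, lightest open pack wins."""
    assert len(weights) % num_packs == 0
    groups_per_pack = len(weights) // num_packs
    # Visit items from heaviest to lightest, mirroring sort(-1, descending=True).
    order = sorted(range(len(weights)), key=weights.__getitem__, reverse=True)
    pack_index = [-1] * len(weights)
    rank_in_pack = [-1] * len(weights)
    pack_weights = [0.0] * num_packs
    pack_items = [0] * num_packs
    for group in order:
        # min() rescans every pack for every item, so one layer costs
        # O(num_groups * num_packs) Python-level steps.
        pack = min(
            (p for p in range(num_packs) if pack_items[p] < groups_per_pack),
            key=pack_weights.__getitem__,
        )
        pack_index[group] = pack
        rank_in_pack[group] = pack_items[pack]
        pack_weights[pack] += weights[group]
        pack_items[pack] += 1
    return pack_index, rank_in_pack, pack_weights


# Six items packed into three packs of two items each.
pack_index, rank_in_pack, pack_weights = greedy_pack([5.0, 1.0, 4.0, 2.0, 3.0, 6.0], 3)
```

With this input every pack ends up with the same total weight, and the generator inside min() is evaluated once per item, which is why that line is expected to show up in any profile.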

However, for the other bottleneck, looking at the code, the py-spy result seems to be the right one: pack_items[pack] += 1 should be more lightweight than the three tensor-op lines before it.
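One way to cross-check two sampling profilers that disagree is to time the statements directly. The sketch below is an assumption-laden stand-in: it uses plain Python floats instead of tensors, and the names `timed_greedy` and `spent` are hypothetical. It only shows the technique of accumulating `time.perf_counter_ns()` deltas per statement group; the same timers could be wrapped around the real tensor lines.

```python
import random
import time


def timed_greedy(weights, num_packs):
    """Run the greedy loop and accumulate nanoseconds spent in each part."""
    groups_per_pack = len(weights) // num_packs
    order = sorted(range(len(weights)), key=weights.__getitem__, reverse=True)
    pack_weights = [0.0] * num_packs
    pack_items = [0] * num_packs
    spent = {"min": 0, "update": 0}  # nanoseconds per statement group
    for group in order:
        t0 = time.perf_counter_ns()
        pack = min(
            (p for p in range(num_packs) if pack_items[p] < groups_per_pack),
            key=pack_weights.__getitem__,
        )
        t1 = time.perf_counter_ns()
        pack_weights[pack] += weights[group]
        pack_items[pack] += 1
        t2 = time.perf_counter_ns()
        spent["min"] += t1 - t0
        spent["update"] += t2 - t1
    return spent, pack_items


random.seed(0)  # fixed seed so the synthetic input is reproducible
weights = [random.random() for _ in range(256)]  # synthetic weights, not real data
spent, pack_items = timed_greedy(weights, num_packs=8)
print(spent)
```

The timer calls add their own overhead, so the absolute numbers are inflated; only the ratio between the two buckets is meaningful.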

Would like to hear some explanations here if possible, thanks.

scalene-profile.json
