Description
Thanks for the great project!
I'm profiling the EPLB algo. I used several tools: one is Scalene (I also tried its AI optimization tool, pretty cool), another is py-spy. However, the results are a bit confusing.
The code is basically:

```python
indices = weight.float().sort(-1, descending=True).indices.cpu()
pack_index = torch.full_like(weight, fill_value=-1, dtype=torch.int64, device='cpu')
rank_in_pack = torch.full_like(pack_index, fill_value=-1)
for i in range(num_layers):
    pack_weights = [0] * num_packs
    pack_items = [0] * num_packs
    for group in indices[i]:
        pack = min((i for i in range(num_packs) if pack_items[i] < groups_per_pack),
                   key=pack_weights.__getitem__)
        assert pack_items[pack] < groups_per_pack
        pack_index[i, group] = pack
        rank_in_pack[i, group] = pack_items[pack]
        pack_weights[pack] += weight[i, group]
        pack_items[pack] += 1
```

Most of these are tensor ops. I then wrote a simulator to run the EPLB algo and got the profiling results:
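For reference, such a simulator can be sketched in pure Python, without torch, so the loop itself can be profiled in isolation. The sizes below are made up for illustration; this is not the original harness:

```python
import random

# Pure-Python stand-in for the EPLB balanced-packing loop.
# Hypothetical sizes, chosen only so the example runs quickly.
num_layers, num_groups, num_packs = 8, 64, 16
groups_per_pack = num_groups // num_packs

random.seed(0)
weight = [[random.random() for _ in range(num_groups)] for _ in range(num_layers)]

pack_index = [[-1] * num_groups for _ in range(num_layers)]
rank_in_pack = [[-1] * num_groups for _ in range(num_layers)]

for i in range(num_layers):
    # Visit groups in descending weight order, like sort(-1, descending=True).
    order = sorted(range(num_groups), key=lambda g: weight[i][g], reverse=True)
    pack_weights = [0.0] * num_packs
    pack_items = [0] * num_packs
    for group in order:
        # Greedy step: lightest pack that still has room.
        pack = min((p for p in range(num_packs) if pack_items[p] < groups_per_pack),
                   key=pack_weights.__getitem__)
        pack_index[i][group] = pack
        rank_in_pack[i][group] = pack_items[pack]
        pack_weights[pack] += weight[i][group]
        pack_items[pack] += 1
```

Running a line profiler on this version removes the torch scalar-indexing cost from the picture, which helps separate "the algorithm is slow here" from "the tensor op on this line is slow".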
For Scalene, I ran the command `scalene run <python-file>`.
The result looks like the diagram below; the bottlenecks it reports are the `min()` call and `pack_items[pack] += 1`:
For py-spy, the result is a bit different:
Based on my understanding, `min()` being a bottleneck makes sense: the algorithm is greedy, so for every group it scans the whole candidate set of packs.
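To make the "scans the whole candidate set" point concrete, here is a small sketch (hypothetical sizes, pure Python) that counts how many pack candidates `min()` examines over one layer:

```python
# Count how many candidate packs min() examines over one layer's groups.
# Hypothetical sizes; min() breaks ties toward the lowest pack index.
num_packs, groups_per_pack = 16, 4
num_groups = num_packs * groups_per_pack  # one layer's worth of groups

pack_weights = [0.0] * num_packs
pack_items = [0] * num_packs
scanned = 0  # total candidates fed into min()

def candidates():
    global scanned
    for p in range(num_packs):
        if pack_items[p] < groups_per_pack:
            scanned += 1
            yield p

for _ in range(num_groups):
    pack = min(candidates(), key=pack_weights.__getitem__)
    pack_weights[pack] += 1.0  # uniform weights: packs fill round-robin
    pack_items[pack] += 1

print(scanned)
```

With uniform weights the scan only shortens in the final round, so the total is close to `num_groups * num_packs` candidate checks per layer, i.e. the per-group cost of `min()` is O(num_packs) rather than O(1).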
However, for the other bottleneck, if we look at the code, the py-spy result seems right: `pack_items[pack] += 1` is much more lightweight than the three preceding lines of tensor ops.
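One way to sanity-check that intuition is to micro-benchmark the two suspect statements in isolation. The sketch below uses plain Python ints and floats as stand-ins for the torch scalars (so the real tensor lines would be slower still); sizes and iteration counts are arbitrary:

```python
import timeit

# Hypothetical sizes; pure-Python stand-ins for the loop's state.
num_packs = 64
pack_items = [0] * num_packs
pack_weights = [0.0] * num_packs

# The line Scalene flags as a bottleneck: a single int increment.
t_inc = timeit.timeit("pack_items[3] += 1",
                      globals=globals(), number=100_000)

# The greedy min() scan over all candidate packs.
t_min = timeit.timeit(
    "min((i for i in range(num_packs) if pack_items[i] < 10**9),"
    " key=pack_weights.__getitem__)",
    globals=globals(), number=100_000)

print(f"increment: {t_inc:.4f}s  min-scan: {t_min:.4f}s")
```

If the increment is orders of magnitude cheaper here, a profiler attributing comparable time to it is likely mis-attributing cost from adjacent lines (sampling granularity and line-attribution behavior differ between Scalene and py-spy).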
I'd like to hear some explanations here if possible, thanks.