Commit 9e9b500
[XLA:GPU] Fix the derivation for the number of warps for tiled HLO computations.
The number of warps used to process a computation determines how many
registers we are able to use concurrently. Therefore, looking at the largest
(padded) tile size makes sense, since it determines the minimum number of
elements that must be live concurrently.
Previously, the logic erroneously only looked at the output tile sizes.
This approach is not perfect, and may be further improved by e.g. doing a
live range analysis on the tiles of the computation.
PiperOrigin-RevId: 6806688561
parent 12d351d, commit 9e9b500
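
The idea in the message above can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names, the power-of-two padding rule, and the size thresholds are all assumptions made for the example. It derives the number of warps from the largest padded tile among all tiles of the computation, rather than from the output tile alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds n up to the next power of two. Assumption: tiles are padded this
// way per dimension; the commit message only says "(padded) tile size".
int64_t NextPowerOfTwo(int64_t n) {
  assert(n > 0);
  int64_t p = 1;
  while (p < n) p *= 2;
  return p;
}

// Number of elements in a tile once every dimension has been padded.
int64_t PaddedTileSize(const std::vector<int64_t>& tile_dims) {
  int64_t size = 1;
  for (int64_t d : tile_dims) size *= NextPowerOfTwo(d);
  return size;
}

// Largest padded tile size over every tile passed in (operands,
// intermediates, and output), not only the output tile.
int64_t LargestPaddedTileSize(
    const std::vector<std::vector<int64_t>>& tiles) {
  int64_t largest = 1;
  for (const auto& tile : tiles) {
    largest = std::max(largest, PaddedTileSize(tile));
  }
  return largest;
}

// Hypothetical mapping from tile size to warp count. The thresholds are
// illustrative only and are not taken from the XLA source.
int64_t NumWarpsForTileSize(int64_t padded_tile_size) {
  if (padded_tile_size <= 512) return 1;
  if (padded_tile_size <= 1024) return 2;
  if (padded_tile_size <= 16384) return 4;
  if (padded_tile_size <= 32768) return 8;
  if (padded_tile_size <= 65536) return 16;
  return 32;
}
```

For a row reduction with an input tile of shape {1, 4000} and an output tile of shape {1}, the output tile alone has a padded size of 1 and maps to 1 warp, while the largest padded tile has 4096 elements and maps to 4 warps under these illustrative thresholds.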
2 files changed in xla/service/gpu/model: 57 additions, 2 deletions.

First file: 9 additions & 2 deletions. Original lines 527 and 529 are removed and new lines 528-536 are added (diff content not preserved).
Second file: 48 additions & 0 deletions. New lines 623-670 are added after original line 622 (diff content not preserved).