Commit 3eb956e
Inclusive scan iter chunk update kernel (generic and 1d) improved
Previously, each work-item in the chunk update kernels processed consecutive
elements in contiguous memory, so the sub-group memory access pattern was
sub-optimal (no coalescing).
This PR changes these kernels so that each work-item processes n_wi elements
that are one sub-group size apart, improving the memory access pattern.
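The difference between the two access patterns can be illustrated with a small index-mapping sketch. It is plain Python rather than the actual SYCL kernel code; the function names and the tiny `sg_size`/`n_wi` values are hypothetical and chosen only for readability.

```python
def contiguous_layout(sg_size, n_wi):
    """Old pattern: lane l owns elements l*n_wi .. l*n_wi + n_wi - 1."""
    return [[lane * n_wi + i for i in range(n_wi)] for lane in range(sg_size)]


def strided_layout(sg_size, n_wi):
    """New pattern: lane l owns elements l, l + sg_size, l + 2*sg_size, ..."""
    return [[lane + i * sg_size for i in range(n_wi)] for lane in range(sg_size)]


def addresses_per_step(layout):
    """Transpose the layout: for each step i, list the addresses
    that all lanes of the sub-group touch at that step."""
    return [list(step) for step in zip(*layout)]


# Illustrative sizes only; real sub-groups are typically larger.
sg_size, n_wi = 4, 3
old_steps = addresses_per_step(contiguous_layout(sg_size, n_wi))
new_steps = addresses_per_step(strided_layout(sg_size, n_wi))

print(old_steps)  # [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
print(new_steps)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

In the old layout the lanes of a sub-group touch addresses `n_wi` apart at every step, whereas in the new layout they touch adjacent addresses, which is the pattern hardware can usually coalesce into fewer memory transactions.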
Running a micro-benchmark based on code from gh-1249 (for
shape = (n, n), where n = 4096) with this change:
```
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=cuda:gpu python index.py
0.010703916665753004
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=cuda:gpu python index.py
0.01079747307597211
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=cuda:gpu python index.py
0.010864820314088353
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ python index.py
0.023878061203975922
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ python index.py
0.023666468500677083
```
while before:
```
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=cuda:gpu python index.py
0.011415911812542213
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=cuda:gpu python index.py
0.011722088705196424
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu python index.py
0.030126182353813893
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu python index.py
0.030459783371986338
```
Running the same code using NumPy (same size):
```
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ python index_np.py
0.01416253090698134
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ python index_np.py
0.014979530811413296
```
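The timings above can be condensed into mean values and before/after ratios. The sketch below only does arithmetic on the numbers already reported. It assumes that the runs without `ONEAPI_DEVICE_SELECTOR` used the Level-Zero device, which the logs do not state explicitly; the dictionary keys are labels chosen for this sketch.

```python
from statistics import mean

# Timings copied from the logs above (benchmark output values).
after = {
    "cuda": [0.010703916665753004, 0.01079747307597211, 0.010864820314088353],
    # Assumption: the default device (no selector set) is Level-Zero.
    "level_zero": [0.023878061203975922, 0.023666468500677083],
}
before = {
    "cuda": [0.011415911812542213, 0.011722088705196424],
    "level_zero": [0.030126182353813893, 0.030459783371986338],
}
numpy_runs = [0.01416253090698134, 0.014979530811413296]

# Ratio > 1 means the run is faster with this change.
speedup = {dev: mean(before[dev]) / mean(after[dev]) for dev in after}

for dev, ratio in speedup.items():
    print(f"{dev}: before/after = {ratio:.3f}")
print(f"numpy mean: {mean(numpy_runs):.6f}")
```

With only two or three runs per configuration, these ratios are rough indications rather than statistically robust measurements.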
The Level-Zero device is slower because of a bug that makes allocation/deallocation slow.
The OpenCL device has better timing. With this change:
```
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=opencl:gpu python index.py
0.015038836885381627
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=opencl:gpu python index.py
0.01527448468496678
```
before:
```
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=opencl:gpu python index.py
0.01758851639115838
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ONEAPI_DEVICE_SELECTOR=opencl:gpu python index.py
0.017089676241286926
```
1 parent ec924c3, commit 3eb956e
1 file changed, +66 -22 lines changed
Changed hunks: original lines 404-428 (now 404-450) and 661-693 (now 683-737).