Commit 73bc12c
[AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (llvm#112332)
When llvm.memcpy or llvm.memmove intrinsics are lowered as a loop in
LowerMemIntrinsics.cpp, the loop consists of a single load/store pair
per iteration. We can improve performance in some cases by emitting
multiple load/store pairs per iteration. This patch achieves that by
increasing the width of the loop lowering type in the GCN target and
letting legalization split the resulting too-wide access pairs into
multiple legal access pairs.
This change only affects lowered memcpys and memmoves with large (>=
1024 bytes) constant lengths. Smaller constant lengths are handled by
ISel directly; non-constant lengths are left unchanged because they
would be slowed down if the dynamic length were smaller than, or only
slightly larger than, what an unrolled iteration copies.
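
The concern about dynamic lengths comes down to simple arithmetic, sketched below. `LengthSplit` and `splitLength` are hypothetical helpers written for this example, and the 64-byte iteration size used in the note that follows is an assumed figure; the commit message does not say how the leftover bytes are copied, only that such lengths would be slower.

```cpp
#include <cstddef>

struct LengthSplit {
  std::size_t wideIterations; // full iterations of the widened loop
  std::size_t residualBytes;  // bytes that do not fill a full iteration
};

// Splits a copy length into full wide iterations and a residual.
// `bytesPerIter` must be non-zero.
constexpr LengthSplit splitLength(std::size_t len, std::size_t bytesPerIter) {
  return {len / bytesPerIter, len % bytesPerIter};
}
```

With an assumed 64 bytes per iteration, `splitLength(48, 64)` gives zero wide iterations and 48 residual bytes, and `splitLength(80, 64)` gives one wide iteration and 16 residual bytes. In both cases much or all of the copy falls outside the widened loop. A constant length of 1024 bytes, by contrast, gives 16 wide iterations and no residual.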
The chosen default unroll factor is the result of microbenchmarks on
gfx1030. This change leads to speedups of 15-38% for global memory and
1.9-5.8x for scratch in these microbenchmarks.
Part of SWDEV-455845.
(cherry picked from commit a4fd3db,
includes updates to GlobalISel/llvm.memcpy.ll, memintrinsic-unroll.ll,
memmove-var-size.ll)
Change-Id: Ia92a4b2c504ed6f0d2c2f872dc41f97b548fdafb
1 parent 6aaa5fb commit 73bc12c
File tree
4 files changed, +16445 -234 lines changed
- llvm
  - lib/Target/AMDGPU
  - test/CodeGen/AMDGPU
    - GlobalISel