@@ -1451,6 +1451,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
14511451 It is preferred over llvm.amdgcn.mov.dpp.`<type>` for future use.
14521452 `llvm.amdgcn.update.dpp.<type> <old> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>`
14531453 Should be equivalent to:
1454+
14541455 - `v_mov_b32 <dest> <old>`
14551456 - `v_mov_b32 <dest> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>`
14561457
@@ -6032,7 +6033,7 @@ GFX6-GFX8
60326033 available in dispatch packet. For M0, it is also possible to use maximum
60336034 possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for
60346035 GFX7-GFX8).
6035- GFX9-GFX11
6036+ GFX9 and later
60366037 The M0 register is not used for range checking LDS accesses and so does not
60376038 need to be initialized in the prolog.
60386039
@@ -16639,25 +16640,25 @@ scratch address space.
1663916640
1664016641On entry to a function:
1664116642
16642- 1. SGPR0-3 contain a V# with the following properties (see
16643+ #. SGPR0-3 contain a V# with the following properties (see
1664316644 :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`):
1664416645
1664516646 * Base address pointing to the beginning of the wavefront scratch backing
1664616647 memory.
1664716648 * Swizzled with dword element size and stride of wavefront size elements.
1664816649
16649- 2. The FLAT_SCRATCH register pair is setup. See
16650+ #. The FLAT_SCRATCH register pair is setup. See
1665016651 :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
16651- 3. GFX6-GFX8: M0 register set to the size of LDS in bytes. See
16652+ #. GFX6-GFX8: M0 register set to the size of LDS in bytes. See
1665216653 :ref:`amdgpu-amdhsa-kernel-prolog-m0`.
16653- 4. The EXEC register is set to the lanes active on entry to the function.
16654- 5. MODE register: *TBD*
16655- 6. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
16654+ #. The EXEC register is set to the lanes active on entry to the function.
16655+ #. MODE register: *TBD*
16656+ #. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
1665616657 below.
16657- 7. SGPR30-31 return address (RA). The code address that the function must
16658+ #. SGPR30-31 return address (RA). The code address that the function must
1665816659 return to when it completes. The value is undefined if the function is *no
1665916660 return*.
16660- 8. SGPR32 is used for the stack pointer (SP). It is an unswizzled scratch
16661+ #. SGPR32 is used for the stack pointer (SP). It is an unswizzled scratch
1666116662 offset relative to the beginning of the wavefront scratch backing memory.
1666216663
1666316664 The unswizzled SP can be used with buffer instructions as an unswizzled SGPR
@@ -16694,19 +16695,19 @@ On entry to a function:
1669416695 arguments after the last local allocation and adjust SGPR32 to the address
1669516696 after the last local allocation.
1669616697
16697- 9. All other registers are unspecified.
16698- 10. Any necessary ``s_waitcnt`` has been performed to ensure memory is available
16699- to the function.
16700- 11. Use pass-by-reference (byref) in stead of pass-by-value (byval) for struct
16701- arguments in C ABI. Callee is responsible for allocating stack memory and
16702- copying the value of the struct if modified. Note that the backend still
16703- supports byval for struct arguments.
16698+ #. All other registers are unspecified.
16699+ #. Any necessary ``s_waitcnt`` has been performed to ensure memory is available
16700+ to the function.
16701+ #. Use pass-by-reference (byref) in stead of pass-by-value (byval) for struct
16702+ arguments in C ABI. Callee is responsible for allocating stack memory and
16703+ copying the value of the struct if modified. Note that the backend still
16704+ supports byval for struct arguments.
1670416705
1670516706On exit from a function:
1670616707
16707- 1. VGPR0-31 and SGPR4-29 are used to pass function result arguments as
16708+ #. VGPR0-31 and SGPR4-29 are used to pass function result arguments as
1670816709 described below. Any registers used are considered clobbered registers.
16709- 2. The following registers are preserved and have the same value as on entry:
16710+ #. The following registers are preserved and have the same value as on entry:
1671016711
1671116712 * FLAT_SCRATCH
1671216713 * EXEC
@@ -16741,10 +16742,10 @@ On exit from a function:
1674116742 preserved if it can be determined that the called function does not change
1674216743 their value.
1674316744
16744- 2. The PC is set to the RA provided on entry.
16745- 3. MODE register: *TBD*.
16746- 4. All other registers are clobbered.
16747- 5. Any necessary ``s_waitcnt`` has been performed to ensure memory accessed by
16745+ #. The PC is set to the RA provided on entry.
16746+ #. MODE register: *TBD*.
16747+ #. All other registers are clobbered.
16748+ #. Any necessary ``s_waitcnt`` has been performed to ensure memory accessed by
1674816749 function is available to the caller.
1674916750
1675016751.. TODO::