Commit 0165e62

kazutakahirata authored and mahesh-attarde committed
[llvm] Proofread AMDGPUUsage.rst (llvm#150273)
1 parent f809d8e commit 0165e62

1 file changed: +18 -18 lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 18 additions & 18 deletions
@@ -677,7 +677,7 @@ the device used to execute the code match the features enabled when
 generating the code. A mismatch of features may result in incorrect
 execution, or a reduction in performance.

-The target features supported by each processor is listed in
+The target features supported by each processor are listed in
 :ref:`amdgpu-processors`.

 Target features are controlled by exactly one of the following Clang
@@ -783,7 +783,7 @@ description. The AMDGPU target specific information is:
 Is an AMDGPU processor or alternative processor name specified in
 :ref:`amdgpu-processor-table`. The non-canonical form target ID allows both
 the primary processor and alternative processor names. The canonical form
-target ID only allow the primary processor name.
+target ID only allows the primary processor name.

 **target-feature**
 Is a target feature name specified in :ref:`amdgpu-target-features-table` that
@@ -793,7 +793,7 @@ description. The AMDGPU target specific information is:
 ``--offload-arch``. Each target feature must appear at most once in a target
 ID. The non-canonical form target ID allows the target features to be
 specified in any order. The canonical form target ID requires the target
-features to be specified in alphabetic order.
+features to be specified in alphabetical order.

 .. _amdgpu-target-id-v2-v3:

@@ -886,7 +886,7 @@ supported for the ``amdgcn`` target.
 setup (see :ref:`amdgpu-amdhsa-kernel-prolog-m0`).

 To convert between a private or group address space address (termed a segment
-address) and a flat address the base address of the corresponding aperture
+address) and a flat address, the base address of the corresponding aperture
 can be used. For GFX7-GFX8 these are available in the
 :ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
 Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
@@ -1186,7 +1186,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 :ref:`llvm.stackrestore.p5 <int_stackrestore>` Implemented, must use the alloca address space.

 :ref:`llvm.get.fpmode.i32 <int_get_fpmode>` The natural floating-point mode type is i32. This
-implemented by extracting relevant bits out of the MODE
+is implemented by extracting relevant bits out of the MODE
 register with s_getreg_b32. The first 10 bits are the
 core floating-point mode. Bits 12:18 are the exception
 mask. On gfx9+, bit 23 is FP16_OVFL. Bitfields not
@@ -1266,14 +1266,14 @@ The AMDGPU backend implements the following LLVM IR intrinsics.

 llvm.amdgcn.permlane16 Provides direct access to v_permlane16_b32. Performs arbitrary gather-style
 operation within a row (16 contiguous lanes) of the second input operand.
-The third and fourth inputs must be scalar values. these are combined into
+The third and fourth inputs must be scalar values. These are combined into
 a single 64-bit value representing lane selects used to swizzle within each
 row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>,
 <2 x half>, <2 x bfloat>, i64, double, pointers, multiples of the 32-bit vectors.

 llvm.amdgcn.permlanex16 Provides direct access to v_permlanex16_b32. Performs arbitrary gather-style
 operation across two rows of the second input operand (each row is 16 contiguous
-lanes). The third and fourth inputs must be scalar values. these are combined
+lanes). The third and fourth inputs must be scalar values. These are combined
 into a single 64-bit value representing lane selects used to swizzle within each
 row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>, <2 x half>,
 <2 x bfloat>, i64, double, pointers, multiples of the 32-bit vectors.
@@ -1285,31 +1285,31 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 32-bit vectors.

 llvm.amdgcn.udot2 Provides direct access to v_dot2_u32_u16 across targets which
-support such instructions. This performs unsigned dot product
+support such instructions. This performs an unsigned dot product
 with two v2i16 operands, summed with the third i32 operand. The
 i1 fourth operand is used to clamp the output.

 llvm.amdgcn.udot4 Provides direct access to v_dot4_u32_u8 across targets which
-support such instructions. This performs unsigned dot product
+support such instructions. This performs an unsigned dot product
 with two i32 operands (holding a vector of 4 8bit values), summed
 with the third i32 operand. The i1 fourth operand is used to clamp
 the output.

 llvm.amdgcn.udot8 Provides direct access to v_dot8_u32_u4 across targets which
-support such instructions. This performs unsigned dot product
+support such instructions. This performs an unsigned dot product
 with two i32 operands (holding a vector of 8 4bit values), summed
 with the third i32 operand. The i1 fourth operand is used to clamp
 the output.

 llvm.amdgcn.sdot2 Provides direct access to v_dot2_i32_i16 across targets which
-support such instructions. This performs signed dot product
+support such instructions. This performs a signed dot product
 with two v2i16 operands, summed with the third i32 operand. The
 i1 fourth operand is used to clamp the output.
 When applicable (e.g. no clamping), this is lowered into
 v_dot2c_i32_i16 for targets which support it.

 llvm.amdgcn.sdot4 Provides direct access to v_dot4_i32_i8 across targets which
-support such instructions. This performs signed dot product
+support such instructions. This performs a signed dot product
 with two i32 operands (holding a vector of 4 8bit values), summed
 with the third i32 operand. The i1 fourth operand is used to clamp
 the output.
@@ -1321,7 +1321,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 of this instruction for gfx11 targets.

 llvm.amdgcn.sdot8 Provides direct access to v_dot8_u32_u4 across targets which
-support such instructions. This performs signed dot product
+support such instructions. This performs a signed dot product
 with two i32 operands (holding a vector of 8 4bit values), summed
 with the third i32 operand. The i1 fourth operand is used to clamp
 the output.
@@ -1401,7 +1401,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.

 llvm.amdgcn.atomic.cond.sub.u32 Provides direct access to flat_atomic_cond_sub_u32, global_atomic_cond_sub_u32
 and ds_cond_sub_u32 based on address space on gfx12 targets. This
-performs subtraction only if the memory value is greater than or
+performs a subtraction only if the memory value is greater than or
 equal to the data value.

 llvm.amdgcn.s.barrier.signal.isfirst Provides access to the s_barrier_signal_first instruction;
@@ -1646,7 +1646,7 @@ The AMDGPU backend supports the following LLVM IR attributes.
 llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
 attributes, the queue pointer may be required in situations where the
 intrinsic call does not directly appear in the program. Some subtargets
-require the queue pointer for to handle some addrspacecasts, as well
+require the queue pointer to handle some addrspacecasts, as well
 as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
 llvm.debug intrinsics.

@@ -1947,7 +1947,7 @@ The following describes all emitted function resource usage symbols:
 callees, contains an indirect call
 ===================================== ========= ========================================= ===============================================================================

-Futhermore, three symbols are additionally emitted describing the compilation
+Furthermore, three symbols are additionally emitted describing the compilation
 unit's worst case (i.e, maxima) ``num_vgpr``, ``num_agpr``, and
 ``numbered_sgpr`` which may be referenced and used by the aforementioned
 symbolic expressions. These three symbols are ``amdgcn.max_num_vgpr``,
@@ -17948,7 +17948,7 @@ set architecture (ISA) version of the assembly program.
 "AMD" and *arch* should always be equal to "AMDGPU".

 By default, the assembler will derive the ISA version, *vendor*, and *arch*
-from the value of the -mcpu option that is passed to the assembler.
+from the value of the ``-mcpu`` option that is passed to the assembler.

 .. _amdgpu-amdhsa-assembler-directive-amdgpu_hsa_kernel:

@@ -17972,7 +17972,7 @@ default value for all keys is 0, with the following exceptions:
 - *amd_kernel_code_version_minor* defaults to 2.
 - *amd_machine_kind* defaults to 1.
 - *amd_machine_version_major*, *machine_version_minor*, and
-*amd_machine_version_stepping* are derived from the value of the -mcpu option
+*amd_machine_version_stepping* are derived from the value of the ``-mcpu`` option
 that is passed to the assembler.
 - *kernel_code_entry_byte_offset* defaults to 256.
 - *wavefront_size* defaults 6 for all targets before GFX10. For GFX10 onwards
