@@ -677,7 +677,7 @@ the device used to execute the code match the features enabled when
 generating the code. A mismatch of features may result in incorrect
 execution, or a reduction in performance.
 
-The target features supported by each processor is listed in
+The target features supported by each processor are listed in
 :ref:`amdgpu-processors`.
 
 Target features are controlled by exactly one of the following Clang
@@ -783,7 +783,7 @@ description. The AMDGPU target specific information is:
   Is an AMDGPU processor or alternative processor name specified in
   :ref:`amdgpu-processor-table`. The non-canonical form target ID allows both
   the primary processor and alternative processor names. The canonical form
-  target ID only allow the primary processor name.
+  target ID only allows the primary processor name.
 
 **target-feature**
   Is a target feature name specified in :ref:`amdgpu-target-features-table` that
@@ -793,7 +793,7 @@ description. The AMDGPU target specific information is:
   ``--offload-arch``. Each target feature must appear at most once in a target
   ID. The non-canonical form target ID allows the target features to be
   specified in any order. The canonical form target ID requires the target
-  features to be specified in alphabetic order.
+  features to be specified in alphabetical order.
 
 .. _amdgpu-target-id-v2-v3:
 
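The canonical-form rules described in this hunk can be sketched in Python. The rules are: use only the primary processor name, allow each target feature at most once, and list features in alphabetical order. This is a minimal illustration, not Clang's implementation. It assumes a target ID is written as a processor name followed by `:`-separated features, each ending in `+` or `-` (for example `gfx90a:xnack-:sramecc+`). `ALIASES` and `canonicalize_target_id` are hypothetical names, and `ALIASES` only stands in for the alternative-name column of the processor table.

```python
# Minimal sketch: canonicalize an AMDGPU target ID, assuming the form
# "<processor>(:<feature>+|:<feature>-)*", e.g. "gfx90a:xnack-:sramecc+".

# Hypothetical stand-in for the alternative processor names listed in
# amdgpu-processor-table (alternative name -> primary name).
ALIASES = {"example-alt-name": "gfx900"}


def canonicalize_target_id(target_id: str) -> str:
    processor, *features = target_id.split(":")
    # The canonical form only allows the primary processor name.
    processor = ALIASES.get(processor, processor)

    seen = set()
    for feature in features:
        name, setting = feature[:-1], feature[-1:]
        if not name or setting not in ("+", "-"):
            raise ValueError(f"malformed target feature: {feature!r}")
        # Each target feature must appear at most once.
        if name in seen:
            raise ValueError(f"duplicate target feature: {name!r}")
        seen.add(name)

    # The canonical form lists target features in alphabetical order.
    features.sort(key=lambda feature: feature[:-1])
    return ":".join([processor, *features])


print(canonicalize_target_id("gfx90a:xnack-:sramecc+"))  # gfx90a:sramecc+:xnack-
```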
@@ -886,7 +886,7 @@ supported for the ``amdgcn`` target.
   setup (see :ref:`amdgpu-amdhsa-kernel-prolog-m0`).
 
   To convert between a private or group address space address (termed a segment
-  address) and a flat address the base address of the corresponding aperture
+  address) and a flat address, the base address of the corresponding aperture
   can be used. For GFX7-GFX8 these are available in the
   :ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
   Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
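A small Python model can illustrate the segment/flat conversion described in this hunk. It is a sketch under stated assumptions, not the hardware's or the backend's exact lowering. It assumes a segment address is a 32-bit offset and that a flat address inside an aperture equals the aperture base plus that offset. It does not handle null pointers. The function names and the aperture base value are hypothetical.

```python
# Sketch of segment <-> flat address conversion through an aperture base.
# Assumptions: segment addresses are 32-bit offsets, flat addresses are
# 64-bit, and a flat address in the aperture is aperture_base + segment_address.
# Null pointers are not handled.

MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def segment_to_flat(segment_addr: int, aperture_base: int) -> int:
    """Map a 32-bit segment address into the flat address space."""
    return (aperture_base + (segment_addr & MASK32)) & MASK64


def flat_to_segment(flat_addr: int, aperture_base: int) -> int:
    """Map a flat address that lies in the aperture back to a segment address."""
    offset = (flat_addr - aperture_base) & MASK64
    if offset > MASK32:
        raise ValueError("flat address is outside this aperture")
    return offset


base = 0x0001_0000_0000_0000  # hypothetical aperture base, for illustration only
flat = segment_to_flat(0x40, base)
print(hex(flat))                         # 0x1000000000040
print(hex(flat_to_segment(flat, base)))  # 0x40
```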
@@ -1186,7 +1186,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
   :ref:`llvm.stackrestore.p5 <int_stackrestore>`  Implemented, must use the alloca address space.
 
   :ref:`llvm.get.fpmode.i32 <int_get_fpmode>`     The natural floating-point mode type is i32. This
-                                                  implemented by extracting relevant bits out of the MODE
+                                                  is implemented by extracting relevant bits out of the MODE
                                                   register with s_getreg_b32. The first 10 bits are the
                                                   core floating-point mode. Bits 12:18 are the exception
                                                   mask. On gfx9+, bit 23 is FP16_OVFL. Bitfields not
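The bit layout described for `llvm.get.fpmode.i32` can be unpacked as shown below. This Python sketch reads "Bits 12:18" as an inclusive 7-bit field. The dictionary keys and the `decode_fpmode` name are illustrative and are not names used by LLVM.

```python
# Sketch: split the i32 returned by llvm.get.fpmode.i32 into the fields
# described above. "Bits 12:18" is read as the inclusive range 12..18.

def decode_fpmode(mode: int, gfx9_or_later: bool = True) -> dict:
    fields = {
        "core_fp_mode": mode & 0x3FF,           # first 10 bits (bits 0..9)
        "exception_mask": (mode >> 12) & 0x7F,  # bits 12..18 (7 bits)
    }
    if gfx9_or_later:
        fields["fp16_ovfl"] = (mode >> 23) & 1  # bit 23, gfx9+ only
    return fields


print(decode_fpmode(0x8050F0))
# {'core_fp_mode': 240, 'exception_mask': 5, 'fp16_ovfl': 1}
```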
@@ -1266,14 +1266,14 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 
   llvm.amdgcn.permlane16                          Provides direct access to v_permlane16_b32. Performs arbitrary gather-style
                                                   operation within a row (16 contiguous lanes) of the second input operand.
-                                                  The third and fourth inputs must be scalar values. these are combined into
+                                                  The third and fourth inputs must be scalar values. These are combined into
                                                   a single 64-bit value representing lane selects used to swizzle within each
                                                   row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>,
                                                   <2 x half>, <2 x bfloat>, i64, double, pointers, multiples of the 32-bit vectors.
 
   llvm.amdgcn.permlanex16                         Provides direct access to v_permlanex16_b32. Performs arbitrary gather-style
                                                   operation across two rows of the second input operand (each row is 16 contiguous
-                                                  lanes). The third and fourth inputs must be scalar values. these are combined
+                                                  lanes). The third and fourth inputs must be scalar values. These are combined
                                                   into a single 64-bit value representing lane selects used to swizzle within each
                                                   row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>, <2 x half>,
                                                   <2 x bfloat>, i64, double, pointers, multiples of the 32-bit vectors.
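A reference model can make the lane-select encoding of the two permlane intrinsics concrete. This Python sketch is not the hardware definition and rests on three assumptions. First, the third operand forms the low 32 bits and the fourth operand the high 32 bits of the 64-bit select value. Second, lane *i* of each 16-lane row uses the 4-bit field at bits `4*i+3..4*i`. Third, `permlanex16` gathers from the other row of each pair. `lane_selects` and `permlane` are hypothetical names, and any other operands of the intrinsics are not modelled.

```python
# Reference-model sketch of the permlane16 / permlanex16 lane selection.
# Assumptions (not stated in the table above): the third operand gives the
# low 32 bits and the fourth operand the high 32 bits of the 64-bit select
# value; lane i of each 16-lane row uses bits 4*i+3..4*i of that value; and
# permlanex16 reads from the other row of each pair of rows.

def lane_selects(sel_lo: int, sel_hi: int) -> list:
    combined = ((sel_hi & 0xFFFF_FFFF) << 32) | (sel_lo & 0xFFFF_FFFF)
    return [(combined >> (4 * i)) & 0xF for i in range(16)]


def permlane(src: list, sel_lo: int, sel_hi: int, cross_rows: bool = False) -> list:
    if not src or len(src) % 32:
        raise ValueError("expected a wave of 32 or 64 lanes")
    selects = lane_selects(sel_lo, sel_hi)
    out = []
    for lane in range(len(src)):
        row = lane // 16
        if cross_rows:
            row ^= 1  # permlanex16: gather from the other row of the pair
        out.append(src[row * 16 + selects[lane % 16]])
    return out


# Reverse the lanes within each row: lane i selects 15 - i.
rev_lo, rev_hi = 0x89ABCDEF, 0x01234567
print(permlane(list(range(32)), rev_lo, rev_hi)[:4])                   # [15, 14, 13, 12]
print(permlane(list(range(32)), rev_lo, rev_hi, cross_rows=True)[:4])  # [31, 30, 29, 28]
```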
@@ -1285,31 +1285,31 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                   32-bit vectors.
 
   llvm.amdgcn.udot2                               Provides direct access to v_dot2_u32_u16 across targets which
-                                                  support such instructions. This performs unsigned dot product
+                                                  support such instructions. This performs an unsigned dot product
                                                   with two v2i16 operands, summed with the third i32 operand. The
                                                   i1 fourth operand is used to clamp the output.
 
   llvm.amdgcn.udot4                               Provides direct access to v_dot4_u32_u8 across targets which
-                                                  support such instructions. This performs unsigned dot product
+                                                  support such instructions. This performs an unsigned dot product
                                                   with two i32 operands (holding a vector of 4 8bit values), summed
                                                   with the third i32 operand. The i1 fourth operand is used to clamp
                                                   the output.
 
   llvm.amdgcn.udot8                               Provides direct access to v_dot8_u32_u4 across targets which
-                                                  support such instructions. This performs unsigned dot product
+                                                  support such instructions. This performs an unsigned dot product
                                                   with two i32 operands (holding a vector of 8 4bit values), summed
                                                   with the third i32 operand. The i1 fourth operand is used to clamp
                                                   the output.
 
   llvm.amdgcn.sdot2                               Provides direct access to v_dot2_i32_i16 across targets which
-                                                  support such instructions. This performs signed dot product
+                                                  support such instructions. This performs a signed dot product
                                                   with two v2i16 operands, summed with the third i32 operand. The
                                                   i1 fourth operand is used to clamp the output.
                                                   When applicable (e.g. no clamping), this is lowered into
                                                   v_dot2c_i32_i16 for targets which support it.
 
   llvm.amdgcn.sdot4                               Provides direct access to v_dot4_i32_i8 across targets which
-                                                  support such instructions. This performs signed dot product
+                                                  support such instructions. This performs a signed dot product
                                                   with two i32 operands (holding a vector of 4 8bit values), summed
                                                   with the third i32 operand. The i1 fourth operand is used to clamp
                                                   the output.
@@ -1321,7 +1321,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                   of this instruction for gfx11 targets.
 
-  llvm.amdgcn.sdot8                               Provides direct access to v_dot8_u32_u4 across targets which
-                                                  support such instructions. This performs signed dot product
+  llvm.amdgcn.sdot8                               Provides direct access to v_dot8_i32_i4 across targets which
+                                                  support such instructions. This performs a signed dot product
                                                   with two i32 operands (holding a vector of 8 4bit values), summed
                                                   with the third i32 operand. The i1 fourth operand is used to clamp
                                                   the output.
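The udot/sdot descriptions above share one pattern. Each packed 32-bit operand is unpacked into lanes, the lanes are multiplied pairwise, the third operand is added, and the result is optionally clamped. The Python sketch below models that pattern for lane widths of 16, 8 or 4 bits (dot2, dot4, dot8). It is not the backend's code. Two behaviours are assumptions: clamp saturates to the signed or unsigned 32-bit range, and lane 0 occupies the least significant bits. `unpack` and `dot` are hypothetical names.

```python
# Reference-model sketch of the udot/sdot intrinsics described above.
# Assumptions: lane 0 occupies the least significant bits of each packed
# 32-bit operand, and clamp saturates the result to the 32-bit signed or
# unsigned range; without clamp the result wraps modulo 2**32.

def unpack(word: int, bits: int, signed: bool) -> list:
    """Split a 32-bit word into 32 // bits lanes of `bits` bits each."""
    mask = (1 << bits) - 1
    lanes = [(word >> (i * bits)) & mask for i in range(32 // bits)]
    if signed:
        # Reinterpret each lane as a two's-complement value.
        lanes = [v - (1 << bits) if v >> (bits - 1) else v for v in lanes]
    return lanes


def dot(a: int, b: int, acc: int, bits: int, signed: bool, clamp: bool) -> int:
    """bits=16 models dot2, bits=8 models dot4, bits=4 models dot8."""
    products = zip(unpack(a, bits, signed), unpack(b, bits, signed))
    total = acc + sum(x * y for x, y in products)
    lo, hi = (-(1 << 31), (1 << 31) - 1) if signed else (0, (1 << 32) - 1)
    if clamp:
        return max(lo, min(hi, total))
    wrapped = total & 0xFFFF_FFFF
    return wrapped - (1 << 32) if wrapped > hi else wrapped


print(dot(0x01020304, 0x01010101, 10, bits=8, signed=False, clamp=False))  # 20
print(dot(0x000000FF, 0x00000002, 0, bits=8, signed=True, clamp=False))    # -2
```

Under the same assumptions, `dot(a, b, acc, bits=8, signed=False, clamp=False)` corresponds to `llvm.amdgcn.udot4` with a false clamp operand.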
@@ -1401,7 +1401,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 
   llvm.amdgcn.atomic.cond.sub.u32                 Provides direct access to flat_atomic_cond_sub_u32, global_atomic_cond_sub_u32
                                                   and ds_cond_sub_u32 based on address space on gfx12 targets. This
-                                                  performs subtraction only if the memory value is greater than or
+                                                  performs a subtraction only if the memory value is greater than or
                                                   equal to the data value.
 
   llvm.amdgcn.s.barrier.signal.isfirst            Provides access to the s_barrier_signal_first instruction;
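The conditional subtraction performed by `llvm.amdgcn.atomic.cond.sub.u32` can be modelled as a locked read-modify-write. This Python sketch assumes the operation returns the value held in memory before the update, as LLVM `atomicrmw` operations do. The `threading.Lock` only stands in for hardware atomicity. `Word32` and `cond_sub` are hypothetical names.

```python
import threading

# Sketch of the conditional subtraction performed by
# llvm.amdgcn.atomic.cond.sub.u32. Assumptions: the operation returns the
# value that was in memory before the update, and a lock stands in for the
# hardware's atomicity. Word32 and cond_sub are hypothetical names.


class Word32:
    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFF_FFFF
        self._lock = threading.Lock()

    def cond_sub(self, data: int) -> int:
        data &= 0xFFFF_FFFF
        with self._lock:
            old = self.value
            # Subtract only if the memory value >= the data value
            # (an unsigned comparison of two 32-bit values).
            if old >= data:
                self.value = old - data
            return old


word = Word32(5)
print(word.cond_sub(3), word.value)  # 5 2
print(word.cond_sub(3), word.value)  # 2 2
```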
@@ -1646,7 +1646,7 @@ The AMDGPU backend supports the following LLVM IR attributes.
                                                   llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
                                                   attributes, the queue pointer may be required in situations where the
                                                   intrinsic call does not directly appear in the program. Some subtargets
-                                                  require the queue pointer for to handle some addrspacecasts, as well
+                                                  require the queue pointer to handle some addrspacecasts, as well
                                                   as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
                                                   llvm.debug intrinsics.
 
@@ -1947,7 +1947,7 @@ The following describes all emitted function resource usage symbols:
                                                   callees, contains an indirect call
   ===================================== ========= ========================================= ===============================================================================
 
-Futhermore , three symbols are additionally emitted describing the compilation
-unit's worst case (i.e, maxima) ``num_vgpr``, ``num_agpr``, and
+Furthermore, three symbols are additionally emitted describing the compilation
+unit's worst-case (i.e., maxima) ``num_vgpr``, ``num_agpr``, and
 ``numbered_sgpr`` which may be referenced and used by the aforementioned
 symbolic expressions. These three symbols are ``amdgcn.max_num_vgpr``,
@@ -17948,7 +17948,7 @@ set architecture (ISA) version of the assembly program.
 "AMD" and *arch* should always be equal to "AMDGPU".
 
 By default, the assembler will derive the ISA version, *vendor*, and *arch*
-from the value of the -mcpu option that is passed to the assembler.
+from the value of the ``-mcpu`` option that is passed to the assembler.
 
 .. _amdgpu-amdhsa-assembler-directive-amdgpu_hsa_kernel:
 
@@ -17972,7 +17972,7 @@ default value for all keys is 0, with the following exceptions:
 - *amd_kernel_code_version_minor* defaults to 2.
 - *amd_machine_kind* defaults to 1.
-- *amd_machine_version_major*, *machine_version_minor*, and
-  *amd_machine_version_stepping* are derived from the value of the -mcpu option
+- *amd_machine_version_major*, *amd_machine_version_minor*, and
+  *amd_machine_version_stepping* are derived from the value of the ``-mcpu`` option
   that is passed to the assembler.
 - *kernel_code_entry_byte_offset* defaults to 256.
-- *wavefront_size* defaults 6 for all targets before GFX10. For GFX10 onwards
+- *wavefront_size* defaults to 6 for all targets before GFX10. For GFX10 onwards