@@ -18,8 +18,8 @@ This document describes how the SME ACLE attributes map to LLVM IR
1818attributes and how LLVM lowers these attributes to implement the rules and
1919requirements of the ABI.
2020
21- Below we describe the LLVM IR attributes and their relation to the C/C++
22- level ACLE attributes:
21+ Below, we describe the LLVM IR attributes and their relation to the
22+ C/C++-level ACLE attributes:
2323
2424``aarch64_pstate_sm_enabled``
2525 is used for functions with ``__arm_streaming``
@@ -51,8 +51,8 @@ level ACLE attributes:
5151
5252Clang must ensure that the above attributes are added both to the
5353function's declaration/definition and to its call-sites. This is
54- important for calls to attributed function pointers, where there is no
55- definition or declaration available.
54+ important for calls to attributed function pointers, where no
55+ definition or declaration is available.
5656
5757
58582. Handling PSTATE.SM
@@ -77,7 +77,7 @@ and almost all parts of CodeGen we can assume that the runtime value for
7777``vscale`` does not. If we let the compiler insert the appropriate ``smstart``
7878and ``smstop`` instructions around call boundaries, then the effects on SVE
7979state can be mitigated. By limiting the state changes to a very brief window
80- around the call we can control how the operations are scheduled and how live
80+ around the call, we can control how the operations are scheduled and how live
8181values remain preserved between state transitions.
8282
8383In order to control PSTATE.SM at this level of granularity, we use function and
@@ -89,7 +89,7 @@ Restrictions on attributes
8989
9090* It is undefined behaviour to pass or return (pointers to) scalable vector
9191 objects to/from functions which may use a different SVE vector length.
92- This includes functions with a non-streaming interface, but marked with
92+ This includes functions with a non-streaming interface but marked with
9393 ``aarch64_pstate_sm_body``.
9494
9595* It is not allowed for a function to be decorated with both
@@ -100,7 +100,7 @@ Restrictions on attributes
100100 ``aarch64_new_za``, ``aarch64_in_za``, ``aarch64_out_za``, ``aarch64_inout_za``,
101101 ``aarch64_preserves_za``.
102102
103- These restrictions also apply in the higher level SME ACLE, which means we can
103+ These restrictions also apply in the higher-level SME ACLE, which means we can
104104emit diagnostics in Clang to signal users about incorrect behaviour.
105105
106106
@@ -224,7 +224,7 @@ The ``COND_SMSTART/COND_SMSTOP`` nodes additionally take ``CurrentState`` and
224224
225225When ``CurrentState`` and ``ExpectedState`` can be evaluated at compile-time
226226(i.e. they are both constants) then an unconditional ``smstart/smstop``
227- instruction is emitted. Otherwise the node is matched to a Pseudo instruction
227+ instruction is emitted. Otherwise, the node is matched to a Pseudo instruction
228228which expands to a compare/branch and a ``smstart/smstop``. This is necessary to
229229implement transitions from ``SC -> N`` and ``SC -> S``.
230230
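The rule in this hunk (unconditional when both states are compile-time constants, conditional for ``SC -> N`` and ``SC -> S``) can be sketched as a small decision function. This is only an illustrative model, not LLVM's implementation: the names ``Mode``, ``Transition`` and ``planCallTransition`` are hypothetical, and the sketch assumes that no mode change is needed when the callee is streaming-compatible or shares the caller's interface.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the three interfaces: N, S and SC.
enum class Mode { Normal, Streaming, StreamingCompatible };

// Hypothetical description of the mode changes placed around one call.
struct Transition {
  std::string before;  // instruction emitted before the call
  std::string after;   // instruction emitted after the call
  bool conditional;    // true when PSTATE.SM is only known at runtime
};

// Returns the transition needed around a call, or nullopt when none is needed.
std::optional<Transition> planCallTransition(Mode caller, Mode callee) {
  // Assumption: an SC callee runs in either mode, and equal interfaces
  // need no change.
  if (callee == Mode::StreamingCompatible || caller == callee)
    return std::nullopt;

  // An SC caller does not know its own PSTATE.SM at compile time, so the
  // transition must be guarded by a runtime compare/branch.
  const bool conditional = (caller == Mode::StreamingCompatible);

  if (callee == Mode::Streaming)
    return Transition{"smstart", "smstop", conditional};
  return Transition{"smstop", "smstart", conditional};
}
```

In this model, ``planCallTransition(Mode::Streaming, Mode::Normal)`` yields an unconditional ``smstop``/``smstart`` pair, while ``planCallTransition(Mode::StreamingCompatible, Mode::Normal)`` yields the same pair marked as conditional.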
@@ -236,7 +236,7 @@ streaming compatible, the compiler has to insert a SMSTOP before the call and
236236insert a SMSTART after the call.
237237
238238If the function that is called is an intrinsic with no side-effects which in
239- turn is lowered to a function call (e.g. ``@llvm.cos()``), then the call to
239+ turn is lowered to a function call (e.g., ``@llvm.cos()``), then the call to
240240``@llvm.cos()`` is not part of any Chain; it can be scheduled freely.
241241
242242Lowering of a Callsite creates a small chain of nodes which:
@@ -297,11 +297,11 @@ To ensure we use the correct SVE vector length to allocate the locals with, we
297297can use the streaming vector-length to allocate the stack-slots through the
298298``ADDSVL`` instruction, even when the CPU is not yet in streaming mode.
299299
300- This only works for locals and not callee-save slots, since LLVM doesn't support
300+ This works only for locals and not callee-save slots, since LLVM doesn't support
301301mixing two different scalable vector lengths in one stack frame. That means that the
302302case where a function is marked ``arm_locally_streaming`` and needs to spill SVE
303303callee-saves in the prologue is currently unsupported. However, it is unlikely
304- for this to happen without user intervention, because ``arm_locally_streaming``
304+ for this to happen without user intervention because ``arm_locally_streaming``
305305functions cannot take or return vector-length-dependent values. This would otherwise
306306require forcing both the SVE PCS using '``aarch64_sve_pcs``' combined with using
307307``arm_locally_streaming`` in order to encounter this problem. This combination
@@ -330,7 +330,7 @@ attributed with ``arm_locally_streaming``:
330330 return array[N - 1] + arg;
331331 }
332332
333- should use ADDSVL for allocating the stack space and should avoid clobbering
333+ should use ``ADDSVL`` for allocating the stack space and should avoid clobbering
334334the return/argument values.
335335
336336.. code-block:: none
@@ -381,17 +381,17 @@ Preventing the use of illegal instructions in Streaming Mode
381381* When executing a program in normal mode (PSTATE.SM=0), a subset of SME
382382 instructions are invalid.
383383
384- * Streaming-compatible functions must only use instructions that are valid when
384+ * Streaming-compatible functions must use only instructions that are valid when
385385 either PSTATE.SM=0 or PSTATE.SM=1.
386386
387387The value of PSTATE.SM is not controlled by the feature flags, but rather by the
388- function attributes. This means that we can compile for '``+sme``' and the compiler
388+ function attributes. This means that we can compile for '``+sme``', and the compiler
389389will code-generate any instructions, even if they are not legal under the requested
390390streaming mode. The compiler needs to use the function attributes to ensure the
391391compiler doesn't do transformations under the assumption that certain operations
392392are available at runtime.
393393
394- We made a conscious choice not to model this with feature flags, because we
394+ We made a conscious choice not to model this with feature flags because we
395395still want to support inline-asm in either mode (with the user placing
396396smstart/smstop manually), and this became rather complicated to implement at the
397397individual instruction level (see `D120261 <https://reviews.llvm.org/D120261>`_
@@ -408,7 +408,7 @@ auto-vectorization with a subset of streaming-compatible instructions, but that
408408requires changes to the CostModel, Legalization and SelectionDAG lowering.
409409
410410We will also emit diagnostics in Clang to prevent the use of
411- non-streaming(-compatible) operations, e.g. through ACLE intrinsics, when a
411+ non-streaming(-compatible) operations, e.g., through ACLE intrinsics, when a
412412function is decorated with the streaming mode attributes.
413413
414414
@@ -456,7 +456,7 @@ AArch64 Predicate-as-Counter Type
456456:Overview:
457457
458458The predicate-as-counter type represents the type of a predicate-as-counter
459- value held in a AArch64 SVE predicate register. Such a value contains
459+ value held in an AArch64 SVE predicate register. Such a value contains
460460information about the number of active lanes, the element width and a bit that
461461tells whether the generated mask should be inverted. ACLE intrinsics should be
462462used to move the predicate-as-counter value to/from a predicate vector.