@@ -20238,18 +20238,31 @@ Overview:
 """""""""
 
 The '``llvm.vector.experimental.partial.reduce.add.*``' intrinsics reduce the
-concatenation of the two vector operands down to the number of elements dictated
-by the result type. The result type is a vector type that matches the type of the
-first operand vector.
+concatenation of the two vector arguments down to the number of elements of the
+result vector type.
 
 Arguments:
 """"""""""
 
-Both arguments must be vectors of matching element types. The first argument type must
-match the result type, while the second argument type must have a vector length that is a
-positive integer multiple of the first vector/result type. The arguments must be either be
-both fixed or both scalable vectors.
+The first argument is an integer vector with the same type as the result.
 
+The second argument is a vector whose element type matches the result's and
+whose length is a known integer multiple of the result's length.
+
+Semantics:
+""""""""""
+
+Other than the reduction operator (e.g. add), the way in which the concatenated
+arguments are reduced is entirely unspecified. By their nature these intrinsics
+are not expected to be useful in isolation but instead implement the first phase
+of an overall reduction operation.
+
+The typical use case is loop vectorization, where reductions are split into an
+in-loop phase, where maintaining an unordered vector result is important for
+performance, and an out-of-loop phase to calculate the final scalar result.
+
+By not introducing any new ordering constraints, these intrinsics maximize the
+ability to use a target's accumulation instructions.
2025320266
2025420267'``llvm.experimental.vector.histogram.*``' Intrinsic
2025520268^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
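The new Semantics text leaves the grouping of lanes unspecified, so different targets may legitimately produce different intermediate vectors. The following Python sketch illustrates that point under stated assumptions and is not LLVM code. It models two hypothetical lowerings, `partial_reduce_add_interleaved` and `partial_reduce_add_blocked`, that the wording would both permit, and it uses them in a hypothetical two-phase dot product (`dot_product_two_phase`). Integer lanes are modelled as Python ints wrapped to 32 bits. The vectorization factor of 16 and the 4 accumulator lanes are arbitrary illustrative choices.

```python
from typing import Callable, List

# Hypothetical signature shared by both models: (acc, vec, bits) -> new acc.
PartialReduce = Callable[[List[int], List[int], int], List[int]]


def _check_shapes(acc: List[int], vec: List[int]) -> int:
    """Return the result length; vec must be a positive multiple of acc's length."""
    n = len(acc)
    if n == 0 or len(vec) == 0 or len(vec) % n != 0:
        raise ValueError("len(vec) must be a positive integer multiple of len(acc)")
    return n


def partial_reduce_add_interleaved(acc: List[int], vec: List[int], bits: int = 32) -> List[int]:
    """Lane i sums every element of concat(acc, vec) whose index is i mod len(acc)."""
    n = _check_shapes(acc, vec)
    mask = (1 << bits) - 1  # wrap like an N-bit integer add
    out = [0] * n
    for idx, x in enumerate(acc + vec):
        out[idx % n] = (out[idx % n] + x) & mask
    return out


def partial_reduce_add_blocked(acc: List[int], vec: List[int], bits: int = 32) -> List[int]:
    """Lane i sums acc[i] with the i-th contiguous chunk of vec."""
    n = _check_shapes(acc, vec)
    mask = (1 << bits) - 1
    k = len(vec) // n
    return [(acc[i] + sum(vec[i * k:(i + 1) * k])) & mask for i in range(n)]


def dot_product_two_phase(a: List[int], b: List[int], partial_reduce: PartialReduce,
                          vf: int = 16, lanes: int = 4, bits: int = 32) -> int:
    """Hypothetical vectorized dot product: in-loop partial reduction, then a final scalar sum."""
    if len(a) != len(b) or len(a) % vf != 0:
        raise ValueError("inputs must have equal length that is a multiple of vf")
    mask = (1 << bits) - 1
    acc = [0] * lanes  # in-loop phase: the lane layout carries no meaning
    for start in range(0, len(a), vf):
        products = [(x * y) & mask for x, y in zip(a[start:start + vf], b[start:start + vf])]
        acc = partial_reduce(acc, products, bits)
    # Out-of-loop phase: collapse the vector accumulator to one scalar.
    return sum(acc) & mask


if __name__ == "__main__":
    acc, vec = [0, 0, 0, 0], list(range(16))
    print(partial_reduce_add_interleaved(acc, vec))  # [24, 28, 32, 36]
    print(partial_reduce_add_blocked(acc, vec))      # [6, 22, 38, 54]
    a, b = list(range(32)), [2] * 32
    for pr in (partial_reduce_add_interleaved, partial_reduce_add_blocked):
        print(dot_product_two_phase(a, b, pr))       # 992 for both lowerings
```

The two models disagree lane by lane but always agree on the final scalar. This holds because integer addition modulo 2^N is associative and commutative, so any regrouping of the concatenated elements gives the same total once the out-of-loop phase sums the lanes.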