Commit 2ee5b89

s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang
The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses explicit unrolling and interleaving to improve performance. The code employs an empty inline asm statement with operands that constrain the compiler's instruction scheduling and thereby enforce proper overlapping of load and compute phases.

Fix an ifdef to apply that for clang builds as well.

Signed-off-by: Marius Hillenbrand <[email protected]>
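The barrier technique described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from kernel/zarch/gemm_vec.c: the macro name `BARRIER_LOADED`, the function `dot_unrolled`, the scalar `long` operands, and the generic `"+r"` constraint are all assumptions chosen so that the example compiles on any target with gcc or clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical barrier (not the macro from gemm_vec.c). The asm body is
 * empty, so no instruction is emitted. The "+r" operands tell the compiler
 * that both values are read and possibly modified here, so they must be in
 * registers at this point, and later uses must come after it. */
#define BARRIER_LOADED(a, b) __asm__("" : "+r"(a), "+r"(b))

/* Dot product with two elements per iteration; n must be even. */
static long dot_unrolled(const long *x, const long *y, size_t n)
{
    assert(n % 2 == 0);
    long acc0 = 0, acc1 = 0;

    for (size_t i = 0; i < n; i += 2) {
        /* Load phase: fetch all operands for this iteration. */
        long x0 = x[i], x1 = x[i + 1];
        long y0 = y[i], y1 = y[i + 1];

        /* The loads cannot be moved below these statements, and the
         * multiplications cannot be moved above them. */
        BARRIER_LOADED(x0, x1);
        BARRIER_LOADED(y0, y1);

        /* Compute phase: uses only values that passed the barrier. */
        acc0 += x0 * y0;
        acc1 += x1 * y1;
    }
    return acc0 + acc1;
}
```

Because the asm statement has no instructions, it only affects how the compiler orders the surrounding code. The kernel in this commit relies on the same idea to keep its load and compute phases overlapped in the intended way.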
1 parent 095f4e6 commit 2ee5b89

1 file changed, +1 -1 lines changed

kernel/zarch/gemm_vec.c

Lines changed: 1 addition & 1 deletion
@@ -393,7 +393,7 @@ static inline void GEBP_block_16_4(
      * Note that we need to massage this particular "barrier"
      * depending on the gcc version.
      */
-#if __GNUC__ > 7
+#if __GNUC__ > 7 || defined(__clang__)
 #define BARRIER_READ_BEFORE_COMPUTE(SUFFIX) \
     do { \
         asm("" \
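Why the extra `defined(__clang__)` test matters can be sketched as below. clang also defines `__GNUC__`, and by default it reports GCC 4 compatibility, so a check on `__GNUC__ > 7` alone is false for clang builds. The function `selects_guarded_variant` and the constant `this_compiler_selects_variant` are hypothetical names used only for this illustration; the function mirrors the patched preprocessor condition at run time.

```c
/* Hypothetical run-time mirror of the patched condition
 *     #if __GNUC__ > 7 || defined(__clang__)
 * gnuc_major stands in for the value of __GNUC__, and is_clang for
 * whether __clang__ is defined. */
static int selects_guarded_variant(int gnuc_major, int is_clang)
{
    return gnuc_major > 7 || is_clang;
}

/* Hypothetical run-time mirror of the condition before the patch. */
static int selects_guarded_variant_old(int gnuc_major)
{
    return gnuc_major > 7;
}

/* The same patched condition, evaluated by the preprocessor for the
 * compiler that builds this file. */
#if __GNUC__ > 7 || defined(__clang__)
static const int this_compiler_selects_variant = 1;
#else
static const int this_compiler_selects_variant = 0;
#endif
```

With clang's default `__GNUC__` value of 4, `selects_guarded_variant_old(4)` is 0 while `selects_guarded_variant(4, 1)` is 1, which is the case the one-line change addresses.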
