[X86] Add tests showing failure to concat sqrt intrinsics together. #170096

RKSimon · 2025-12-01T10:48:34Z

Similar to fdiv, we should be trying to concatenate these high-latency sqrt instructions into wider vector operations. These tests record the current codegen, where that does not happen.
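For illustration only, here is a minimal sketch of the wide form that the first test is element-wise equivalent to, assuming the eventual fold rewrites concat(sqrt(a), sqrt(b)) as sqrt(concat(a, b)). The function name `@concat_then_sqrt_v8f32` is hypothetical and does not appear in this PR. The PR itself only adds tests and does not implement the fold.

```llvm
; Sketch under stated assumptions; this is hand-written IR, not llc output.
declare <8 x float> @llvm.sqrt.v8f32(<8 x float>)

; Hypothetical name, chosen for this example only.
define <8 x float> @concat_then_sqrt_v8f32(<4 x float> %a0, <4 x float> %a1) {
  ; Concatenate the two 128-bit inputs into one 256-bit vector first...
  %cat = shufflevector <4 x float> %a0, <4 x float> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  ; ...then take a single 8-element square root instead of two 4-element ones.
  %res = call <8 x float> @llvm.sqrt.v8f32(<8 x float> %cat)
  ret <8 x float> %res
}
```

Because sqrt is applied lane by lane, this computes the same result as `concat_sqrt_v8f32_v4f32` in the test file. Whether the wide form is actually profitable on a given subtarget is a separate question that this sketch does not address.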

llvmbot · 2025-12-01T10:49:15Z

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

Similar to fdiv, we should be trying to concatenate these high-latency sqrt instructions into wider vector operations.

Full diff: https://github.com/llvm/llvm-project/pull/170096.diff

1 Files Affected:

(added) llvm/test/CodeGen/X86/combine-fsqrt.ll (+91)

diff --git a/llvm/test/CodeGen/X86/combine-fsqrt.ll b/llvm/test/CodeGen/X86/combine-fsqrt.ll
new file mode 100644
index 0000000000000..ddd7d3ac24315
--- /dev/null
+++ b/llvm/test/CodeGen/X86/combine-fsqrt.ll
@@ -0,0 +1,91 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64    | FileCheck %s --check-prefixes=SSE
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64-v2 | FileCheck %s --check-prefixes=SSE
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=sandybridge | FileCheck %s --check-prefixes=AVX,AVX1OR2
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64-v3 | FileCheck %s --check-prefixes=AVX,AVX1OR2
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64-v4 | FileCheck %s --check-prefixes=AVX,AVX512
+
+define <8 x float> @concat_sqrt_v8f32_v4f32(<4 x float> %a0, <4 x float> %a1) {
+; SSE-LABEL: concat_sqrt_v8f32_v4f32:
+; SSE:       # %bb.0:
+; SSE-NEXT:    sqrtps %xmm0, %xmm0
+; SSE-NEXT:    sqrtps %xmm1, %xmm1
+; SSE-NEXT:    retq
+;
+; AVX-LABEL: concat_sqrt_v8f32_v4f32:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vsqrtps %xmm0, %xmm0
+; AVX-NEXT:    vsqrtps %xmm1, %xmm1
+; AVX-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX-NEXT:    retq
+  %v0 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a0)
+  %v1 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a1)
+  %res  = shufflevector <4 x float> %v0, <4 x float> %v1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  ret <8 x float> %res
+}
+
+define <16 x float> @concat_sqrt_v16f32_v4f32(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2, <4 x float> %a3) {
+; SSE-LABEL: concat_sqrt_v16f32_v4f32:
+; SSE:       # %bb.0:
+; SSE-NEXT:    sqrtps %xmm0, %xmm0
+; SSE-NEXT:    sqrtps %xmm1, %xmm1
+; SSE-NEXT:    sqrtps %xmm2, %xmm2
+; SSE-NEXT:    sqrtps %xmm3, %xmm3
+; SSE-NEXT:    retq
+;
+; AVX1OR2-LABEL: concat_sqrt_v16f32_v4f32:
+; AVX1OR2:       # %bb.0:
+; AVX1OR2-NEXT:    vsqrtps %xmm0, %xmm0
+; AVX1OR2-NEXT:    vsqrtps %xmm1, %xmm1
+; AVX1OR2-NEXT:    vsqrtps %xmm2, %xmm2
+; AVX1OR2-NEXT:    vsqrtps %xmm3, %xmm3
+; AVX1OR2-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX1OR2-NEXT:    vinsertf128 $1, %xmm3, %ymm2, %ymm1
+; AVX1OR2-NEXT:    retq
+;
+; AVX512-LABEL: concat_sqrt_v16f32_v4f32:
+; AVX512:       # %bb.0:
+; AVX512-NEXT:    vsqrtps %xmm0, %xmm0
+; AVX512-NEXT:    vsqrtps %xmm1, %xmm1
+; AVX512-NEXT:    vsqrtps %xmm2, %xmm2
+; AVX512-NEXT:    vsqrtps %xmm3, %xmm3
+; AVX512-NEXT:    vinsertf128 $1, %xmm3, %ymm2, %ymm2
+; AVX512-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX512-NEXT:    vinsertf64x4 $1, %ymm2, %zmm0, %zmm0
+; AVX512-NEXT:    retq
+  %v0 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a0)
+  %v1 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a1)
+  %v2 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a2)
+  %v3 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a3)
+  %r01 = shufflevector <4 x float> %v0, <4 x float> %v1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  %r23 = shufflevector <4 x float> %v2, <4 x float> %v3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  %res  = shufflevector <8 x float> %r01, <8 x float> %r23, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+  ret <16 x float> %res
+}
+
+define <16 x float> @concat_sqrt_v16f32_v8f32(<8 x float> %a0, <8 x float> %a1) {
+; SSE-LABEL: concat_sqrt_v16f32_v8f32:
+; SSE:       # %bb.0:
+; SSE-NEXT:    sqrtps %xmm0, %xmm0
+; SSE-NEXT:    sqrtps %xmm1, %xmm1
+; SSE-NEXT:    sqrtps %xmm2, %xmm2
+; SSE-NEXT:    sqrtps %xmm3, %xmm3
+; SSE-NEXT:    retq
+;
+; AVX1OR2-LABEL: concat_sqrt_v16f32_v8f32:
+; AVX1OR2:       # %bb.0:
+; AVX1OR2-NEXT:    vsqrtps %ymm0, %ymm0
+; AVX1OR2-NEXT:    vsqrtps %ymm1, %ymm1
+; AVX1OR2-NEXT:    retq
+;
+; AVX512-LABEL: concat_sqrt_v16f32_v8f32:
+; AVX512:       # %bb.0:
+; AVX512-NEXT:    vsqrtps %ymm0, %ymm0
+; AVX512-NEXT:    vsqrtps %ymm1, %ymm1
+; AVX512-NEXT:    vinsertf64x4 $1, %ymm1, %zmm0, %zmm0
+; AVX512-NEXT:    retq
+  %v0 = call <8 x float> @llvm.sqrt.v8f32(<8 x float> %a0)
+  %v1 = call <8 x float> @llvm.sqrt.v8f32(<8 x float> %a1)
+  %res  = shufflevector <8 x float> %v0, <8 x float> %v1, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+  ret <16 x float> %res
+}

[X86] Add tests showing failure to concat sqrt intrinsics together.

1b17b68

llvmbot added the backend:X86 label Dec 1, 2025

RKSimon enabled auto-merge (squash) December 1, 2025 10:56

RKSimon merged commit 6c0a02f into llvm:main Dec 1, 2025
11 of 12 checks passed

RKSimon deleted the x86-concat-sqrt-tests branch December 1, 2025 11:36

aahrun pushed a commit (77ef6e4) to aahrun/llvm-project that referenced this pull request Dec 1, 2025

augusto2112 pushed a commit (71221e7) to augusto2112/llvm-project that referenced this pull request Dec 3, 2025

kcloudy0717 pushed a commit (26deae4) to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025

honeygoyal pushed a commit (cce1b61) to honeygoyal/llvm-project that referenced this pull request Dec 9, 2025
