Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions llvm/test/Analysis/CostModel/AMDGPU/maximum.ll
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx90a -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=ALL,GFX9,GFX90A-FASTF64 %s
; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=ALL,GFX9,FASTF64 %s
; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mattr=-half-rate-64-ops < %s | FileCheck -check-prefixes=ALL,SLOWF64 %s
; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx950 -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=SIZE,GFX950-SIZE %s
; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx90a -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=SIZE,GFX9-SIZE,GFX90A-SIZE %s
; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=SIZE,GFX9-SIZE %s
; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mattr=-half-rate-64-ops < %s | FileCheck -check-prefixes=SIZE,SLOW-SIZE %s
Expand Down Expand Up @@ -35,6 +36,15 @@ define void @maximum_f16() {
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %v16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 10 for instruction: ret void
;
; GFX950-SIZE-LABEL: 'maximum_f16'
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %f16 = call half @llvm.maximum.f16(half undef, half undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16 = call <2 x half> @llvm.maximum.v2f16(<2 x half> undef, <2 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3f16 = call <3 x half> @llvm.maximum.v3f16(<3 x half> undef, <3 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4f16 = call <4 x half> @llvm.maximum.v4f16(<4 x half> undef, <4 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8f16 = call <8 x half> @llvm.maximum.v8f16(<8 x half> undef, <8 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)
Comment on lines +40 to +45
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values look broken. 1 element is more expensive than 2, and every other vector size is also size 2

Copy link
Collaborator Author

@rampitec rampitec May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but that's what we have. This is also exactly that in downstream. I'd say we need to land tests and then fix it separately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is because fmaximum.f16 is custom and fmaximum.v2f16 is legal. Generic code multiplies cost by 2 for custom operations. And that is no so wrong... Our tests:

define <2 x half> @v_maximum_v2f16(<2 x half> %src0, <2 x half> %src1) {
; GFX950-LABEL: v_maximum_v2f16:
; GFX950:       ; %bb.0:
; GFX950-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT:    v_pk_maximum3_f16 v0, v0, v1, v1
; GFX950-NEXT:    s_setpc_b64 s[30:31]

  %op = call <2 x half> @llvm.maximum.v2f16(<2 x half> %src0, <2 x half> %src1)
  ret <2 x half> %op
}

define half @v_maximum_f16(half %src0, half %src1) {
; GFX950-LABEL: v_maximum_f16:
; GFX950:       ; %bb.0:
; GFX950-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT:    v_max_f16_e32 v2, v0, v1
; GFX950-NEXT:    v_mov_b32_e32 v3, 0x7e00
; GFX950-NEXT:    v_cmp_o_f16_e32 vcc, v0, v1
; GFX950-NEXT:    s_nop 1
; GFX950-NEXT:    v_cndmask_b32_e32 v0, v3, v2, vcc
; GFX950-NEXT:    s_setpc_b64 s[30:31]
  %op = call half @llvm.maximum.f16(half %src0, half %src1)
  ret half %op
}

I'd say the cost shall be 4 here, not even 2.

v8 and v16 cases are wrong of course, it shall be 4 and 8 respectively, but we do not handle it anywhere ourselves.

; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
; GFX9-SIZE-LABEL: 'maximum_f16'
; GFX9-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maximum.f16(half undef, half undef)
; GFX9-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16 = call <2 x half> @llvm.maximum.v2f16(<2 x half> undef, <2 x half> undef)
Expand Down Expand Up @@ -98,6 +108,15 @@ define void @maximum_bf16() {
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %v16bf16 = call <16 x bfloat> @llvm.maximum.v16bf16(<16 x bfloat> undef, <16 x bfloat> undef)
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 10 for instruction: ret void
;
; GFX950-SIZE-LABEL: 'maximum_bf16'
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bf16 = call bfloat @llvm.maximum.bf16(bfloat undef, bfloat undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2bf16 = call <2 x bfloat> @llvm.maximum.v2bf16(<2 x bfloat> undef, <2 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v3bf16 = call <3 x bfloat> @llvm.maximum.v3bf16(<3 x bfloat> undef, <3 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v4bf16 = call <4 x bfloat> @llvm.maximum.v4bf16(<4 x bfloat> undef, <4 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %v8bf16 = call <8 x bfloat> @llvm.maximum.v8bf16(<8 x bfloat> undef, <8 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 31 for instruction: %v16bf16 = call <16 x bfloat> @llvm.maximum.v16bf16(<16 x bfloat> undef, <16 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
; GFX9-SIZE-LABEL: 'maximum_bf16'
; GFX9-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bf16 = call bfloat @llvm.maximum.bf16(bfloat undef, bfloat undef)
; GFX9-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2bf16 = call <2 x bfloat> @llvm.maximum.v2bf16(<2 x bfloat> undef, <2 x bfloat> undef)
Expand Down
19 changes: 19 additions & 0 deletions llvm/test/Analysis/CostModel/AMDGPU/minimum.ll
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx90a -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=ALL,GFX9,GFX90A-FASTF64 %s
; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=ALL,GFX9,FASTF64 %s
; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mattr=-half-rate-64-ops < %s | FileCheck -check-prefixes=ALL,SLOWF64 %s
; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx950 -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=SIZE,GFX950-SIZE %s
; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx90a -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=SIZE,GFX9-SIZE,GFX90A-SIZE %s
; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -mattr=+half-rate-64-ops < %s | FileCheck -check-prefixes=SIZE,GFX9-SIZE %s
; RUN: opt -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -mtriple=amdgcn-unknown-amdhsa -mattr=-half-rate-64-ops < %s | FileCheck -check-prefixes=SIZE,SLOW-SIZE %s
Expand Down Expand Up @@ -35,6 +36,15 @@ define void @minimum_f16() {
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %v16f16 = call <16 x half> @llvm.minimum.v16f16(<16 x half> undef, <16 x half> undef)
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 10 for instruction: ret void
;
; GFX950-SIZE-LABEL: 'minimum_f16'
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %f16 = call half @llvm.minimum.f16(half undef, half undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16 = call <2 x half> @llvm.minimum.v2f16(<2 x half> undef, <2 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3f16 = call <3 x half> @llvm.minimum.v3f16(<3 x half> undef, <3 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4f16 = call <4 x half> @llvm.minimum.v4f16(<4 x half> undef, <4 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8f16 = call <8 x half> @llvm.minimum.v8f16(<8 x half> undef, <8 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16f16 = call <16 x half> @llvm.minimum.v16f16(<16 x half> undef, <16 x half> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
; GFX9-SIZE-LABEL: 'minimum_f16'
; GFX9-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minimum.f16(half undef, half undef)
; GFX9-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16 = call <2 x half> @llvm.minimum.v2f16(<2 x half> undef, <2 x half> undef)
Expand Down Expand Up @@ -98,6 +108,15 @@ define void @minimum_bf16() {
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %v16bf16 = call <16 x bfloat> @llvm.minimum.v16bf16(<16 x bfloat> undef, <16 x bfloat> undef)
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 10 for instruction: ret void
;
; GFX950-SIZE-LABEL: 'minimum_bf16'
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bf16 = call bfloat @llvm.minimum.bf16(bfloat undef, bfloat undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2bf16 = call <2 x bfloat> @llvm.minimum.v2bf16(<2 x bfloat> undef, <2 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v3bf16 = call <3 x bfloat> @llvm.minimum.v3bf16(<3 x bfloat> undef, <3 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v4bf16 = call <4 x bfloat> @llvm.minimum.v4bf16(<4 x bfloat> undef, <4 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %v8bf16 = call <8 x bfloat> @llvm.minimum.v8bf16(<8 x bfloat> undef, <8 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 31 for instruction: %v16bf16 = call <16 x bfloat> @llvm.minimum.v16bf16(<16 x bfloat> undef, <16 x bfloat> undef)
; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
; GFX9-SIZE-LABEL: 'minimum_bf16'
; GFX9-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bf16 = call bfloat @llvm.minimum.bf16(bfloat undef, bfloat undef)
; GFX9-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2bf16 = call <2 x bfloat> @llvm.minimum.v2bf16(<2 x bfloat> undef, <2 x bfloat> undef)
Expand Down
Loading