-
Couldn't load subscription status.
- Fork 15k
[Driver][AMDGPU][HIP][SPIRV] Disable optimizations for AMDGCN SPIR-V #154765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… HIPSTDPAR runtime component.
|
@llvm/pr-subscribers-backend-amdgpu Author: Alex Voicu (AlexVlx) ChangesSPIR-V specific optimizations can inadvertently remove information that is important for the AMDGPU BE / break certain code patterns we rely on. Therefore, for AMDGCN flavoured SPIR-V we disable optimizations over IR, to ensure that we operate directly on the output of Clang CodeGen when we finalise. Full diff: https://github.com/llvm/llvm-project/pull/154765.diff 4 Files Affected:
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 5e463b9c98687..bad1bd694b9e7 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -858,6 +858,13 @@ void AMDGPUToolChain::addClangTargetOptions(
CC1Args.push_back("-fapply-global-visibility-to-externs");
}
+ // For SPIR-V we want to retain the pristine output of Clang CodeGen, since
+ // optimizations might lose structure / information that is necessary for
+ // generating optimal concrete AMDGPU code.
+ if (getTriple().isSPIRV() &&
+ !DriverArgs.hasArg(options::OPT_disable_llvm_passes))
+ CC1Args.push_back("-disable-llvm-passes");
+
if (DeviceOffloadingKind == Action::OFK_None)
addOpenCLBuiltinsLib(getDriver(), DriverArgs, CC1Args);
}
diff --git a/clang/lib/Driver/ToolChains/HIPAMD.cpp b/clang/lib/Driver/ToolChains/HIPAMD.cpp
index b4c6da0d73d13..5f3fbea40f162 100644
--- a/clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ b/clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -261,6 +261,12 @@ void HIPAMDToolChain::addClangTargetOptions(
// with options that match the user-supplied ones.
if (!DriverArgs.hasArg(options::OPT_fembed_bitcode_marker))
CC1Args.push_back("-fembed-bitcode=marker");
+ // For SPIR-V we want to retain the pristine output of Clang CodeGen, since
+ // optimizations might lose structure / information that is necessary for
+ // generating optimal concrete AMDGPU code. We duplicate this because the
+ // HIP TC doesn't invoke the base AMDGPU TC addClangTargetOptions.
+ if (!DriverArgs.hasArg(options::OPT_disable_llvm_passes))
+ CC1Args.push_back("-disable-llvm-passes");
return; // No DeviceLibs for SPIR-V.
}
diff --git a/clang/test/Driver/hip-toolchain-no-rdc.hip b/clang/test/Driver/hip-toolchain-no-rdc.hip
index ddd251b67cc57..dc8f0a97ad371 100644
--- a/clang/test/Driver/hip-toolchain-no-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-no-rdc.hip
@@ -206,7 +206,7 @@
// Check mixed AMDGCNSPIRV and concrete GPU arch.
//
-// AMDGCNSPIRV: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-emit-llvm-bc" {{.*}} "-o" "[[AMDGCNSPV_BC:.*bc]]"
+// AMDGCNSPIRV: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-emit-llvm-bc" {{.*}}"-fembed-bitcode=marker" "-disable-llvm-passes" {{.*}} "-o" "[[AMDGCNSPV_BC:.*bc]]"
// AMDGCNSPIRV: {{".*llvm-link.*"}} "-o" "[[AMDGCNSPV_TMP:.*out]]" "[[AMDGCNSPV_BC]]"
// AMDGCNSPIRV: {{".*llvm-spirv.*"}} "--spirv-max-version=1.6" "--spirv-ext=+all" {{.*}} "[[AMDGCNSPV_TMP]]" {{.*}}"-o" "[[AMDGCNSPV_CO:.*out]]"
// AMDGCNSPIRV: "-cc1" "-triple" "amdgcn-amd-amdhsa" {{.*}}"-emit-obj" {{.*}}"-target-cpu" "gfx900"{{.*}} "-o" "[[GFX900_OBJ:.*o]]"
diff --git a/clang/test/Driver/spirv-amd-toolchain.c b/clang/test/Driver/spirv-amd-toolchain.c
index c9417400a9378..0c8ccd1d17487 100644
--- a/clang/test/Driver/spirv-amd-toolchain.c
+++ b/clang/test/Driver/spirv-amd-toolchain.c
@@ -14,6 +14,6 @@
// RUN: %clang -### --target=spirv64-amd-amdhsa %s -nogpulib -nogpuinc 2>&1 \
// RUN: | FileCheck %s --check-prefix=INVOCATION
-// INVOCATION: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}} "-o" "[[OUTPUT:.+]]" "-x" "c"
+// INVOCATION: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-disable-llvm-passes" {{.*}} "-o" "[[OUTPUT:.+]]" "-x" "c"
// INVOCATION: "{{.*}}llvm-link" "-o" "a.out" "[[OUTPUT]]"
// INVOCATION: "{{.*}}llvm-spirv" "--spirv-max-version=1.6" "--spirv-ext=+all" "--spirv-allow-unknown-intrinsics" "--spirv-lower-const-expr" "--spirv-preserve-auxdata" "--spirv-debug-info-version=nonsemantic-shader-200" "a.out" "-o" "a.out"
|
|
@llvm/pr-subscribers-clang Author: Alex Voicu (AlexVlx) ChangesSPIR-V specific optimizations can inadvertently remove information that is important for the AMDGPU BE / break certain code patterns we rely on. Therefore, for AMDGCN flavoured SPIR-V we disable optimizations over IR, to ensure that we operate directly on the output of Clang CodeGen when we finalise. Full diff: https://github.com/llvm/llvm-project/pull/154765.diff 4 Files Affected:
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 5e463b9c98687..bad1bd694b9e7 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -858,6 +858,13 @@ void AMDGPUToolChain::addClangTargetOptions(
CC1Args.push_back("-fapply-global-visibility-to-externs");
}
+ // For SPIR-V we want to retain the pristine output of Clang CodeGen, since
+ // optimizations might lose structure / information that is necessary for
+ // generating optimal concrete AMDGPU code.
+ if (getTriple().isSPIRV() &&
+ !DriverArgs.hasArg(options::OPT_disable_llvm_passes))
+ CC1Args.push_back("-disable-llvm-passes");
+
if (DeviceOffloadingKind == Action::OFK_None)
addOpenCLBuiltinsLib(getDriver(), DriverArgs, CC1Args);
}
diff --git a/clang/lib/Driver/ToolChains/HIPAMD.cpp b/clang/lib/Driver/ToolChains/HIPAMD.cpp
index b4c6da0d73d13..5f3fbea40f162 100644
--- a/clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ b/clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -261,6 +261,12 @@ void HIPAMDToolChain::addClangTargetOptions(
// with options that match the user-supplied ones.
if (!DriverArgs.hasArg(options::OPT_fembed_bitcode_marker))
CC1Args.push_back("-fembed-bitcode=marker");
+ // For SPIR-V we want to retain the pristine output of Clang CodeGen, since
+ // optimizations might lose structure / information that is necessary for
+ // generating optimal concrete AMDGPU code. We duplicate this because the
+ // HIP TC doesn't invoke the base AMDGPU TC addClangTargetOptions.
+ if (!DriverArgs.hasArg(options::OPT_disable_llvm_passes))
+ CC1Args.push_back("-disable-llvm-passes");
return; // No DeviceLibs for SPIR-V.
}
diff --git a/clang/test/Driver/hip-toolchain-no-rdc.hip b/clang/test/Driver/hip-toolchain-no-rdc.hip
index ddd251b67cc57..dc8f0a97ad371 100644
--- a/clang/test/Driver/hip-toolchain-no-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-no-rdc.hip
@@ -206,7 +206,7 @@
// Check mixed AMDGCNSPIRV and concrete GPU arch.
//
-// AMDGCNSPIRV: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-emit-llvm-bc" {{.*}} "-o" "[[AMDGCNSPV_BC:.*bc]]"
+// AMDGCNSPIRV: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-emit-llvm-bc" {{.*}}"-fembed-bitcode=marker" "-disable-llvm-passes" {{.*}} "-o" "[[AMDGCNSPV_BC:.*bc]]"
// AMDGCNSPIRV: {{".*llvm-link.*"}} "-o" "[[AMDGCNSPV_TMP:.*out]]" "[[AMDGCNSPV_BC]]"
// AMDGCNSPIRV: {{".*llvm-spirv.*"}} "--spirv-max-version=1.6" "--spirv-ext=+all" {{.*}} "[[AMDGCNSPV_TMP]]" {{.*}}"-o" "[[AMDGCNSPV_CO:.*out]]"
// AMDGCNSPIRV: "-cc1" "-triple" "amdgcn-amd-amdhsa" {{.*}}"-emit-obj" {{.*}}"-target-cpu" "gfx900"{{.*}} "-o" "[[GFX900_OBJ:.*o]]"
diff --git a/clang/test/Driver/spirv-amd-toolchain.c b/clang/test/Driver/spirv-amd-toolchain.c
index c9417400a9378..0c8ccd1d17487 100644
--- a/clang/test/Driver/spirv-amd-toolchain.c
+++ b/clang/test/Driver/spirv-amd-toolchain.c
@@ -14,6 +14,6 @@
// RUN: %clang -### --target=spirv64-amd-amdhsa %s -nogpulib -nogpuinc 2>&1 \
// RUN: | FileCheck %s --check-prefix=INVOCATION
-// INVOCATION: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}} "-o" "[[OUTPUT:.+]]" "-x" "c"
+// INVOCATION: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-disable-llvm-passes" {{.*}} "-o" "[[OUTPUT:.+]]" "-x" "c"
// INVOCATION: "{{.*}}llvm-link" "-o" "a.out" "[[OUTPUT]]"
// INVOCATION: "{{.*}}llvm-spirv" "--spirv-max-version=1.6" "--spirv-ext=+all" "--spirv-allow-unknown-intrinsics" "--spirv-lower-const-expr" "--spirv-preserve-auxdata" "--spirv-debug-info-version=nonsemantic-shader-200" "a.out" "-o" "a.out"
|
|
@llvm/pr-subscribers-clang-driver Author: Alex Voicu (AlexVlx) ChangesSPIR-V specific optimizations can inadvertently remove information that is important for the AMDGPU BE / break certain code patterns we rely on. Therefore, for AMDGCN flavoured SPIR-V we disable optimizations over IR, to ensure that we operate directly on the output of Clang CodeGen when we finalise. Full diff: https://github.com/llvm/llvm-project/pull/154765.diff 4 Files Affected:
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 5e463b9c98687..bad1bd694b9e7 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -858,6 +858,13 @@ void AMDGPUToolChain::addClangTargetOptions(
CC1Args.push_back("-fapply-global-visibility-to-externs");
}
+ // For SPIR-V we want to retain the pristine output of Clang CodeGen, since
+ // optimizations might lose structure / information that is necessary for
+ // generating optimal concrete AMDGPU code.
+ if (getTriple().isSPIRV() &&
+ !DriverArgs.hasArg(options::OPT_disable_llvm_passes))
+ CC1Args.push_back("-disable-llvm-passes");
+
if (DeviceOffloadingKind == Action::OFK_None)
addOpenCLBuiltinsLib(getDriver(), DriverArgs, CC1Args);
}
diff --git a/clang/lib/Driver/ToolChains/HIPAMD.cpp b/clang/lib/Driver/ToolChains/HIPAMD.cpp
index b4c6da0d73d13..5f3fbea40f162 100644
--- a/clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ b/clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -261,6 +261,12 @@ void HIPAMDToolChain::addClangTargetOptions(
// with options that match the user-supplied ones.
if (!DriverArgs.hasArg(options::OPT_fembed_bitcode_marker))
CC1Args.push_back("-fembed-bitcode=marker");
+ // For SPIR-V we want to retain the pristine output of Clang CodeGen, since
+ // optimizations might lose structure / information that is necessary for
+ // generating optimal concrete AMDGPU code. We duplicate this because the
+ // HIP TC doesn't invoke the base AMDGPU TC addClangTargetOptions.
+ if (!DriverArgs.hasArg(options::OPT_disable_llvm_passes))
+ CC1Args.push_back("-disable-llvm-passes");
return; // No DeviceLibs for SPIR-V.
}
diff --git a/clang/test/Driver/hip-toolchain-no-rdc.hip b/clang/test/Driver/hip-toolchain-no-rdc.hip
index ddd251b67cc57..dc8f0a97ad371 100644
--- a/clang/test/Driver/hip-toolchain-no-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-no-rdc.hip
@@ -206,7 +206,7 @@
// Check mixed AMDGCNSPIRV and concrete GPU arch.
//
-// AMDGCNSPIRV: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-emit-llvm-bc" {{.*}} "-o" "[[AMDGCNSPV_BC:.*bc]]"
+// AMDGCNSPIRV: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-emit-llvm-bc" {{.*}}"-fembed-bitcode=marker" "-disable-llvm-passes" {{.*}} "-o" "[[AMDGCNSPV_BC:.*bc]]"
// AMDGCNSPIRV: {{".*llvm-link.*"}} "-o" "[[AMDGCNSPV_TMP:.*out]]" "[[AMDGCNSPV_BC]]"
// AMDGCNSPIRV: {{".*llvm-spirv.*"}} "--spirv-max-version=1.6" "--spirv-ext=+all" {{.*}} "[[AMDGCNSPV_TMP]]" {{.*}}"-o" "[[AMDGCNSPV_CO:.*out]]"
// AMDGCNSPIRV: "-cc1" "-triple" "amdgcn-amd-amdhsa" {{.*}}"-emit-obj" {{.*}}"-target-cpu" "gfx900"{{.*}} "-o" "[[GFX900_OBJ:.*o]]"
diff --git a/clang/test/Driver/spirv-amd-toolchain.c b/clang/test/Driver/spirv-amd-toolchain.c
index c9417400a9378..0c8ccd1d17487 100644
--- a/clang/test/Driver/spirv-amd-toolchain.c
+++ b/clang/test/Driver/spirv-amd-toolchain.c
@@ -14,6 +14,6 @@
// RUN: %clang -### --target=spirv64-amd-amdhsa %s -nogpulib -nogpuinc 2>&1 \
// RUN: | FileCheck %s --check-prefix=INVOCATION
-// INVOCATION: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}} "-o" "[[OUTPUT:.+]]" "-x" "c"
+// INVOCATION: "-cc1" "-triple" "spirv64-amd-amdhsa" {{.*}}"-disable-llvm-passes" {{.*}} "-o" "[[OUTPUT:.+]]" "-x" "c"
// INVOCATION: "{{.*}}llvm-link" "-o" "a.out" "[[OUTPUT]]"
// INVOCATION: "{{.*}}llvm-spirv" "--spirv-max-version=1.6" "--spirv-ext=+all" "--spirv-allow-unknown-intrinsics" "--spirv-lower-const-expr" "--spirv-preserve-auxdata" "--spirv-debug-info-version=nonsemantic-shader-200" "a.out" "-o" "a.out"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a pretty massive work-around. Could you be more specific about what the issue is? I'd expect LLVM optimizations to preserve everything we need, and if they don't it's because we're breaking the rules / overriding symbols with -mlink-builtin-bitcode.
This is not a workaround, it's a "by design" aspect we haven't gotten around to. SPIRV, even when flavoured, is not AMDGCN, so it cannot (and should not) preserve everything we need. Consider vectorisation, or AS inference, target specific types, completely different costing for SROA & friends, different optimal expansions of known patterns etc. At the same time, if we have target (AMDGPU) specific passes that are expected to run before optimisation, they must run on unoptimsed IR. As more SPIR-V specific optimisations get added / enabled, this delta will only increase. On top of this, as it is we're duplicating / doing spurious work by (unintentionally) running passes at least twice, once over pre SPV IR, and once over post SPV IR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a workaround and isn't the start of a path to a good design. This needs concrete examples, with end to end test examples.
SPIRV is a storage format and flavoring it as AMDGPU shouldn't introduce unique optimization issues. If SPIRV is adding custom optimizations that is harmful to the ultimate finalization, that's an issue to directly address in the SPIRV optimizations.
Consider vectorisation
I'd expect the SPIRV cost models to disable any nontrivial / unrecoverable vectorizations (i.e. loop vectorization)
AS inference
You should be able to run this as many times as you like, you'll just have improved information later.
different optimal expansions of known patterns
SPIRV is mostly a set of passthrough operations, the amount of legalization expansions should be small
target specific types
This sounds the most concerning and workaroundy in your list
On top of this, as it is we're duplicating / doing spurious work by (unintentionally) running passes at least twice, once over pre SPV IR, and once over post SPV IR.
This sounds fine, I'd expect the shipped SPIRV to have a base set of optimizations applied
This seems to me as the core issue. SPIRV being a target-neutral representation, how is it that target-neutral optimizations in LLVM are harmful to any one target? If hypothetically a HIP program were parsed into an identical LLVM IR, wouldn't the same optimizations make the same decisions for the AMDGCN target? |
To my knowledge, "target-neutral optimizations" are not precisely defined in LLVM (nor is target neutrality) - I could be wrong. Having said that, even if they were, at the moment SPIR-V (mostly) runs the standard optimisation pipeline associated with a particular O level, which at O3 includes a number of non-invertible non structure preserving transforms such as JumpThreading. It might be worthwhile to discuss this with the SPIR-V target maintainers, but even if that were addressed / changed, there's still appeal to us never optimising SPIR-V itself. I tried to godbolt as compact an example as I could, from a recent real-world debug excursion, please see:
I am not claiming we could not address this particular example, we quite probably could. But consider a similar circumstance except it's tens / hundreds of kernels that are hundreds (at least) of lines long that one is looking at, trying to figure if a bug found its way in SPIR-V generation / reverse-translation / AMDGPU itself, but it just happened to work. Holding the AMDGPU BE constant, as if it were directly connected to the FE, seems appealing. Especially in light of the fact that we aren't in a position to avoid re-running optimisations even if, as you say, they had made the same decisions. Furthermore, we can always flip this back in the future if it's demonstrably beneficial for us to optimise SPIR-V itself, which is not the case at the moment (coupled with the above-mentioned difficulties). |
|
For the record, I don't mean to hold up this patch with just my opinions, since clearly I am not the one spending time making SPIRV work for AMDGCN. Will defer to other people's review about the actual mechanics of the patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://godbolt.org/z/Y6vYeYvaW
This shows my main concern. We should not be skipping the mandatory passes on this path. The SPIRV consumer should not be taking on the responsibility of handling the IR-lowered-in-frontend features (mainly always-inline and the coroutine passes).
This is a debug flag, this usecase needs a new one. There is already the similarly named "-disable-llvm-optzns" but it appears to just be an alias right now.
Not sure what loop(loop-idiom-vectorize) or globaldce are doing there at -O0, those seem like bugs
The loop-idiom-recognize appears to be an aarch64 specific bug. I would also consider the globaldce to be a bug, but I assume it's just getting added as a set with the coroutine lowering |
|
We've had some internal discussion around this, we will rely on this mechanism whilst we implement support for what we want in Clang. |
SPIR-V specific optimizations can inadvertently remove information that is important for the AMDGPU BE / break certain code patterns we rely on. Therefore, for AMDGCN flavoured SPIR-V we disable optimizations over IR, to ensure that we operate directly on the output of Clang CodeGen when we finalise.