Commit 203f061

[Clang][CodeGen] Add module flag for square root precision
A module flag is now set based on the `-f[no]-cuda-prec-sqrt` flag, allowing the NVVMReflect pass to recognize and apply the specified square root precision.
1 parent 523ad69 commit 203f061
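
The relationship described in the commit message can be pictured with a small toy model. This is only a sketch in plain C++, not LLVM code: `ModuleFlags`, `toyReflect`, and `setPrecSqrtFlag` are hypothetical names, and returning 0 for an absent flag or an unknown property is an assumption of the model rather than something the commit states.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a module's flag table; this is not an LLVM type.
using ModuleFlags = std::map<std::string, int>;

// Hypothetical front-end step: record the option value as a module flag,
// replacing any earlier value (loosely mirroring "Override" behavior).
void setPrecSqrtFlag(ModuleFlags &flags, bool cudaPreciseSqrt) {
  flags["nvvm-reflect-prec-sqrt"] = cudaPreciseSqrt ? 1 : 0;
}

// Toy model of a reflect query. Only the pairing named in the commit is
// modeled: "__CUDA_PREC_SQRT" is answered from "nvvm-reflect-prec-sqrt".
// Returning 0 for unknown properties or absent flags is an assumption.
int toyReflect(const ModuleFlags &flags, const std::string &property) {
  if (property != "__CUDA_PREC_SQRT")
    return 0;
  auto it = flags.find("nvvm-reflect-prec-sqrt");
  return it == flags.end() ? 0 : it->second;
}

// Small self-check of the model.
void checkToyModel() {
  ModuleFlags flags;
  assert(toyReflect(flags, "__CUDA_PREC_SQRT") == 0);
  setPrecSqrtFlag(flags, true);
  assert(toyReflect(flags, "__CUDA_PREC_SQRT") == 1);
}
```

In the model, the front end writes the flag once per module and a later query reads it back, which is the division of labor the commit message describes between Clang and the NVVMReflect pass.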

2 files changed: +19 -0 lines changed

clang/lib/CodeGen/CodeGenModule.cpp

Lines changed: 4 additions & 0 deletions
@@ -1286,6 +1286,10 @@ void CodeGenModule::Release() {
   }
 
   if (LangOpts.CUDAIsDevice && getTriple().isNVPTX()) {
+    // Indicate whether __nvvm_reflect should be configured to use precise
+    // square root. (This corresponds to its "__CUDA_PREC_SQRT" property.)
+    getModule().addModuleFlag(llvm::Module::Override, "nvvm-reflect-prec-sqrt",
+                              CodeGenOpts.CudaPreciseSqrt);
     // Indicate whether __nvvm_reflect should be configured to flush denormal
     // floating point values to 0. (This corresponds to its "__CUDA_FTZ"
     // property.)
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+// RUN: %clang_cc1 -fcuda-is-device \
+// RUN:   -triple nvptx-nvidia-cuda -emit-llvm -o - %s | \
+// RUN:   FileCheck -check-prefixes=NO-PREC-SQRT %s
+
+// RUN: %clang_cc1 -fcuda-is-device -fcuda-prec-sqrt \
+// RUN:   -triple nvptx-nvidia-cuda -emit-llvm -o - %s | \
+// RUN:   FileCheck -check-prefixes=PREC-SQRT %s
+
+#include "Inputs/cuda.h"
+
+extern "C" __device__ void foo() {}
+
+
+// NO-PREC-SQRT: !{i32 4, !"nvvm-reflect-prec-sqrt", i32 0}
+// PREC-SQRT: !{i32 4, !"nvvm-reflect-prec-sqrt", i32 1}
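
The two CHECK lines in the test can be read as (behavior, name, value) triples. The leading `i32 4` is the numeric code of the `Override` behavior passed to `addModuleFlag`, and the trailing `i32 0` or `i32 1` is the option value. The sketch below rebuilds those strings in plain C++ to make the mapping explicit. `FlagBehavior` and `renderModuleFlag` are hypothetical helpers written for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <string>

// Numeric codes of two LLVM module-flag behaviors (Error = 1, Override = 4).
// The enum itself is a local stand-in, not the llvm::Module type.
enum class FlagBehavior : int { Error = 1, Override = 4 };

// Hypothetical helper: render an integer module flag the way it appears in
// textual IR metadata, e.g. !{i32 4, !"name", i32 1}.
std::string renderModuleFlag(FlagBehavior behavior, const std::string &name,
                             int value) {
  return "!{i32 " + std::to_string(static_cast<int>(behavior)) + ", !\"" +
         name + "\", i32 " + std::to_string(value) + "}";
}

// Self-check against the two strings the test expects.
void checkRenderedFlags() {
  assert(renderModuleFlag(FlagBehavior::Override, "nvvm-reflect-prec-sqrt", 0) ==
         "!{i32 4, !\"nvvm-reflect-prec-sqrt\", i32 0}");
  assert(renderModuleFlag(FlagBehavior::Override, "nvvm-reflect-prec-sqrt", 1) ==
         "!{i32 4, !\"nvvm-reflect-prec-sqrt\", i32 1}");
}
```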
