Merged
Changes from 3 commits
62 changes: 45 additions & 17 deletions clang/lib/Driver/ToolChains/SYCL.cpp
@@ -13,11 +13,12 @@
 #include "clang/Driver/DriverDiagnostic.h"
 #include "clang/Driver/InputInfo.h"
 #include "clang/Driver/Options.h"
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/Option/Option.h"
+#include "llvm/SYCLLowerIR/DeviceConfigFile.hpp"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
-#include "llvm/SYCLLowerIR/DeviceConfigFile.hpp"
 #include <algorithm>
 #include <sstream>

@@ -299,6 +300,10 @@ bool SYCL::shouldDoPerObjectFileLinking(const Compilation &C) {
 // Return whether to use native bfloat16 library.
 static bool selectBfloatLibs(const llvm::Triple &Triple, const Compilation &C,
 bool &UseNative) {
+
+static llvm::SmallSet<StringRef, 8> GPUArchsWithNBF16{
+"intel_gpu_pvc", "intel_gpu_acm_g10", "intel_gpu_acm_g11",
+"intel_gpu_acm_g12", "intel_gpu_bmg_g21"};
Contributor

This is fine as a short-term fix, but longer-term we should be thinking of a different way to identify architectures with bfloat16 support that doesn't require maintaining a list of devices. For example, is there a way to query the properties of the target device(s) to determine if bfloat16 is supported?

Note that Lunar Lake GPUs also support bfloat16 (I think), and it's not in this list.

Contributor Author

Hi, @bashbaug
bf16 support can be queried at execution time, because that is the only point at which we know the real target the program is running on. At compile time in AOT mode, the compiler driver can only decide based on the target platform specified, so we have to maintain the bf16 support information in the compiler driver source code; otherwise the driver won't know whether a target platform supports bf16. The platform we build the program on may be different from the platform the program will run on.
Thanks very much.

Contributor

Could we call into ocloc to do the query? For example, if we're AOT compiling for DG2, we could do something like (this example uses the ocloc command-line interface, but the same is supported for the library interface):

$ ocloc query CL_DEVICE_EXTENSIONS_WITH_VERSION -device dg2
<snip> cl_intel_bfloat16_conversions:1.0.0 <snip>

Because the cl_intel_bfloat16_conversions extension is supported, we can know that DG2 supports SPIR-V bfloat16 conversion instructions.
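
A minimal sketch of how a driver could consume such a query result, assuming the result is a whitespace-separated list of `name:major.minor.patch` entries as in the snippet above. The helper name `hasExtension` is hypothetical, and how the text is obtained from ocloc (command line or library interface) is deliberately left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `Output` lists the extension `Name`.
// Assumes entries are separated by whitespace and look like "name:1.0.0".
bool hasExtension(const std::string &Output, const std::string &Name) {
  std::istringstream Entries(Output);
  for (std::string Entry; Entries >> Entry;) {
    // Drop the ":version" suffix, if any, before comparing the full name.
    std::string Ext = Entry.substr(0, Entry.find(':'));
    if (Ext == Name)
      return true;
  }
  return false;
}
```

Comparing the full extension name, rather than searching for a substring, avoids matching a longer extension that merely starts with the same text.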

Contributor Author

Hi, @bashbaug
I just tried the ocloc command and found that running "ocloc query CL_DEVICE_EXTENSIONS_WITH_VERSION -device dg2" generates a file named "CL_DEVICE_EXTENSIONS_WITH_VERSION" in the local directory. This seems to be ocloc's behavior; is there any way to avoid it?
Thanks very much.

 const llvm::opt::ArgList &Args = C.getArgs();
 bool NeedLibs = false;

@@ -324,32 +329,54 @@ static bool selectBfloatLibs(const llvm::Triple &Triple, const Compilation &C,
 }
 }

-UseNative = false;
-
-// Check for intel_gpu_pvc as the target
-if (Arg *SYCLTarget = Args.getLastArg(options::OPT_fsycl_targets_EQ)) {
-if (SYCLTarget->getValues().size() == 1) {
-StringRef SYCLTargetStr = SYCLTarget->getValue();
-if (SYCLTargetStr == "intel_gpu_pvc")
-UseNative = true;
-}
-}
-
-auto checkBF = [](StringRef Device) {
-return Device.starts_with("pvc") || Device.starts_with("ats");
-};
+// We need to select fallback/native bfloat16 devicelib in AOT compilation
+// targeting Intel GPU devices. Users have 2 ways to apply AOT:
+// 1). clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device pvc,...,"
+// 2). clang++ -fsycl -fsycl-targets=intel_gpu_pvc,...
+// We assume that users will only apply either 1) or 2) and won't mix the
+// 2 ways in their compiling command.
Contributor @srividya-sundaram, Jan 2, 2025

What about the case where we can specify Intel GPUs via --offload-arch option?

clang++ --offload-new-driver -fsycl --offload-arch=bdw

Contributor Author

Hi, @srividya-sundaram
Thanks for pointing this out. This PR didn't consider the new offload driver; I will update it.
Thanks very much.


 std::string Params;
 for (const auto &Arg : TargArgs) {
 Params += " ";
 Params += Arg;
 }

+auto checkBF = [](StringRef Device) {
+return Device.starts_with("pvc") || Device.starts_with("ats") ||
+Device.starts_with("dg2") || Device.starts_with("bmg");
+};
+
+auto checkSpirvJIT = [](StringRef Target) {
+return Target.starts_with("spir64-") || Target.starts_with("spirv64-") ||
+(Target == "spir64") || (Target == "spirv64");
+};
+
 size_t DevicesPos = Params.find("-device ");
-if (!UseNative && DevicesPos != std::string::npos) {
+// "-device xxx" is used to specify AOT target device.
+if (DevicesPos != std::string::npos) {
 UseNative = true;
 std::istringstream Devices(Params.substr(DevicesPos + 8));
 for (std::string S; std::getline(Devices, S, ',');)
 UseNative &= checkBF(S);
+return NeedLibs;
+} else {
+// -fsycl-targets=intel_gpu_xxx is used to specify AOT target device.
+// Multiple Intel GPU devices can be specified; the native bfloat16
+// devicelib can be used only when all specified GPU devices support
+// native bfloat16 conversion.
+UseNative = true;
+
+if (Arg *SYCLTarget = Args.getLastArg(options::OPT_fsycl_targets_EQ)) {
+for (auto TargetsV : SYCLTarget->getValues()) {
+if (!checkSpirvJIT(StringRef(TargetsV)) &&
+!GPUArchsWithNBF16.contains(StringRef(TargetsV))) {
+UseNative = false;
+break;
+}
+}
+}
+return NeedLibs;
+}
 }
 return NeedLibs;
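
The selection rule in this hunk can be tried in isolation. The following is a rough standalone sketch, not the driver code: it swaps `StringRef` and `llvm::SmallSet` for standard-library types, and the function names `deviceListIsNative` and `targetsAreNative` are hypothetical. The architecture list and device prefixes are copied from the diff above.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Targets assumed to support native bfloat16 conversion (copied from the diff).
static const std::set<std::string> NativeBF16Archs = {
    "intel_gpu_pvc", "intel_gpu_acm_g10", "intel_gpu_acm_g11",
    "intel_gpu_acm_g12", "intel_gpu_bmg_g21"};

// Mirrors checkBF: a device name counts as native if it starts with one of
// these prefixes.
bool hasNativePrefix(const std::string &Device) {
  for (const char *P : {"pvc", "ats", "dg2", "bmg"})
    if (Device.rfind(P, 0) == 0)
      return true;
  return false;
}

// Mirrors checkSpirvJIT: JIT targets do not take part in the decision.
bool isSpirvJIT(const std::string &T) {
  return T == "spir64" || T == "spirv64" || T.rfind("spir64-", 0) == 0 ||
         T.rfind("spirv64-", 0) == 0;
}

// Way 1: -Xs "-device a,b,...". Native only if every listed device qualifies.
bool deviceListIsNative(const std::string &DeviceList) {
  std::istringstream Devices(DeviceList);
  bool AllNative = true;
  for (std::string D; std::getline(Devices, D, ',');)
    AllNative &= hasNativePrefix(D);
  return AllNative;
}

// Way 2: -fsycl-targets=a,b,... Native only if every non-JIT target is in
// the known list.
bool targetsAreNative(const std::vector<std::string> &Targets) {
  for (const std::string &T : Targets)
    if (!isSpirvJIT(T) && NativeBF16Archs.count(T) == 0)
      return false;
  return true;
}
```

For instance, `targetsAreNative({"intel_gpu_pvc", "spir64", "intel_gpu_acm_g10"})` is true, while replacing `intel_gpu_acm_g10` with `intel_gpu_dg1` makes it false, which matches the NATIVE and FALLBACK expectations in the tests added by this PR. Whether bfloat16 libraries are needed at all (`NeedLibs`) is a separate decision that this sketch does not model.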
@@ -505,7 +532,8 @@ SYCL::getDeviceLibraries(const Compilation &C, const llvm::Triple &TargetTriple,
 }

 if (TargetTriple.isNVPTX() && IgnoreSingleLibs)
-LibraryList.push_back(Args.MakeArgString("devicelib-nvptx64-nvidia-cuda.bc"));
+LibraryList.push_back(
+Args.MakeArgString("devicelib-nvptx64-nvidia-cuda.bc"));

 if (TargetTriple.isAMDGCN() && IgnoreSingleLibs)
 LibraryList.push_back(Args.MakeArgString("devicelib-amdgcn-amd-amdhsa.bc"));
90 changes: 90 additions & 0 deletions clang/test/Driver/sycl-device-lib-bfloat16.cpp
@@ -68,6 +68,79 @@
 // RUN: --sysroot=%S/Inputs/SYCL -### 2>&1 \
 // RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-FALLBACK

+
+// Test AOT-DG2 compilation uses native libs.
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_acm_g10 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE
+
+// Test AOT-PVC + AOT-DG2 compilation uses native libs + native libs.
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,intel_gpu_acm_g10 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NATIVE
+
+// Test AOT-DG1 + AOT-DG2 compilation uses fallback libs + fallback libs.
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_dg1,intel_gpu_acm_g10 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-FALLBACK
+
+
+// Test AOT-PVC + JIT compilation uses native libs + no libs
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spir64 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NONE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spirv64 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NONE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spir64-unknown-unknown \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NONE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spirv64-unknown-unknown \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NONE
+
+// Test AOT-DG1 + JIT compilation uses fallback libs + no libs
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_dg1,spir64 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_dg1,spirv64 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_dg1,spir64-unknown-unknown \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_dg1,spirv64-unknown-unknown \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE
+
+// Test AOT-PVC + JIT compilation + AOT-DG2 uses native libs + no libs + native libs
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spir64,intel_gpu_acm_g10 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NONE-NATIVE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spirv64,intel_gpu_acm_g10 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NONE-NATIVE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spir64-unknown-unknown,intel_gpu_acm_g10 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NONE-NATIVE
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spirv64-unknown-unknown,intel_gpu_acm_g10 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-NATIVE-NONE-NATIVE
+
+// Test AOT-PVC + JIT compilation + AOT-DG1 uses fallback libs + no libs + fallback libs
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spir64,intel_gpu_dg1 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE-FALLBACK
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spirv64,intel_gpu_dg1 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE-FALLBACK
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spir64-unknown-unknown,intel_gpu_dg1 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE-FALLBACK
+// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spirv64-unknown-unknown,intel_gpu_dg1 \
+// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \
+// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE-FALLBACK
+
 // BFLOAT16-NOT: llvm-link{{.*}} "{{.*}}libsycl-{{fallback|native}}-bfloat16.bc"

 // BFLOAT16-NATIVE: llvm-link{{.*}} "{{.*}}libsycl-native-bfloat16.bc"
@@ -85,3 +158,20 @@

 // BFLOAT16-FALLBACK-FALLBACK: llvm-link{{.*}} "{{.*}}libsycl-fallback-bfloat16.bc"
 // BFLOAT16-FALLBACK-FALLBACK: "{{.*}}libsycl-fallback-bfloat16.bc"
+
+// BFLOAT16-NATIVE-NATIVE: llvm-link{{.*}} "{{.*}}libsycl-native-bfloat16.bc"
+// BFLOAT16-NATIVE-NATIVE: llvm-link{{.*}} "{{.*}}libsycl-native-bfloat16.bc"
+
+// BFLOAT16-NATIVE-NONE: llvm-link{{.*}} "{{.*}}libsycl-native-bfloat16.bc"
+// BFLOAT16-NATIVE-NONE-NOT: llvm-link{{.*}} "{{.*}}-bfloat16.bc"
+
+// BFLOAT16-FALLBACK-NONE: llvm-link{{.*}} "{{.*}}libsycl-fallback-bfloat16.bc"
+// BFLOAT16-FALLBACK-NONE-NOT: llvm-link{{.*}} "{{.*}}-bfloat16.bc"
+
+// BFLOAT16-NATIVE-NONE-NATIVE: llvm-link{{.*}} "{{.*}}libsycl-native-bfloat16.bc"
+// BFLOAT16-NATIVE-NONE-NATIVE-NOT: llvm-link{{.*}} "{{.*}}-bfloat16.bc"
+// BFLOAT16-NATIVE-NONE-NATIVE: llvm-link{{.*}} "{{.*}}libsycl-native-bfloat16.bc"
+
+// BFLOAT16-FALLBACK-NONE-FALLBACK: llvm-link{{.*}} "{{.*}}libsycl-fallback-bfloat16.bc"
+// BFLOAT16-FALLBACK-NONE-FALLBACK-NOT: llvm-link{{.*}} "{{.*}}-bfloat16.bc"
+// BFLOAT16-FALLBACK-NONE-FALLBACK: llvm-link{{.*}} "{{.*}}libsycl-fallback-bfloat16.bc"