[SYCL][Driver] Support bfloat16 devicelib selection when multiple AOT targets specified #16494


Merged

againull merged 15 commits into intel:sycl from jinge90:link_native_bf16_aot

Jan 31, 2025

Contributor

jinge90 commented Dec 31, 2024 (edited)

Users can specify multiple AOT targets when building a SYCL program in the following ways:
1) via -fsycl-targets=intel_gpu_pvc,intel_gpu_acm_g10,...
2) via -fsycl-targets=spir64_gen ... -Xs "-device pvc,dg2,..."
3) via -fsycl-targets=spir64_gen ... -Xsycl-target-backend=spir64_gen "-device pvc"
We should select the native bfloat16 devicelib when all specified AOT targets support native bfloat16 conversion. Currently, the pvc, dg2, and bmg devices support native bfloat16.
If the user specifies a JIT target together with AOT targets that all support native bfloat16 conversion, we still select the native bfloat16 devicelib, since the bfloat16 devicelib is skipped in the linking step for the JIT target.
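The selection rule described above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the driver's actual code: `splitDeviceList`, `shouldUseNativeBF16`, and `selectionExample` are hypothetical names, the device set contains just the short names mentioned in the description (pvc, dg2, bmg), and returning `false` for an empty AOT list is an assumption. JIT targets are simply left out of the list, since the bfloat16 devicelib is skipped for them at link time.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: short device names treated as having native bfloat16
// conversion, taken from the PR description (pvc, dg2, bmg).
static const std::set<std::string> NativeBF16Devices{"pvc", "dg2", "bmg"};

// Split a comma-separated device list such as "pvc,dg2" into names,
// dropping empty entries.
std::vector<std::string> splitDeviceList(const std::string &List) {
  std::vector<std::string> Devices;
  std::stringstream SS(List);
  std::string Name;
  while (std::getline(SS, Name, ','))
    if (!Name.empty())
      Devices.push_back(Name);
  return Devices;
}

// The native bfloat16 devicelib is chosen only if at least one AOT device
// was given (an assumption) and every one of them is in the supported set.
bool shouldUseNativeBF16(const std::vector<std::string> &AOTDevices) {
  if (AOTDevices.empty())
    return false;
  for (const std::string &D : AOTDevices)
    if (NativeBF16Devices.count(D) == 0)
      return false;
  return true;
}

// Usage: one unsupported device (here skl) turns the native selection off.
void selectionExample() {
  assert(shouldUseNativeBF16(splitDeviceList("pvc,dg2")));
  assert(!shouldUseNativeBF16(splitDeviceList("pvc,skl")));
  assert(!shouldUseNativeBF16({}));
}
```

The "all of" check is what makes mixing in a single unsupported device fall back to the non-native devicelib.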

jinge90 added 2 commits

December 31, 2024 14:59


          [SYCL] Select Native Bfloat16 devicelib in AOT

27512df

Signed-off-by: jinge90 <[email protected]>


          Merge remote-tracking branch 'upstream/sycl' into link_native_bf16_aot

becaba8

jinge90 requested a review from a team as a code owner

December 31, 2024 13:24

jinge90 temporarily deployed to WindowsCILock

December 31, 2024 13:24

— with

GitHub Actions Inactive


          [SYCL][Driver] Select Native bfloat16 when all AOT targets specified …

223f5e0

…support native bfloat16 conversion

jinge90 temporarily deployed to WindowsCILock

December 31, 2024 14:04

— with

GitHub Actions Inactive

jinge90 requested review from bashbaug and mdtoguchi

January 2, 2025 08:47

mdtoguchi reviewed

clang/test/Driver/sycl-device-lib-bfloat16.cpp Outdated

clang/test/Driver/sycl-device-lib-bfloat16.cpp Outdated

clang/test/Driver/sycl-device-lib-bfloat16.cpp Outdated

clang/test/Driver/sycl-device-lib-bfloat16.cpp Outdated

clang/lib/Driver/ToolChains/SYCL.cpp Outdated

srividya-sundaram reviewed

clang/lib/Driver/ToolChains/SYCL.cpp Outdated

    
                  // 1). clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device pvc,...,"

                  // 2). clang++ -fsycl -fsycl-targets=intel_gpu_pvc,...

                  // We assume that users will only apply either 1) or 2) and won't mix the

                  // 2 ways in their compiling command.

Contributor

srividya-sundaram Jan 2, 2025 (edited)

What about the case where Intel GPUs are specified via the --offload-arch option?

clang++ --offload-new-driver -fsycl --offload-arch=bdw

Contributor Author

jinge90 Jan 3, 2025

Hi, @srividya-sundaram
Thanks for pointing this out. This PR didn't consider the new offload driver; I will update it.
Thanks very much.

srividya-sundaram reviewed

clang/lib/Driver/ToolChains/SYCL.cpp

Contributor

srividya-sundaram commented Jan 2, 2025

Please add some description to the PR in addition to the PR title.

jinge90 and others added 4 commits

January 3, 2025 13:39


          Update clang/test/Driver/sycl-device-lib-bfloat16.cpp

21fb49e

Co-authored-by: Michael Toguchi <[email protected]>


          Update clang/test/Driver/sycl-device-lib-bfloat16.cpp

b1fe3b3

Co-authored-by: Michael Toguchi <[email protected]>


          Update clang/test/Driver/sycl-device-lib-bfloat16.cpp

3c8b6d7

Co-authored-by: Michael Toguchi <[email protected]>


          Update clang/test/Driver/sycl-device-lib-bfloat16.cpp

d6dd9e2

Co-authored-by: Michael Toguchi <[email protected]>

jinge90 temporarily deployed to WindowsCILock

January 3, 2025 05:40

— with

GitHub Actions Inactive

jinge90 temporarily deployed to WindowsCILock

January 3, 2025 06:20

— with

GitHub Actions Inactive


          Support mix use of '-device xxx' and fsycl-targets=

45ae7e4

Signed-off-by: jinge90 <[email protected]>

jinge90 had a problem deploying to WindowsCILock

January 11, 2025 16:21

— with

GitHub Actions Error


          Merge remote-tracking branch 'upstream/sycl' into link_native_bf16_aot

2cfd4ab

jinge90 temporarily deployed to WindowsCILock

January 11, 2025 16:30

— with

GitHub Actions Inactive

jinge90 temporarily deployed to WindowsCILock

January 11, 2025 17:07

— with

GitHub Actions Inactive

bashbaug approved these changes

clang/lib/Driver/ToolChains/SYCL.cpp Outdated

Comment on lines 304 to 306

    
                static llvm::SmallSet<StringRef, 8> GPUArchsWithNBF16{

                    "intel_gpu_pvc", "intel_gpu_acm_g10", "intel_gpu_acm_g11",

                    "intel_gpu_acm_g12", "intel_gpu_bmg_g21"};

Contributor

bashbaug Jan 15, 2025

This is fine as a short-term fix, but longer-term we should be thinking of a different way to identify architectures with bfloat16 support that doesn't require maintaining a list of devices. For example, is there a way to query the properties of the target device(s) to determine if bfloat16 is supported?

Note that Lunar Lake GPUs also support bfloat16 (I think), and it's not in this list.

Contributor Author

jinge90 Jan 17, 2025

Hi, @bashbaug
bf16 support can be queried at execution time, because only then do we know the real target the program is running on. At compile time in AOT mode, the compiler driver can only decide based on the specified target platform, so we have to maintain the bf16 support information in the compiler driver source code; otherwise the driver won't know whether a target platform supports bf16. The platform on which we build the program may differ from the platform on which it will run.
Thanks very much.

Contributor

bashbaug Jan 22, 2025

Could we call into ocloc to do the query? For example, if we're AOT compiling for DG2, we could do something like (this example uses the ocloc command-line interface, but the same is supported for the library interface):

$ ocloc query CL_DEVICE_EXTENSIONS_WITH_VERSION -device dg2
<snip> cl_intel_bfloat16_conversions:1.0.0 <snip>

Because the cl_intel_bfloat16_conversions extension is supported, we can know that DG2 supports SPIR-V bfloat16 conversion instructions.
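A minimal sketch of how the driver side of such a query could look, assuming the query output is a whitespace-separated list of `name:version` entries as the snippet above suggests. The real ocloc output format and its library interface are not reproduced here; `hasExtension` and `queryExample` are hypothetical names, and the sample string is made up for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return true if QueryOutput contains an entry for ExtName. Assumes entries
// are whitespace-separated tokens of the form "name:version".
bool hasExtension(const std::string &QueryOutput, const std::string &ExtName) {
  std::istringstream SS(QueryOutput);
  std::string Token;
  while (SS >> Token) {
    // Compare only the part before the first ':' so that a name which is a
    // prefix of a longer extension name does not match by accident.
    if (Token.substr(0, Token.find(':')) == ExtName)
      return true;
  }
  return false;
}

// Usage with an illustrative (not real) query result.
void queryExample() {
  const std::string Out =
      "cl_khr_fp16:1.0.0 cl_intel_bfloat16_conversions:1.0.0";
  assert(hasExtension(Out, "cl_intel_bfloat16_conversions"));
  assert(!hasExtension(Out, "cl_intel_bfloat16"));
}
```

Matching on the whole name rather than a substring keeps a future extension with a similar prefix from being mistaken for `cl_intel_bfloat16_conversions`.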

Contributor Author

jinge90 Jan 22, 2025

Hi, @bashbaug
I just tried the ocloc command and found that running "ocloc query CL_DEVICE_EXTENSIONS_WITH_VERSION -device dg2" generates a file named "CL_DEVICE_EXTENSIONS_WITH_VERSION" in the local directory. This seems to be ocloc's behavior. Is there any way to avoid it?
Thanks very much.

mdtoguchi reviewed

clang/lib/Driver/ToolChains/SYCL.cpp Outdated

clang/lib/Driver/ToolChains/SYCL.cpp

clang/test/Driver/sycl-device-lib-bfloat16.cpp

jinge90 and others added 2 commits

January 17, 2025 15:59


          Merge remote-tracking branch 'upstream/sycl' into link_native_bf16_aot

34e4819


          Update clang/lib/Driver/ToolChains/SYCL.cpp

b3b2a38

Co-authored-by: Michael Toguchi <[email protected]>

jinge90 had a problem deploying to WindowsCILock

January 17, 2025 08:07

— with

GitHub Actions Error

jinge90 temporarily deployed to WindowsCILock

January 17, 2025 08:12

— with

GitHub Actions Inactive


          Merge remote-tracking branch 'origin/link_native_bf16_aot' into link_…

55f0e7c

…native_bf16_aot

jinge90 temporarily deployed to WindowsCILock

January 17, 2025 08:48

— with

GitHub Actions Inactive

jinge90 temporarily deployed to WindowsCILock

January 20, 2025 01:54

— with

GitHub Actions Inactive


          Merge remote-tracking branch 'upstream/sycl' into link_native_bf16_aot

64ba41b

jinge90 had a problem deploying to WindowsCILock

January 20, 2025 02:31

— with

GitHub Actions Error

jinge90 temporarily deployed to WindowsCILock

January 20, 2025 02:59

— with

GitHub Actions Inactive


          add test for -Xsycl-target-backend

23dc1de

Signed-off-by: jinge90 <[email protected]>

jinge90 temporarily deployed to WindowsCILock

January 20, 2025 03:36

— with

GitHub Actions Inactive

jinge90 requested a review from mdtoguchi

January 20, 2025 09:10

jinge90 temporarily deployed to WindowsCILock

January 21, 2025 07:54

— with

GitHub Actions Inactive

Contributor Author

jinge90 commented Jan 21, 2025

Hi, @mdtoguchi
I am a little confused about the intel_gpu_* target strings. We can use:
clang++ -fsycl -fsycl-targets=intel_gpu_acm_g10 xxx to specify AOT compilation for the DG2 platform, but we also have:
clang++ -fsycl -fsycl-targets=intel_gpu_dg2_g10 xxx for the DG2 target. Is this a duplicate, or is there any difference between intel_gpu_acm_* and intel_gpu_dg2_*?

Thanks very much.


          add intel_gpu_dg2_*

c049ddd

Signed-off-by: jinge90 <[email protected]>

jinge90 temporarily deployed to WindowsCILock

January 21, 2025 08:35

— with

GitHub Actions Inactive

Contributor

mdtoguchi commented Jan 27, 2025

> Hi, @mdtoguchi I am a little confused about the intel_gpu_* targets string, we can use: clang++ -fsycl -fsycl-targets=intel_gpu_acm_g10 xxx to specify aot compilation for DG2 platform but we also have: clang++ -fsycl -fsycl-targets=intel_gpu_dg2_g10 xxx for DG2 target. Is there duplicate here? Or is there any difference between intel_gpu_acm_* and intel_gpu_dg2_*
>
> Thanks very much.

intel_gpu_acm* and intel_gpu_dg2* are equivalent; both map to -device acm* for the ocloc call.
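The equivalence can be illustrated with a small sketch that folds the dg2 spelling into the acm one before any bfloat16 lookup, so a support list only needs one spelling. This is an assumption-laden illustration, not the driver's mapping: `toOclocDevice` and `aliasExample` are hypothetical names, and the exact device string the driver passes to ocloc (for example `acm_g10`) is assumed rather than taken from the source.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turn an intel_gpu_* target into a device name,
// folding the dg2 spelling into acm. Returns "" if the prefix is missing.
std::string toOclocDevice(const std::string &Target) {
  const std::string Prefix = "intel_gpu_";
  if (Target.compare(0, Prefix.size(), Prefix) != 0)
    return "";
  std::string Device = Target.substr(Prefix.size());
  // Rewrite a leading "dg2_" to "acm_"; other names pass through unchanged.
  if (Device.compare(0, 4, "dg2_") == 0)
    Device.replace(0, 3, "acm");
  return Device;
}

// Usage: both spellings of the same DG2 target normalize to one name.
void aliasExample() {
  assert(toOclocDevice("intel_gpu_dg2_g10") ==
         toOclocDevice("intel_gpu_acm_g10"));
  assert(toOclocDevice("intel_gpu_dg2_g10") == "acm_g10");
  assert(toOclocDevice("spir64_gen").empty());
}
```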

mdtoguchi approved these changes

againull merged commit 71ca51f into intel:sycl

17 checks passed

rolandschulz mentioned this pull request

New flash attention rearrangement intel/sycl-tla#210

Merged

mehdi-goli mentioned this pull request

Improve type cast for Flash Attention intel/sycl-tla#203

Merged

