Contributor

@jinge90 jinge90 commented Dec 31, 2024

Users can specify multiple AOT targets when building a SYCL program in the following ways:
1). via -fsycl-targets=intel_gpu_pvc,intel_gpu_acm_g10,....
2). via -fsycl-targets=spir64_gen ... -Xs "-device pvc,dg2...."
3). via -fsycl-targets=spir64_gen..., -Xsycl-target-backend=spir64_gen "-device pvc"
We should select the native bfloat16 devicelib only when every specified AOT target supports native bfloat16 conversion. Currently, pvc, dg2, and bmg devices support native bfloat16.
If the user specifies a JIT target together with AOT targets that all support native bfloat16 conversion, we still select the native bfloat16 devicelib, since the bfloat16 devicelib is skipped in the linking step for the JIT target.
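
The selection rule described above can be sketched as follows. This is only an illustration in standard C++, not the driver code in this PR: the function name `allTargetsSupportNativeBF16` is hypothetical, the caller is assumed to pass only the AOT targets (JIT targets are left out because they skip the bfloat16 devicelib at link time), and treating an empty list as "no native library" is an assumption of the sketch.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not the driver's actual code: returns true only if
// every AOT target in the list is known to support native bfloat16.
// The names in the set are the intel_gpu_* targets listed in this PR.
inline bool
allTargetsSupportNativeBF16(const std::vector<std::string> &AOTTargets) {
  static const std::set<std::string> NativeBF16Targets{
      "intel_gpu_pvc", "intel_gpu_acm_g10", "intel_gpu_acm_g11",
      "intel_gpu_acm_g12", "intel_gpu_bmg_g21"};
  // Assumption of this sketch: with no AOT target at all, there is nothing
  // to justify picking the native library.
  if (AOTTargets.empty())
    return false;
  // A single target without native support forces the fallback library.
  return std::all_of(AOTTargets.begin(), AOTTargets.end(),
                     [](const std::string &Target) {
                       return NativeBF16Targets.count(Target) != 0;
                     });
}
```

For example, `{"intel_gpu_pvc", "intel_gpu_acm_g10"}` would select the native devicelib, while adding a target outside the set, such as `intel_gpu_bdw`, would not.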

@jinge90 jinge90 requested a review from a team as a code owner December 31, 2024 13:24
// 1). clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device pvc,...,"
// 2). clang++ -fsycl -fsycl-targets=intel_gpu_pvc,...
// We assume that users will only apply either 1) or 2) and won't mix the
// 2 ways in their compiling command.
Contributor

@srividya-sundaram srividya-sundaram Jan 2, 2025

What about the case where Intel GPUs are specified via the --offload-arch option?

clang++ --offload-new-driver -fsycl --offload-arch=bdw

Contributor Author

Hi, @srividya-sundaram
Thanks for pointing this out. This PR didn't consider the new offload driver; I will update it.
Thanks very much.

@srividya-sundaram
Contributor

Please add some description to the PR in addition to the PR title.

Comment on lines 304 to 306
static llvm::SmallSet<StringRef, 8> GPUArchsWithNBF16{
"intel_gpu_pvc", "intel_gpu_acm_g10", "intel_gpu_acm_g11",
"intel_gpu_acm_g12", "intel_gpu_bmg_g21"};
Contributor

This is fine as a short-term fix, but longer-term we should be thinking of a different way to identify architectures with bfloat16 support that doesn't require maintaining a list of devices. For example, is there a way to query the properties of the target device(s) to determine if bfloat16 is supported?

Note that Lunar Lake GPUs also support bfloat16 (I think), and it's not in this list.

Contributor Author

Hi, @bashbaug
bf16 support can be queried at execution time, because only then do we know the real target the program is running on. At compile time in AOT mode, the compiler driver can only decide based on the specified target platform, so we have to maintain the bf16 support information in the compiler driver source code; otherwise the driver won't know whether a target platform supports bf16. The platform where we build the program may differ from the platform where the program is going to run.
Thanks very much.

Contributor

Could we call into ocloc to do the query? For example, if we're AOT compiling for DG2, we could do something like (this example uses the ocloc command-line interface, but the same is supported for the library interface):

$ ocloc query CL_DEVICE_EXTENSIONS_WITH_VERSION -device dg2
<snip> cl_intel_bfloat16_conversions:1.0.0 <snip>

Because the cl_intel_bfloat16_conversions extension is supported, we can know that DG2 supports SPIR-V bfloat16 conversion instructions.
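
If the driver went this route, the check on the query result could look roughly like the sketch below. It only covers parsing: it assumes the query output is a whitespace-separated list of `name:version` entries, as in the snippet above (the real ocloc output format may differ), and `listsExtension` is a hypothetical name. How ocloc is invoked, via the command line or the library interface, is deliberately left out.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: scans whitespace-separated "name:version" entries
// and reports whether ExtName appears as an exact extension name.
inline bool listsExtension(const std::string &QueryOutput,
                           const std::string &ExtName) {
  std::istringstream In(QueryOutput);
  std::string Entry;
  while (In >> Entry) {
    // Drop the ":<version>" suffix, if any, and compare the bare name.
    // find() returns npos when there is no ':', so the whole entry is kept.
    if (Entry.substr(0, Entry.find(':')) == ExtName)
      return true;
  }
  return false;
}
```

Comparing the whole name rather than searching for a substring avoids a false match on a longer extension name that merely starts with `cl_intel_bfloat16_conversions`.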

Contributor Author

Hi, @bashbaug
I just tried the ocloc command and found that after running "ocloc query CL_DEVICE_EXTENSIONS_WITH_VERSION -device dg2", a file named "CL_DEVICE_EXTENSIONS_WITH_VERSION" is generated in the local directory. This seems to be ocloc's behavior; is there any way to avoid it?
Thanks very much.

Contributor Author

jinge90 commented Jan 21, 2025

Hi, @mdtoguchi
I am a little confused about the intel_gpu_* target strings. We can use:
clang++ -fsycl -fsycl-targets=intel_gpu_acm_g10 xxx to specify AOT compilation for the DG2 platform, but we also have:
clang++ -fsycl -fsycl-targets=intel_gpu_dg2_g10 xxx for the DG2 target. Is this a duplicate, or is there any difference between intel_gpu_acm_* and intel_gpu_dg2_*?

Thanks very much.

Signed-off-by: jinge90 <[email protected]>
@mdtoguchi
Contributor

For intel_gpu_acm* and intel_gpu_dg2*, these are equivalent and map to using -device acm* for the ocloc call.
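
As a rough illustration of that equivalence, the sketch below normalizes both spellings to the same device name. `toOclocDevice` is a hypothetical helper, and deriving the device name by stripping the `intel_gpu_` prefix is an assumption made for the example; the driver may well use an explicit table instead.

```cpp
#include <string>

// Hypothetical helper: maps an intel_gpu_* target string to the device name
// passed to ocloc via -device. Stripping the prefix is an assumption of this
// sketch, not a description of the driver's real mapping.
inline std::string toOclocDevice(const std::string &Target) {
  const std::string Prefix = "intel_gpu_";
  std::string Device = Target.compare(0, Prefix.size(), Prefix) == 0
                           ? Target.substr(Prefix.size())
                           : Target;
  // Per the reply above, the dg2* spellings are aliases of acm*.
  if (Device.compare(0, 3, "dg2") == 0)
    Device.replace(0, 3, "acm");
  return Device;
}
```

With this normalization, `intel_gpu_dg2_g10` and `intel_gpu_acm_g10` both yield `acm_g10`, so a lookup keyed on the acm spelling would cover either form.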

@againull againull merged commit 71ca51f into intel:sycl Jan 31, 2025
17 checks passed