-
Notifications
You must be signed in to change notification settings - Fork 791
[SYCL] Don't run SYCLOptimizeBarriersPass with O0 #20037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 3 commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
b78605b
[SYCL] Don't run SYCLOptimizeBarriersPass with O0
MrSidims 96a7a7a
format
MrSidims 1aeedf7
add test
MrSidims 68a84f0
adjust test
MrSidims 73266d4
move the test
MrSidims 71bdb3f
adjust test
MrSidims 6347c9c
Merge remote-tracking branch 'origin/sycl' into run-barriers-with-o0
MrSidims 3496abb
shorted test
MrSidims File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
65 changes: 65 additions & 0 deletions
65
sycl/test/check_device_code/narrow-barrier-explicit-spirv-call.cpp
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
// RUN: %clangxx -fsycl-device-only -fsycl-unnamed-lambda -S -Xclang -emit-llvm -Xclang -no-enable-noundef-analysis -O2 %s -o - | FileCheck %s --check-prefix=CHECK-O2 | ||
// RUN: %clangxx -fsycl-device-only -fsycl-unnamed-lambda -S -Xclang -emit-llvm -Xclang -no-enable-noundef-analysis -O0 %s -o - | FileCheck %s --check-prefix=CHECK-O0 | ||
|
||
// The test checks if SYCLOptimizeBarriers pass can perform barrier scope | ||
// narrowing in case if there are no fenced global accesses, where a barrier | ||
// is an explicit SPIR-V friendly call. | ||
|
||
// CHECK-O2: call spir_func void @_Z22__spirv_ControlBarrieriii(i32{{.*}}2, i32{{.*}}2, i32{{.*}}912) | ||
// CHECK-O2: call spir_func void @_Z22__spirv_ControlBarrieriii(i32{{.*}}2, i32{{.*}}2, i32{{.*}}400) | ||
|
||
// CHECK-O0: call spir_func void @_Z22__spirv_ControlBarrieriii(i32{{.*}}2, i32{{.*}}2, i32{{.*}}912) | ||
// CHECK-O0: call spir_func void @_Z22__spirv_ControlBarrieriii(i32{{.*}}2, i32{{.*}}2, i32{{.*}}912) | ||
|
||
#include <sycl/sycl.hpp> | ||
|
||
constexpr size_t WORK_GROUP_SIZE = 1024; | ||
constexpr size_t NUMBER_OF_WORK_GROUPS = 64; | ||
constexpr size_t NUMBER_OF_ITERATIONS = 100; | ||
|
||
struct GroupBarrierKernel { | ||
|
||
GroupBarrierKernel(sycl::handler &h, float *sum) | ||
: sum(sum), local(WORK_GROUP_SIZE, h) {} | ||
|
||
void operator()(sycl::nd_item<1> it) const { | ||
|
||
const size_t item_id = it.get_local_id()[0]; | ||
MrSidims marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
const size_t item_range = it.get_local_range()[0]; | ||
const size_t group_id = it.get_group().get_group_id()[0]; | ||
|
||
for (int i = 0; i < item_id; i += item_range) { | ||
MrSidims marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
local[i] = i; | ||
} | ||
|
||
sycl::group_barrier(it.get_group()); | ||
for (int offset = 1; offset < item_range; offset *= 2) { | ||
local[item_id] += local[item_id + offset]; | ||
MrSidims marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
__spirv_ControlBarrier(2, 2, 912); | ||
MrSidims marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
} | ||
|
||
if (it.get_group().leader()) { | ||
__spirv_ControlBarrier(2, 2, 912); | ||
sum[group_id] = local[0]; | ||
} | ||
} | ||
|
||
float *sum; | ||
sycl::local_accessor<float> local; | ||
}; | ||
|
||
int main(int argc, char *argv[]) { | ||
sycl::queue q{sycl::property::queue::enable_profiling{}}; | ||
MrSidims marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
float *sum = sycl::malloc_shared<float>(NUMBER_OF_WORK_GROUPS, q); | ||
|
||
|
||
double modern_ns = 0; | ||
MrSidims marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
for (int r = 0; r < NUMBER_OF_ITERATIONS + 1; ++r) { | ||
sycl::event e = q.submit([&](sycl::handler &h) { | ||
auto k = GroupBarrierKernel(h, sum); | ||
h.parallel_for(sycl::nd_range<1>{NUMBER_OF_WORK_GROUPS * WORK_GROUP_SIZE, | ||
WORK_GROUP_SIZE}, | ||
k); | ||
}); | ||
e.wait(); | ||
} | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not really happy with the test.
The problem I see here is that the tests that reside in sycl/test/check_device_code/ are intended to check that SYCL library is using front-end provided hooks/attributes correctly, this test is explicitly using SPIR-V builtin which should not happen in these tests.
If we simply want to check that the builtin is treated properly by the optimizations, it is better to add a simpler smaller test to clang/CodeGenSYCL. This way the test will also be much faster, since inclusion of the whole sycl.hpp won't be needed. If we want to see that SYCL application as user would write it generates the right code, we can leave the test here but we should not use the builtin explicitly.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strongly disagree, the test shouldn't be placed in clang/CodeGenSYCL . There should be frontend tests only, they must not test optimizations happening later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using sycl::group_builtin won't expose missing O0 check in the backend utils, as the SYCL API is using several wrapper functions to get scope, memory semantics etc before calling spirv builtin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch makes sure that clang doesn't run a pass with O0. I don't think a test for that patch actually checks the optimization's behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know, that there is a test for the pipeline check, moved the check there, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never heard of this limitation before. I see 200 tests in clang/test/CodeGen use
-O2
. Most of these tests check LLVM IR after optimizations, not the LLVM IR emitted by the front-end.