Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions clang/lib/CodeGen/BackendUtil.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1093,8 +1093,9 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
ThinOrFullLTOPhase) {
MPM.addPass(createModuleToFunctionPassAdaptor(
InferAddressSpacesPass(clang::targets::SPIR_GENERIC_AS)));
MPM.addPass(
createModuleToFunctionPassAdaptor(SYCLOptimizeBarriersPass()));
if (Level != OptimizationLevel::O0)
MPM.addPass(createModuleToFunctionPassAdaptor(
SYCLOptimizeBarriersPass()));
});
}

Expand Down
68 changes: 68 additions & 0 deletions sycl/test/check_device_code/narrow-barrier-explicit-spirv-call.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
// RUN: %clangxx -fsycl-device-only -fsycl-unnamed-lambda -S -Xclang -emit-llvm -Xclang -no-enable-noundef-analysis -O2 %s -o - | FileCheck %s --check-prefix=CHECK-O2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not really happy with the test.
The problem I see here is that the tests that reside in sycl/test/check_device_code/ are intended to check that SYCL library is using front-end provided hooks/attributes correctly, this test is explicitly using SPIR-V builtin which should not happen in these tests.
If we simply want to check that the builtin is treated properly by the optimizations, it is better to add a simpler smaller test to clang/CodeGenSYCL. This way the test will also be much faster, since inclusion of the whole sycl.hpp won't be needed. If we want to see that SYCL application as user would write it generates the right code, we can leave the test here but we should not use the builtin explicitly.

Copy link
Contributor Author

@MrSidims MrSidims Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly disagree, the test shouldn't be placed in clang/CodeGenSYCL . There should be frontend tests only, they must not test optimizations happening later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is explicitly using SPIR-V builtin which should not happen in these tests

using sycl::group_builtin won't expose missing O0 check in the backend utils, as the SYCL API is using several wrapper functions to get scope, memory semantics etc before calling spirv builtin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly disagree, the test shouldn't be placed in clang/CodeGenSYCL . There should be frontend tests only, they must not test optimizations happening later.

The patch makes sure that clang doesn't run a pass with O0. I don't think a test for that patch actually checks the optimization's behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know, that there is a test for the pipeline check, moved the check there, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly disagree, the test shouldn't be placed in clang/CodeGenSYCL . There should be frontend tests only, they must not test optimizations happening later.

Never heard of this limitation before. I see 200 tests in clang/test/CodeGen use -O2. Most of these tests check LLVM IR after optimizations, not the LLVM IR emitted by the front-end.

// RUN: %clangxx -fsycl-device-only -fsycl-unnamed-lambda -S -Xclang -emit-llvm -Xclang -no-enable-noundef-analysis -O0 %s -o - | FileCheck %s --check-prefix=CHECK-O0

// The test checks if SYCLOptimizeBarriers pass can perform barrier scope
// narrowing in case if there are no fenced global accesses, where a barrier
// is an explicit SPIR-V friendly call.

// CHECK-O2: call spir_func void @_Z22__spirv_ControlBarrieriii(i32{{.*}}2, i32{{.*}}2, i32{{.*}}912)
// CHECK-O2: call spir_func void @_Z22__spirv_ControlBarrieriii(i32{{.*}}2, i32{{.*}}2, i32{{.*}}400)

// CHECK-O0: call spir_func void @_Z22__spirv_ControlBarrieriii(i32{{.*}}2, i32{{.*}}2, i32{{.*}}912)
// CHECK-O0: call spir_func void @_Z22__spirv_ControlBarrieriii(i32{{.*}}2, i32{{.*}}2, i32{{.*}}912)

#include <sycl/sycl.hpp>

constexpr size_t WORK_GROUP_SIZE = 1024;
constexpr size_t NUMBER_OF_WORK_GROUPS = 64;
constexpr size_t NUMBER_OF_ITERATIONS = 100;

struct GroupBarrierKernel {

GroupBarrierKernel(sycl::handler &h, float *sum)
: sum(sum), local(WORK_GROUP_SIZE, h) {}

void operator()(sycl::nd_item<1> it) const {

const size_t item_id = it.get_local_id()[0];
const size_t item_range = it.get_local_range()[0];
const size_t group_id = it.get_group().get_group_id()[0];

for (int i = 0; i < item_id; i += item_range) {
local[i] = i;
}

__spirv_ControlBarrier(/*WG scope*/ 2, /*WG scope*/ 2,
/*global + local*/ 912);
for (int offset = 1; offset < item_range; offset *= 2) {
local[item_id] += local[item_id + offset];
__spirv_ControlBarrier(/*WG scope*/ 2, /*WG scope*/ 2,
/*global + local*/ 912);
}

if (it.get_group().leader()) {
__spirv_ControlBarrier(/*WG scope*/ 2, /*WG scope*/ 2,
/*global + local*/ 912);
sum[group_id] = local[0];
}
}

float *sum;
sycl::local_accessor<float> local;
};

int main(int argc, char *argv[]) {
sycl::queue q{sycl::property::queue::enable_profiling{}};
float *sum = sycl::malloc_shared<float>(NUMBER_OF_WORK_GROUPS, q);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be freed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test is removed


double modern_ns = 0;
for (int r = 0; r < NUMBER_OF_ITERATIONS + 1; ++r) {
sycl::event e = q.submit([&](sycl::handler &h) {
auto k = GroupBarrierKernel(h, sum);
h.parallel_for(sycl::nd_range<1>{NUMBER_OF_WORK_GROUPS * WORK_GROUP_SIZE,
WORK_GROUP_SIZE},
k);
});
e.wait();
}
}