Commit 64ec294

pkwasnie-intel authored and igcbot committed
swap order of implicit kernel arguments for better alignment
Both global_id_offset and enqueued_local_size are <3 x i32> implicit kernel arguments that are always added when a kernel uses global_id. Each takes 12 bytes, so if the following argument requires 8-byte alignment, an additional 4 bytes of padding may be inserted. When global_id_offset and enqueued_local_size are reordered to sit next to each other, together they take 24 bytes, a multiple of 8, which gives better alignment for the arguments that follow.
1 parent 81b4de2 commit 64ec294

2 files changed (+6, -6 lines)

IGC/Compiler/Optimizer/OpenCLPasses/KernelArgs/KernelArgs.cpp

Lines changed: 2 additions & 2 deletions
@@ -810,6 +810,7 @@ KernelArgsOrder::KernelArgsOrder(InputType layout)
 KernelArg::ArgType::RUNTIME_VALUE,
 KernelArg::ArgType::IMPLICIT_PAYLOAD_HEADER,
 KernelArg::ArgType::IMPLICIT_GLOBAL_OFFSET,
+KernelArg::ArgType::IMPLICIT_ENQUEUED_LOCAL_WORK_SIZE,
 
 KernelArg::ArgType::PTR_LOCAL,
 KernelArg::ArgType::PTR_GLOBAL,
@@ -831,7 +832,6 @@ KernelArgsOrder::KernelArgsOrder(InputType layout)
 KernelArg::ArgType::IMPLICIT_LOCAL_SIZE,
 KernelArg::ArgType::IMPLICIT_STAGE_IN_GRID_ORIGIN,
 KernelArg::ArgType::IMPLICIT_STAGE_IN_GRID_SIZE,
-KernelArg::ArgType::IMPLICIT_ENQUEUED_LOCAL_WORK_SIZE,
 KernelArg::ArgType::IMPLICIT_BINDLESS_OFFSET,
 
 KernelArg::ArgType::IMPLICIT_IMAGE_HEIGHT,
@@ -937,6 +937,7 @@ KernelArgsOrder::KernelArgsOrder(InputType layout)
 KernelArg::ArgType::RUNTIME_VALUE,
 KernelArg::ArgType::IMPLICIT_PAYLOAD_HEADER,
 KernelArg::ArgType::IMPLICIT_GLOBAL_OFFSET,
+KernelArg::ArgType::IMPLICIT_ENQUEUED_LOCAL_WORK_SIZE,
 KernelArg::ArgType::PTR_LOCAL,
 KernelArg::ArgType::PTR_GLOBAL,
 KernelArg::ArgType::PTR_CONSTANT,
@@ -956,7 +957,6 @@ KernelArgsOrder::KernelArgsOrder(InputType layout)
 KernelArg::ArgType::IMPLICIT_LOCAL_SIZE,
 KernelArg::ArgType::IMPLICIT_STAGE_IN_GRID_ORIGIN,
 KernelArg::ArgType::IMPLICIT_STAGE_IN_GRID_SIZE,
-KernelArg::ArgType::IMPLICIT_ENQUEUED_LOCAL_WORK_SIZE,
 KernelArg::ArgType::IMPLICIT_BINDLESS_OFFSET,
 
 KernelArg::ArgType::IMPLICIT_ARG_BUFFER,

IGC/ocloc_tests/features/constant_buffer/noinline.cl

Lines changed: 4 additions & 4 deletions
@@ -19,16 +19,16 @@ SPDX-License-Identifier: MIT
 // CHECK-SUBROUTINE-NEXT: - arg_type: global_id_offset
 // CHECK-SUBROUTINE-NEXT: offset: 0
 // CHECK-SUBROUTINE-NEXT: size: 12
+// CHECK-SUBROUTINE-NEXT: - arg_type: enqueued_local_size
+// CHECK-SUBROUTINE-NEXT: offset: 12
+// CHECK-SUBROUTINE-NEXT: size: 12
 // CHECK-SUBROUTINE-NEXT: - arg_type: arg_bypointer
-// CHECK-SUBROUTINE-NEXT: offset: 16
+// CHECK-SUBROUTINE-NEXT: offset: 24
 // CHECK-SUBROUTINE-NEXT: size: 8
 // CHECK-SUBROUTINE-NEXT: arg_index: 0
 // CHECK-SUBROUTINE-NEXT: addrmode: stateless
 // CHECK-SUBROUTINE-NEXT: addrspace: global
 // CHECK-SUBROUTINE-NEXT: access_type: readwrite
-// CHECK-SUBROUTINE-NEXT: - arg_type: enqueued_local_size
-// CHECK-SUBROUTINE-NEXT: offset: 24
-// CHECK-SUBROUTINE-NEXT: size: 12
 // CHECK-SUBROUTINE-NEXT: per_thread_payload_arguments:
 // CHECK-SUBROUTINE-NEXT: - arg_type: local_id
 // CHECK-SUBROUTINE-NEXT: offset: 0
