bitonic_sort #940
[CI]: Can one of the admins verify this patch?
NBL_REF_ARG(value_t) loVal, NBL_REF_ARG(value_t) hiVal)
{
    comparator_t comp;
    const bool shouldSwap = ascending ? comp(hiKey, loKey) : comp(loKey, hiKey);
The compiler is probably dumb and might not realize the right term is the negation of the left term. Ternaries in SPIR-V usually get compiled to an `OpSelect`, which treats both terms after the `?` not as branches to conditionally execute, but as operands whose result must be evaluated before the select operation runs. That is to say, if the compiler is stupid you're going to run two comparisons. If you make the right term the negation of the left one, CSE is likely to kick in and evaluate the comparison only once.
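A minimal C++ sketch of this suggestion (not the PR's HLSL). `shouldSwapTwoCompares` mirrors the quoted line, while `shouldSwapOneCompare` is a hypothetical name for the rewritten form that evaluates the comparator once and negates the result. One caveat: for equal keys the two forms differ in the descending case, because `!comp(hiKey, loKey)` is true where `comp(loKey, hiKey)` is false. The rewrite may therefore swap ties, which leaves key order intact but moves the associated values.

```cpp
#include <functional>

// Mirrors the quoted line: both comparator calls are operands of the select.
template <typename Key, typename Comp = std::less<Key>>
bool shouldSwapTwoCompares(bool ascending, const Key& loKey, const Key& hiKey, Comp comp = Comp{})
{
    return ascending ? comp(hiKey, loKey) : comp(loKey, hiKey);
}

// Hypothetical rewrite: one comparison, the second operand is its negation.
template <typename Key, typename Comp = std::less<Key>>
bool shouldSwapOneCompare(bool ascending, const Key& loKey, const Key& hiKey, Comp comp = Comp{})
{
    const bool hiBeforeLo = comp(hiKey, loKey);
    // Equivalent to (ascending == hiBeforeLo); differs from the form above only for equal keys.
    return ascending ? hiBeforeLo : !hiBeforeLo;
}
```

For distinct keys both functions return the same result in either direction.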
const uint32_t invocationID = glsl::gl_SubgroupInvocationID();
const uint32_t subgroupSizeLog2 = glsl::gl_SubgroupSizeLog2();
[unroll]
for (uint32_t stage = 0; stage <= subgroupSizeLog2; stage++)
don't add indentation after compiler directives
};
template<bool Ascending, typename Config, class device_capabilities = void>
struct bitonic_sort;
template<bool Ascending, typename KeyType, typename ValueType, typename Comparator, class device_capabilities>
I get that `Ascending` is used because when moving onto workgroup you're going to need to call alternating subgroup sorts. However, as a front-facing API if I wanted a single subgroup shuffle I'd usually want it in the order specified by the `Comparator`. Maybe push it after the `Config` and give it a default value of `true`. Or better yet, since `Ascending` can be confusing, consider calling it `ReverseOrder` or something simpler that conveys the intent better.
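A rough C++ sketch of what that parameter order could look like, under several assumptions: `ExampleConfig` and `sort_sketch` are hypothetical names, `std::sort` stands in for the actual bitonic network, and the flag is named `ReverseOrder` with a default of `false` (the counterpart of `Ascending = true`).

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical config carrying the key type and the comparator.
struct ExampleConfig
{
    using key_t = int;
    using comparator_t = std::less<int>;
};

// The direction flag comes after Config and is defaulted, so the common
// call site only names the config and sorts in comparator order.
template <typename Config, bool ReverseOrder = false>
struct sort_sketch
{
    template <std::size_t N>
    static void run(std::array<typename Config::key_t, N>& keys)
    {
        typename Config::comparator_t comp;
        // std::sort is only a placeholder for the sorting network itself.
        std::sort(keys.begin(), keys.end(), [&](const auto& a, const auto& b) {
            return ReverseOrder ? comp(b, a) : comp(a, b);
        });
    }
};
```

With this layout `sort_sketch<ExampleConfig>::run(keys)` follows the comparator, and only callers that need alternating directions spell out `sort_sketch<ExampleConfig, true>`.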
`Ascending` and later names like `takeLarger` implicitly assume the comparator is `less` (`lo` and `hi` don't; those are related to the "lane" order in the bitonic sort diagram). That's fine on its own, it makes the code more readable vs naming them with a more generic option. However, there should be comments mentioning that names assume this implicitly so there's no confusion.
if (takeLarger)
{
    if (comp(loKey, pLoKey)) { loKey = pLoKey; loVal = pLoVal; }
    if (comp(hiKey, pHiKey)) { hiKey = pHiKey; hiVal = pHiVal; }
Are you sure this isn't reversed? Assume a `less` comparator, `bitonicAscending = true` for the current stage and `upperHalf = true` for the current thread. Then `takeLarger` semantically conveys that this thread wants to keep the larger values. And yet this code assigns the smaller values.
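One way to check the direction is to trace the quoted lines in isolation. The sketch below is plain C++ with hypothetical function names, assumes a `less` comparator, and covers only the key of the `lo` pair. Under those assumptions the `takeLarger` branch ends up holding the larger of the two keys and the `else` branch the smaller. Whether that matches the intent still depends on how `takeLarger` is derived from `bitonicAscending` and `upperHalf`, which is not visible in the quoted hunk.

```cpp
#include <functional>

// Body of the takeLarger branch, reduced to a single key.
template <typename Key, typename Comp = std::less<Key>>
Key takeLargerBranch(Key loKey, Key pLoKey, Comp comp = Comp{})
{
    if (comp(loKey, pLoKey)) { loKey = pLoKey; }
    return loKey;
}

// Body of the else branch, reduced to a single key.
template <typename Key, typename Comp = std::less<Key>>
Key elseBranch(Key loKey, Key pLoKey, Comp comp = Comp{})
{
    if (comp(pLoKey, loKey)) { loKey = pLoKey; }
    return loKey;
}
```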
else
{
    if (comp(pLoKey, loKey)) { loKey = pLoKey; loVal = pLoVal; }
    if (comp(pHiKey, hiKey)) { hiKey = pHiKey; hiVal = pHiVal; }
Again, if the compiler is dumb this code is very costly: half your threads in a subgroup will have `upperHalf = true` and the other half will have it set to `false`. Parallel code execution needs to be uniform across threads in the same SM, so this section of code will run twice: first some half of your threads (say, those in the upper half) will run, then the other half. This kills your throughput.
Inside each branch, the inner `if`s will likely get compiled down to two `OpSelect`s each. You can make this whole code branchless by doing
loKey = loCondition ? loKey : pLoKey;
loVal = loCondition ? loVal : pLoVal;
hiKey = hiCondition ? hiKey : pHiKey;
hiVal = hiCondition ? hiVal : pHiVal;
where `loCondition` and `hiCondition` are predicates that depend on both `takeLarger` and the result of the key comparison.
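A possible definition of those predicates, sketched in C++ rather than the PR's HLSL and shown for a single key/value pair (the `hi` pair would follow the same pattern). `selectBranchy`, `selectBranchless` and `keepOwn` are hypothetical names. `keepOwn` plays the role of `loCondition` and is derived directly from the two quoted branches, so the behaviour matches the branchy version, ties included.

```cpp
#include <functional>

// Reference: the branchy form from the quoted hunks, for one key/value pair.
template <typename Comp = std::less<int>>
void selectBranchy(bool takeLarger, int& key, int& val, int pKey, int pVal, Comp comp = Comp{})
{
    if (takeLarger)
    {
        if (comp(key, pKey)) { key = pKey; val = pVal; }
    }
    else
    {
        if (comp(pKey, key)) { key = pKey; val = pVal; }
    }
}

// Branchless form: a single predicate feeds plain selects.
template <typename Comp = std::less<int>>
void selectBranchless(bool takeLarger, int& key, int& val, int pKey, int pVal, Comp comp = Comp{})
{
    // True when this thread keeps its own element instead of the partner's.
    const bool keepOwn = takeLarger ? !comp(key, pKey) : !comp(pKey, key);
    key = keepOwn ? key : pKey;
    val = keepOwn ? val : pVal;
}
```

The predicate still contains two comparator calls. Folding them into one comparison, as suggested for `shouldSwap`, would change the result for equal keys, so that trade-off is left open here.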