CrabExtra

Description

Testing

TODO list:

@devshgraphicsprogrammingjenkins
Contributor

[CI]: Can one of the admins verify this patch?

NBL_REF_ARG(value_t) loVal, NBL_REF_ARG(value_t) hiVal)
{
comparator_t comp;
const bool shouldSwap = ascending ? comp(hiKey, loKey) : comp(loKey, hiKey);
Contributor

The compiler is probably dumb and might not realize that the right term is (up to ties) the negation of the left term. In SPIR-V, ternaries usually get compiled to an OpSelect, which treats both terms after the ? not as branches to execute conditionally, but as operands that must both be evaluated before the select runs. That is to say, if the compiler is stupid, you're going to run two comparisons. If you make the right term the negation of the left one, CSE is likely to kick in and evaluate the comparison only once.
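
A minimal sketch of the suggested rewrite, written in plain C++ rather than shader code (shouldSwap and hiBeforeLo are hypothetical names, and std::less stands in for comparator_t). The comparator runs once and the descending arm reuses its negation; whether the shader compiler then emits a single comparison feeding the OpSelect depends on its CSE, which C++ can't demonstrate.

```cpp
#include <cassert>
#include <functional>

// Hypothetical plain-C++ stand-in for the compare-and-swap predicate under review.
// The comparator is evaluated once; the descending case reuses the negated result.
template<typename Key, typename Comparator = std::less<Key>>
bool shouldSwap(bool ascending, const Key& loKey, const Key& hiKey, Comparator comp = Comparator{})
{
    const bool hiBeforeLo = comp(hiKey, loKey); // the single comparison shared by both arms
    // !hiBeforeLo differs from comp(loKey, hiKey) only when the keys compare equal;
    // both keys live in the same thread, so swapping equal keys loses nothing.
    return ascending ? hiBeforeLo : !hiBeforeLo;
}
```

An equivalent one-liner is return hiBeforeLo == ascending;, which makes the single comparison explicit.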

const uint32_t invocationID = glsl::gl_SubgroupInvocationID();
const uint32_t subgroupSizeLog2 = glsl::gl_SubgroupSizeLog2();
[unroll]
for (uint32_t stage = 0; stage <= subgroupSizeLog2; stage++)
Contributor

Don't add indentation after compiler directives.

};
template<bool Ascending, typename Config, class device_capabilities = void>
struct bitonic_sort;
template<bool Ascending, typename KeyType, typename ValueType, typename Comparator, class device_capabilities>
Contributor

I get that Ascending is there because, when moving on to the workgroup level, you're going to need to call alternating subgroup sorts. However, as a front-facing API, if I wanted a single subgroup sort I'd usually want it in the order specified by the Comparator. Maybe push it after the Config and give it a default value of true. Or better yet, since Ascending can be confusing, consider calling it ReverseOrder or something simpler that conveys the intent better.
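
A rough sketch of the suggested parameter order, in plain C++ with a serial CPU bitonic sort standing in for the subgroup implementation. bitonic_sort_config, its key_t/comparator_t members and the sort function are hypothetical, as is the choice of ReverseOrder defaulting to false (the rename proposed above) instead of Ascending defaulting to true.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical config: bundles the key type and comparator, loosely mirroring a Config parameter.
template<typename KeyType, typename Comparator = std::less<KeyType>>
struct bitonic_sort_config
{
    using key_t = KeyType;
    using comparator_t = Comparator;
};

// Suggested ordering: Config first, then an opt-in ReverseOrder flag that defaults to false,
// so bitonic_sort<Config> sorts in exactly the order the comparator defines.
template<typename Config, bool ReverseOrder = false, class device_capabilities = void>
struct bitonic_sort
{
    using key_t = typename Config::key_t;

    // Serial CPU stand-in for the subgroup sort; keys.size() must be a power of two.
    static void sort(std::vector<key_t>& keys)
    {
        const std::size_t n = keys.size();
        assert((n & (n - 1)) == 0 && "size must be a power of two");
        typename Config::comparator_t comp;
        for (std::size_t size = 2; size <= n; size <<= 1)        // length of the blocks being merged
            for (std::size_t stride = size >> 1; stride > 0; stride >>= 1)
                for (std::size_t i = 0; i < n; ++i)
                {
                    const std::size_t j = i ^ stride;             // partner element
                    if (j <= i) continue;                         // handle each pair once
                    // Blocks alternate direction; the final merge (size == n) follows the
                    // comparator, and ReverseOrder flips every direction at once.
                    const bool inComparatorOrder = ((i & size) == 0) != ReverseOrder;
                    const bool outOfOrder = inComparatorOrder ? comp(keys[j], keys[i])
                                                              : comp(keys[i], keys[j]);
                    if (outOfOrder)
                        std::swap(keys[i], keys[j]);
                }
    }
};
```

With this order, bitonic_sort<Cfg> follows the comparator, and only callers that need the opposite direction (such as a workgroup-level sort alternating subgroup directions) spell out bitonic_sort<Cfg, true>.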

Contributor

Ascending, and later names like takeLarger, implicitly assume the comparator is a less-than (lo and hi don't; those refer to the "lane" order in the bitonic sort diagram). That's fine on its own, since it makes the code more readable than more generic names would. However, there should be comments mentioning that the names implicitly assume this, so there's no confusion.
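
For illustration, a tiny hypothetical helper (pickLarger is not from the PR) carrying the kind of comment being requested. With std::greater, the "larger" key it returns is the numerically smaller one, which is exactly the confusion such comments would head off.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper. Naming convention: "larger" means "ordered later by Comparator".
// Only with std::less does that coincide with numerically larger.
template<typename Key, typename Comparator>
Key pickLarger(const Key& a, const Key& b, Comparator comp)
{
    return comp(a, b) ? b : a; // b sorts after a under comp, so b is the "larger" key
}
```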

if (takeLarger)
{
if (comp(loKey, pLoKey)) { loKey = pLoKey; loVal = pLoVal; }
if (comp(hiKey, pHiKey)) { hiKey = pHiKey; hiVal = pHiVal; }
Contributor

Are you sure this isn't reversed? Assume a less comparator, with bitonicAscending = true for the current stage and upperHalf = true for the current thread. Then takeLarger semantically conveys that this thread wants to keep the larger values, and yet this code assigns the smaller values.
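
As a point of reference for this question, here is a hedged CPU sketch of a textbook bitonic sort in a simplified one-key-per-lane model, where each lane exchanges with lane ^ stride the way a subgroup shuffleXor would. It is not the PR's two-keys-per-thread scheme, and bitonicSortLanes, partner and takeTheirs are hypothetical names. Under a less comparator, the upper lane of an ascending block keeps the larger key, i.e. takeLarger = (upperHalf == bitonicAscending), which matches the reading above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical one-key-per-lane model of a subgroup bitonic sort. Each "lane" reads its
// partner's key from a snapshot, as it would via a shuffleXor. Here "larger" means
// "ordered later by Comp", which is numerically larger only for std::less.
template<std::size_t N, typename Comp = std::less<int>>
void bitonicSortLanes(std::array<int, N>& lanes, Comp comp = Comp{})
{
    static_assert(N > 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
    for (std::size_t size = 2; size <= N; size <<= 1)
        for (std::size_t stride = size >> 1; stride > 0; stride >>= 1)
        {
            const std::array<int, N> prev = lanes; // keys visible to every lane before this step
            for (std::size_t lane = 0; lane < N; ++lane)
            {
                const std::size_t partner = lane ^ stride;
                const bool bitonicAscending = (lane & size) == 0; // direction of this lane's block
                const bool upperHalf = (lane & stride) != 0;      // higher-indexed lane of the pair
                // Ascending block: the upper lane keeps the larger key. Descending: the lower lane does.
                const bool takeLarger = upperHalf == bitonicAscending;
                const int mine = prev[lane];
                const int theirs = prev[partner];
                // Take the partner's key only if it is strictly better; on ties keep our own.
                const bool takeTheirs = takeLarger ? comp(mine, theirs) : comp(theirs, mine);
                lanes[lane] = takeTheirs ? theirs : mine;
            }
        }
}
```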

else
{
if (comp(pLoKey, loKey)) { loKey = pLoKey; loVal = pLoVal; }
if (comp(pHiKey, hiKey)) { hiKey = pHiKey; hiVal = pHiVal; }
Contributor

Again, if the compiler is dumb, this code is very costly: half the threads in a subgroup will have upperHalf = true and the other half will have it set to false. Threads in the same subgroup (warp/wave) execute in lockstep, so a divergent branch is serialized: this section of code will run twice, first for one half of the threads (say, those in the upper half) and then for the other half, with the inactive lanes masked off. This kills your throughput.

Inside each branch, each inner if will likely get compiled down to two OpSelects (one for the key, one for the value). You can make this whole section branchless by doing:

loKey = loCondition ? loKey : pLoKey;
loVal = loCondition ? loVal : pLoVal;
hiKey = hiCondition ? hiKey : pHiKey;
hiVal = hiCondition ? hiVal : pHiVal;

where loCondition and hiCondition are "keep my own element" predicates that depend on both takeLarger and the result of the corresponding key comparison.
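
A hedged plain-C++ sketch of the branchless form above, with one possible definition of the predicates. mergeBranchless and its parameter list are hypothetical; real shader code would use the PR's own types and shuffle results. It assumes the partner lane runs the same code with the opposite takeLarger, in which case keeping one's own element on ties means no key/value pair is duplicated or lost across the exchange.

```cpp
#include <cassert>
#include <functional>

// Hypothetical plain-C++ version of the branchless update. Each lane holds a (lo, hi) pair of
// key/value elements and has already received its partner lane's pair (pLo*, pHi*).
template<typename Key, typename Value, typename Comparator = std::less<Key>>
void mergeBranchless(bool takeLarger,
                     Key& loKey, Value& loVal, Key& hiKey, Value& hiVal,
                     const Key& pLoKey, const Value& pLoVal,
                     const Key& pHiKey, const Value& pHiVal,
                     Comparator comp = Comparator{})
{
    // "Keep my own element" predicates, equivalent to the original if/else pairs:
    // take the partner's element only when it is strictly larger (or smaller) under comp.
    const bool loCondition = takeLarger ? !comp(loKey, pLoKey) : !comp(pLoKey, loKey);
    const bool hiCondition = takeLarger ? !comp(hiKey, pHiKey) : !comp(pHiKey, hiKey);
    // Every lane runs the same four selects; only the selected operands differ per lane.
    loKey = loCondition ? loKey : pLoKey;
    loVal = loCondition ? loVal : pLoVal;
    hiKey = hiCondition ? hiKey : pHiKey;
    hiVal = hiCondition ? hiVal : pHiVal;
}
```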

3 participants