Note: verifying SASS is still TODO; I'll get to it shortly.
After the above policy realignment, there are no more SASS differences.
Note: it turns out there is still more work to do on the policy values here.
I think that fixes the policies; there are still no SASS diffs.
🥳 CI Workflow Results 🟩 Finished in 1h 55m: Pass: 100%/249 | Total: 8d 06h | Max: 1h 32m | Hits: 72%/160202
@@ -119,7 +125,37 @@ template <typename ChainedPolicyT,
typename OffsetT,
typename AccumT,
typename KeyT = cub::detail::it_value_t<KeysInputIteratorT>>
Important: these parameters are unused; let's remove them.
OffsetT,
AccumT,
KeyT>(),
1)
Important: this 1 argument was not there before and could change the generated SASS. Maybe we should stick to the old arguments and remove it.
template <typename PolicyGetter, typename PolicySelectorT>
CUB_RUNTIME_FUNCTION _CCCL_FORCEINLINE cudaError_t __invoke(PolicyGetter policy_getter, const PolicySelectorT&)
Important: I think PolicySelectorT is unused, so it can be removed again.
Suggested change:
template <typename PolicyGetter>
CUB_RUNTIME_FUNCTION _CCCL_FORCEINLINE cudaError_t __invoke(PolicyGetter policy_getter)
#include <nvbench_helper.cuh>
#include "../policy_selector.h"
I don't think the policy selector there handles scan by key, since it returns a scan_policy rather than a scan_by_key_policy. I think we need a dedicated policy selector for this benchmark.
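A minimal sketch of how such a mismatch could be caught at compile time. The structs and selector names below are hypothetical stand-ins (not CUB's real types); the only assumption is that a selector is callable with an arch id and returns a policy value. The check simply compares the selector's return type against scan_by_key_policy.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the two policy types named in the review;
// the field is a placeholder, not CUB's real policy layout.
struct scan_policy
{
  int items_per_thread;
};

struct scan_by_key_policy
{
  int items_per_thread;
};

// Hypothetical generic scan selector: yields a scan_policy.
struct scan_policy_selector
{
  scan_policy operator()(int /* arch_id */) const
  {
    return {}; // value-initialized placeholder, no real tuning values
  }
};

// Hypothetical benchmark-local selector that yields the by-key policy type.
struct scan_by_key_policy_selector
{
  scan_by_key_policy operator()(int /* arch_id */) const
  {
    return {}; // value-initialized placeholder, no real tuning values
  }
};

// True only if calling Selector with an arch id yields a scan_by_key_policy.
template <typename Selector>
inline constexpr bool is_scan_by_key_selector_v =
  std::is_same_v<std::invoke_result_t<const Selector&, int>, scan_by_key_policy>;

static_assert(!is_scan_by_key_selector_v<scan_policy_selector>,
              "the generic scan selector yields the wrong policy type");
static_assert(is_scan_by_key_selector_v<scan_by_key_policy_selector>,
              "a dedicated selector yields scan_by_key_policy");
```

With a check like this in the benchmark, passing the generic scan selector would fail to compile instead of silently benchmarking the wrong policy type.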
if (primitive_op_t == primitive_op::yes)
{
  switch (key_size)
  {
    case 1:
      switch (value_size)
      {
        case 1:
          if (primitive_value_t == primitive_accum::yes)
          {
Suggestion: can we simplify this? It might be more readable as a flat chain of if/else clauses, like:
const bool prim_op = primitive_op_t == primitive_op::yes;
const bool prim_val = primitive_value_t == primitive_accum::yes;
if (prim_op && key_size == 1 && value_size == 1 && prim_val) {
  ...
}
else if (prim_op && key_size == 1 && value_size == 2 && prim_val) {
  ...
}
...
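A self-contained sketch of the flattened style, assuming hypothetical mirrors of the benchmark's primitive_op and primitive_accum enums. The returned labels stand in for the real per-case tunings, which are deliberately not reproduced; only the control-flow shape is the point.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical mirrors of the benchmark's enums; only their shape matters here.
enum class primitive_op { no, yes };
enum class primitive_accum { no, yes };

// Flat if/else chain: every branch states its full condition, so there is
// no nested switch on key_size and value_size. The string labels are
// placeholders for whatever tuning each case would select.
std::string_view select_tuning(primitive_op primitive_op_t,
                               primitive_accum primitive_value_t,
                               int key_size,
                               int value_size)
{
  const bool prim_op  = primitive_op_t == primitive_op::yes;
  const bool prim_val = primitive_value_t == primitive_accum::yes;

  if (prim_op && key_size == 1 && value_size == 1 && prim_val)
  {
    return "key1_value1";
  }
  else if (prim_op && key_size == 1 && value_size == 2 && prim_val)
  {
    return "key1_value2";
  }
  // Further key/value size combinations would follow the same pattern.
  return "fallback";
}
```

Each case reads as one line of conditions, at the cost of repeating prim_op and prim_val in every branch.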
| } | ||
| } | ||
|
|
||
| arch = ::cuda::arch_id::sm_80; |
Important: I think it's very confusing to change the arch argument partway through this already very long function. Let's try to avoid that.
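One way to follow this, sketched under assumptions: the source does not say why arch is overridden, so this assumes the override exists to reuse another architecture's tuning. The arch_id enum and the tuning_arch_for mapping are hypothetical placeholders (the real code uses ::cuda::arch_id). The parameter is const, and the derived value gets its own name, computed once at the top.

```cpp
#include <cassert>

// Hypothetical simplified arch id; the real code uses ::cuda::arch_id.
enum class arch_id { sm_60 = 60, sm_80 = 80, sm_90 = 90 };

// Hypothetical rule, for illustration only: which arch's tuning to reuse.
constexpr arch_id tuning_arch_for(arch_id arch)
{
  return arch == arch_id::sm_90 ? arch_id::sm_90 : arch_id::sm_80;
}

// `arch` is const, so it cannot be reassigned anywhere in the body.
int dispatch(const arch_id arch)
{
  // Derive a separately named value once, up front, instead of
  // overwriting `arch` deep inside the function.
  const arch_id tuning_arch = tuning_arch_for(arch);
  // ... long function body: `arch` and `tuning_arch` keep fixed meanings ...
  return static_cast<int>(tuning_arch);
}
```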
template <typename PolicyGetter, typename PolicySelectorT>
CUB_RUNTIME_FUNCTION _CCCL_HOST _CCCL_FORCEINLINE cudaError_t
invoke(PolicyGetter policy_getter, const PolicySelectorT& policy_selector)
{
  return __invoke(policy_getter, policy_selector);
}
Q: Why do we need this function? Can't we just call __invoke directly?
"Dispatching DeviceScanByKey to arch %d with tuning: %s\n", static_cast<int>(arch_id), ss.str().c_str());))
#endif
return detail::dispatch_arch(policy_selector, arch_id, [&](auto policy_getter) {
Suggestion: IIUC, the active tuning policy is never needed as a compile-time value, so we can drop dispatch_arch and simply query the policy, keeping it a runtime value:
const scan_by_key_policy active_policy = policy_selector(arch_id);
This would also reduce template instantiations.
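A minimal sketch of the runtime-query pattern, assuming a hypothetical scan_by_key_policy with two fields and a hypothetical policy_selector_t that maps an integer arch id (e.g. 80 for sm_80) to a policy. The numeric values are made-up placeholders, not CUB's tunings.

```cpp
#include <cassert>

// Hypothetical stand-in for the real scan_by_key_policy; the fields and
// values below are illustrative placeholders, not CUB's actual tunings.
struct scan_by_key_policy
{
  int block_threads;
  int items_per_thread;
};

// Hypothetical policy selector: maps a runtime arch id to a policy value.
struct policy_selector_t
{
  scan_by_key_policy operator()(int arch_id) const
  {
    if (arch_id >= 90)
    {
      return {256, 16};
    }
    if (arch_id >= 80)
    {
      return {128, 12};
    }
    return {128, 8};
  }
};

// The policy is an ordinary runtime value: this single non-template function
// handles every architecture, with no per-arch lambda instantiation.
int tile_size_for(const policy_selector_t& policy_selector, int arch_id)
{
  const scan_by_key_policy active_policy = policy_selector(arch_id);
  return active_policy.block_threads * active_policy.items_per_thread;
}
```

The trade-off is that anything needing the policy as a compile-time constant (e.g. a kernel template parameter) could not use active_policy directly; the suggestion relies on that never being the case here.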
Description
Implements the new, improved, tuning API for ScanByKey.
Resolves #7640.
Checklist