Commit 5d09154
Simplify and optimize mGPU tile intersection sort (#258)
1. Prefetch the input and output keys and values for the radix sort to avoid page faults
2. Merge radix sort results without forking threads, eliminating the pthread overhead
Reduces time per iteration by about 2 ms
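
The idea behind item 2 can be illustrated with a small host-side sketch. This is not the code from this commit: `KeyValue`, `mergeSortedRuns`, and the `bounds` layout are hypothetical names chosen for illustration, and the sketch assumes each device has already produced a sorted run of key/value pairs stored contiguously in one buffer. It merges neighbouring runs in a plain loop on the calling thread rather than spawning one thread per merge.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical pair type: key = tile id, value = intersection index.
using KeyValue = std::pair<std::uint32_t, std::uint32_t>;

// Merge already-sorted runs of `data` pairwise on the calling thread.
// Run i spans [bounds[i], bounds[i + 1]). Assumes `bounds` is ascending,
// starts at 0 and ends at data.size().
inline void mergeSortedRuns(std::vector<KeyValue>& data,
                            std::vector<std::size_t> bounds) {
    auto byKey = [](const KeyValue& a, const KeyValue& b) {
        return a.first < b.first;
    };
    auto at = [&data](std::size_t i) {
        return data.begin() + static_cast<std::ptrdiff_t>(i);
    };

    // Each pass halves the number of runs; no threads are created.
    while (bounds.size() > 2) {
        std::vector<std::size_t> next;
        next.push_back(bounds.front());

        // Merge neighbouring runs (0,1), (2,3), ... in place.
        // std::inplace_merge is stable, so equal keys keep their run order.
        for (std::size_t i = 0; i + 2 < bounds.size(); i += 2) {
            std::inplace_merge(at(bounds[i]), at(bounds[i + 1]),
                               at(bounds[i + 2]), byKey);
            next.push_back(bounds[i + 2]);
        }

        // With an odd number of runs, the last one is carried over unmerged.
        if (next.back() != bounds.back()) {
            next.push_back(bounds.back());
        }
        bounds = std::move(next);
    }
}
```

For a handful of runs, the cost of creating and joining threads can plausibly exceed the cost of the merges themselves, which is one reason a single-threaded loop may come out ahead; whether that matches the actual change here is an assumption.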
Signed-off-by: Matthew Cong <mcong@nvidia.com>
1 parent 151faf3
1 file changed, 181 insertions(+), 177 deletions(-)