Expose PSTL algorithms through <cuda/std/algorithm> and <cuda/std/numeric>#7931
miscco wants to merge 3 commits into NVIDIA:main
libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/pstl_copy_n.cu
```cpp
// parallel algorithms
#if _CCCL_HAS_PSTL_BACKEND()
#  include <cuda/std/__pstl/adjacent_find.h>
#  include <cuda/std/__pstl/all_of.h>
#  include <cuda/std/__pstl/any_of.h>
#  include <cuda/std/__pstl/copy.h>
#  include <cuda/std/__pstl/copy_if.h>
#  include <cuda/std/__pstl/copy_n.h>
```
Q: I thought many standard libraries would expose PSTL algorithms through the <execution> header and not <algorithm>. This would make the inclusion of <algorithm> cheaper.
Discussed this with @miscco offline, and it seems the C++ standard requires the overloads to be in <algorithm>. However, it may not be observable to the common user, since they need to include <execution> in addition to supplying an execution policy.
If it's not observable, then I would prefer exposing it in the <execution> header to avoid bloating <algorithm>.
I do not believe that is a correct statement.
<execution> can include it all and be fine, but then <algorithm> would not have it.
The point is that the pstl headers effectively pull in all of <algorithm>.
> <execution> can include it all and be fine, but then <algorithm> would not have it.

What is the advantage of <algorithm> having an overload that cannot be called if a user does not also include <execution>?

> The point is that the pstl headers effectively pull in all of <algorithm>.

This is fine IMO; including a PSTL header can be more expensive.
Moved the exposure to <cuda/std/execution> in the hope of later being able to expose it via modularized access.
@miscco could you please measure the compile-time of

```cpp
#include <cuda/std/algorithm>

int main() {
  return cuda::std::min(0, 2);
}
```

before and after this PR? I would be curious how much of an impact pulling in most of CUB has ;)
I checked the differences in compile times.

Main:

With PSTL:

The reason that
Thank you for providing these numbers! I think we should go ahead with the status quo and expose the PSTL through
🥳 CI Workflow Results

🟩 Finished in 3h 40m: Pass: 100%/105 | Total: 2d 19h | Max: 3h 40m | Hits: 80%/258702
We discussed this today in our internal review meeting and decided to better align the customization with the other environment work.
We discussed this internally and we are happy with the results of the parallel CUDA backend. So we want to expose this now rather than waiting for all algorithms to be implemented.
There are certain caveats:

- We require random access iterators for the CUDA backend.
- We currently expose only a CUDA backend, through cuda::execution::gpu. Standard execution policies will currently static_assert that there is a missing backend.
- We do not provide any fallback serial implementation. This would be dangerous, because the serial implementation would naively run on the host and not the device.