Commit 89b112b

Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-build-cluster
2 parents 57da012 + c1ecd0e commit 89b112b

22 files changed (+691, -187 lines changed)

cub/cub/detail/detect_cuda_runtime.cuh

Lines changed: 2 additions & 7 deletions
@@ -20,11 +20,6 @@
 # pragma system_header
 #endif // no system header
 
-// CUDA headers might not be present when using NVRTC, see NVIDIA/cccl#2095 for detail
-#if !_CCCL_COMPILER(NVRTC)
-# include <cuda_runtime_api.h>
-#endif // !_CCCL_COMPILER(NVRTC)
-
 #ifdef _CCCL_DOXYGEN_INVOKED // Only parse this during doxygen passes:
 //! Defined if RDC is enabled and CUB_DISABLE_CDP is not defined.
 //! Deprecated [Since 3.2]
@@ -40,9 +35,9 @@
 # define CUB_RUNTIME_FUNCTION
 #else // Non-doxygen pass:
 
-# if _CCCL_HAS_RDC()
+# if _CCCL_HAS_CDP()
 # define CUB_RDC_ENABLED
-# endif // _CCCL_HAS_RDC()
+# endif // _CCCL_HAS_CDP()
 
 # ifndef CUB_RUNTIME_FUNCTION
 # define CUB_RUNTIME_FUNCTION _CCCL_CDP_API

docs/cub/api_docs/device_wide.rst

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@ CUB device-level single-problem parallel algorithms:
 * :cpp:struct:`cub::DeviceRunLengthEncode` demarcating "runs" of same-valued items within a sequence residing within device-accessible memory
 * :cpp:struct:`cub::DeviceScan` computes a prefix scan across a sequence of data items residing within device-accessible memory
 * :cpp:struct:`cub::DeviceSelect` compacts data residing within device-accessible memory
+* :cpp:struct:`cub::DeviceTransform` transforms elements from multiple input sequences into an output sequence
 * :cpp:struct:`cub::DeviceTopK` finds the largest (or smallest) K items from an unordered list residing within device-accessible memory
 
 
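To illustrate what "transforms elements from multiple input sequences into an output sequence" means, here is a minimal host-side analogue in standard C++. It is only a sketch of the semantics: it uses `std::transform` rather than the CUB API, runs on the CPU, and the function name `transform_two_inputs` is hypothetical.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Host-side analogue only (not the cub::DeviceTransform API):
// combines two input sequences element-wise into one output sequence.
// Assumes both inputs have the same length.
std::vector<int> transform_two_inputs(const std::vector<int>& a, const std::vector<int>& b)
{
  std::vector<int> out(a.size());
  // Binary std::transform reads a[i] and b[i] and writes op(a[i], b[i]) to out[i].
  std::transform(a.begin(), a.end(), b.begin(), out.begin(), std::plus<int>{});
  return out;
}
```

Calling `transform_two_inputs({1, 2, 3}, {10, 20, 30})` yields a vector holding the element-wise sums. The device-wide algorithm listed in the diff applies the same idea to sequences in device-accessible memory.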

docs/cudax/stf.rst

Lines changed: 41 additions & 0 deletions
@@ -2016,6 +2016,47 @@ A token corresponds to a ``logical_data<void_interface>`` object, so that the
 ``token`` type serves as a short-hand for this type. ``ctx.token()`` thus
 returns an object with a ``token`` type.
 
+Debugging
+---------
+
+Enabling internal checks
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+CUDASTF includes internal assertions (``_CCCL_ASSERT``) that help detect
+programming errors and invalid usage patterns during development. These checks
+are disabled by default for performance but can be enabled to aid debugging.
+
+**With CMake:**
+
+When building in Debug mode, assertions are enabled automatically:
+
+.. code:: bash
+
+    cmake -DCMAKE_BUILD_TYPE=Debug ..
+
+To explicitly enable assertions for any build type, add the compile definition
+to your target:
+
+.. code:: cmake
+
+    target_compile_definitions(your_target PRIVATE CCCL_ENABLE_ASSERTIONS)
+
+**With Makefile or manual compilation:**
+
+Add the ``-DCCCL_ENABLE_ASSERTIONS`` flag to your compiler invocation:
+
+.. code:: bash
+
+    # For nvcc
+    nvcc -DCCCL_ENABLE_ASSERTIONS ...
+
+    # For host compiler
+    g++ -DCCCL_ENABLE_ASSERTIONS ...
+
+Note that this flag enables the assertion checks themselves. For full debugging
+support (setting breakpoints, inspecting variables), you may also want to add
+debug symbol flags (``-g`` for host code, ``-G`` for device code).
+
 Tools
 -----
 
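The added documentation describes checks that are compiled in only when `CCCL_ENABLE_ASSERTIONS` is defined. The sketch below shows the general pattern of gating an assertion on a compile definition. `SKETCH_ASSERT`, `sketch_assertions_enabled` and `checked_divide` are hypothetical names introduced for illustration; this is not how `_CCCL_ASSERT` is actually implemented.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for an internal assertion macro.
// The check exists only when CCCL_ENABLE_ASSERTIONS is defined at compile time.
#if defined(CCCL_ENABLE_ASSERTIONS)
#  define SKETCH_ASSERT(cond, msg)                                          \
    do                                                                      \
    {                                                                       \
      if (!(cond))                                                          \
      {                                                                     \
        std::fprintf(stderr, "assertion failed: %s (%s)\n", (msg), #cond);  \
        std::abort();                                                       \
      }                                                                     \
    } while (0)
#else
// Without the definition the macro expands to nothing, so it costs nothing.
#  define SKETCH_ASSERT(cond, msg) ((void) 0)
#endif

// Reports whether the checks were compiled into this translation unit.
constexpr bool sketch_assertions_enabled()
{
#if defined(CCCL_ENABLE_ASSERTIONS)
  return true;
#else
  return false;
#endif
}

// Hypothetical helper that validates its precondition before dividing.
int checked_divide(int a, int b)
{
  SKETCH_ASSERT(b != 0, "divisor must be non-zero");
  return a / b;
}
```

Compiling this file with `g++ -DCCCL_ENABLE_ASSERTIONS` makes `checked_divide(1, 0)` abort with a message, while compiling without the flag removes the check entirely. That is why such checks are typically left off in release builds.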
