NVIDIA
diff --git a/cub/cub/detail/detect_cuda_runtime.cuh b/cub/cub/detail/detect_cuda_runtime.cuh
Lines changed: 2 additions & 7 deletions
diff --git a/docs/cub/api_docs/device_wide.rst b/docs/cub/api_docs/device_wide.rst
Lines changed: 1 addition & 0 deletions
diff --git a/docs/cudax/stf.rst b/docs/cudax/stf.rst
Lines changed: 41 additions & 0 deletions
@@ -20,11 +20,6 @@
 #  pragma system_header
 #endif // no system header
 
-// CUDA headers might not be present when using NVRTC, see NVIDIA/cccl#2095 for detail
-#if !_CCCL_COMPILER(NVRTC)
-#  include <cuda_runtime_api.h>
-#endif // !_CCCL_COMPILER(NVRTC)
-
 #ifdef _CCCL_DOXYGEN_INVOKED // Only parse this during doxygen passes:
 //! Defined if RDC is enabled and CUB_DISABLE_CDP is not defined.
 //! Deprecated [Since 3.2]
@@ -40,9 +35,9 @@
 #  define CUB_RUNTIME_FUNCTION
 #else // Non-doxygen pass:
 
-#  if _CCCL_HAS_RDC()
+#  if _CCCL_HAS_CDP()
 #    define CUB_RDC_ENABLED
-#  endif // _CCCL_HAS_RDC()
+#  endif // _CCCL_HAS_CDP()
 
 #  ifndef CUB_RUNTIME_FUNCTION
 #    define CUB_RUNTIME_FUNCTION _CCCL_CDP_API
 
@@ -90,6 +90,7 @@ CUB device-level single-problem parallel algorithms:
 * :cpp:struct:`cub::DeviceRunLengthEncode` demarcating "runs" of same-valued items within a sequence residing within device-accessible memory
 * :cpp:struct:`cub::DeviceScan` computes a prefix scan across a sequence of data items residing within device-accessible memory
 * :cpp:struct:`cub::DeviceSelect` compacts data residing within device-accessible memory
 * :cpp:struct:`cub::DeviceTopK` finds the largest (or smallest) K items from an unordered list residing within device-accessible memory
+* :cpp:struct:`cub::DeviceTransform` transforms elements from one or more input sequences into an output sequence
 
 
 
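The `cub::DeviceTransform` entry above describes transforming elements from input sequences into an output sequence. As an illustration only, the following host-side C++ sketch models those element-wise semantics sequentially. The helper `reference_transform` is hypothetical and is not part of CUB. The real `cub::DeviceTransform` runs on the GPU and has its own signature.

```cpp
#include <cstddef>

// Hypothetical, sequential host-side model of an element-wise n-ary transform:
//   out[i] = op(in_0[i], in_1[i], ...) for every i in [0, n).
// This only illustrates the semantics; it is NOT cub::DeviceTransform.
template <typename Out, typename Op, typename... In>
void reference_transform(Out* out, std::size_t n, Op op, const In*... ins)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    // Apply op to the i-th element of every input sequence.
    out[i] = op(ins[i]...);
  }
}
```

For example, with `a = {1, 2, 3}` and `b = {10, 20, 30}`, `reference_transform(c, 3, [](int x, int y) { return x + y; }, a, b)` fills `c` with `{11, 22, 33}`. Passing a single input sequence works the same way with a unary operation.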
@@ -2016,6 +2016,47 @@ A token corresponds to a ``logical_data<void_interface>`` object, so that the
 ``token`` type serves as a short-hand for this type. ``ctx.token()`` thus
 returns an object with a ``token`` type.
 
+Debugging
+---------
+
+Enabling internal checks
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+CUDASTF includes internal assertions (``_CCCL_ASSERT``) that help detect
+programming errors and invalid usage patterns during development. These checks
+are disabled by default to avoid their runtime overhead, but can be enabled for debugging.
+
+**With CMake:**
+
+When building in Debug mode, assertions are enabled automatically:
+
+.. code:: bash
+
+   cmake -DCMAKE_BUILD_TYPE=Debug ..
+
+To explicitly enable assertions for any build type, add the compile definition
+to your target:
+
+.. code:: cmake
+
+   target_compile_definitions(your_target PRIVATE CCCL_ENABLE_ASSERTIONS)
+
+**With a Makefile or manual compilation:**
+
+Add the ``-DCCCL_ENABLE_ASSERTIONS`` flag to your compiler invocation:
+
+.. code:: bash
+
+   # For nvcc
+   nvcc -DCCCL_ENABLE_ASSERTIONS ...
+
+   # For the host compiler
+   g++ -DCCCL_ENABLE_ASSERTIONS ...
+
+This flag only enables the assertion checks themselves. For full debugging support
+(setting breakpoints, inspecting variables), also add debug symbol flags: ``-g``
+for host code and ``-G`` for device code (``-G`` also turns off device-code optimizations).
+
 Tools
 -----
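The Debugging section relies on `-DCCCL_ENABLE_ASSERTIONS` actually reaching each translation unit. When flags pass through several Makefile variables, a small check like the one below can confirm this. It is a sketch: the helper names `cccl_assertions_requested` and `report_cccl_assertions` are hypothetical. Because the sketch includes no CCCL header, it only reports a definition supplied on the command line or by the build system. It does not reflect any assertion setting that CCCL may derive on its own, for example in Debug builds.

```cpp
#include <cstdio>

// Hypothetical helper: true if CCCL_ENABLE_ASSERTIONS was defined when this
// translation unit was compiled (e.g. via -DCCCL_ENABLE_ASSERTIONS).
// No CCCL header is included, so only a user-supplied definition is seen.
constexpr bool cccl_assertions_requested()
{
#ifdef CCCL_ENABLE_ASSERTIONS
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: prints whether the macro was defined for this translation unit.
inline void report_cccl_assertions()
{
  std::printf("CCCL_ENABLE_ASSERTIONS is %s\n",
              cccl_assertions_requested() ? "defined" : "not defined");
}
```

Calling `report_cccl_assertions()` from `main` in a build compiled with `nvcc -DCCCL_ENABLE_ASSERTIONS ...` should print `CCCL_ENABLE_ASSERTIONS is defined`. Since the check is per translation unit, it belongs in a file compiled with the same flags as the code being debugged.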