55.. _porting_cuda_code :
66
77*******************************************************************************
8- Porting CUDA code to HIP
8+ Porting NVIDIA CUDA code to HIP
99*******************************************************************************
1010
11- HIP is designed to ease the porting of existing CUDA code into the HIP
12- environment to let you run your application on AMD GPUs. This page describes
13- the available tools and provides practical suggestions on how to port your CUDA
14- code and work through common issues.
11+ HIP eases the porting of existing NVIDIA CUDA code into the HIP
12+ environment, enabling you to run your application on AMD GPUs. This topic describes
13+ the available tools and provides practical suggestions for porting your CUDA
14+ code and working through common issues.
1515
1616CUDA provides separate driver and runtime APIs, while HIP uses a single API.
1717The two CUDA APIs generally provide similar functionality and are mostly interchangeable.
18- However, the CUDA driver API enables fine-grained control over the kernel-level
18+ However, the CUDA driver API provides fine-grained control over kernel-level
1919initialization, contexts, and module management, while the runtime API automatically
20- manages contexts and modules. The driver API is suitable for applications that need
21- tight integration with other systems, or require advanced control over GPU resources.
20+ manages contexts and modules. The driver API is suitable for applications that require
21+ tight integration with other systems or advanced control over GPU resources.
2222
2323* Driver API calls begin with the prefix ``cu ``, while runtime API calls begin
2424 with the prefix ``cuda ``. For example, the driver API contains
@@ -38,7 +38,7 @@ Porting a CUDA project
3838======================
3939
4040HIP projects can target either AMD or NVIDIA platforms. HIP is a marshalling language
41- that provides a thin-layer mapping to functions in AMD's ROCm language, or to CUDA
41+ that provides a thin-layer mapping to functions in the AMD ROCm language, or to CUDA
4242functions. To compile the HIP code, you can use ``amdclang++ ``, also called HIP-Clang,
4343or you can use ``hipcc `` to enable compilation by ``nvcc `` to produce CUDA executables,
4444as described in :ref: `compilation_platform `.
@@ -48,7 +48,7 @@ with CUDA code results in valid application code. This enables users to incremen
4848a CUDA project to HIP, and still compile and test the code during the transition.
4949
5050The only notable exception is ``hipError_t ``, which is not just an alias to
51- ``cudaError_t ``. In these cases HIP provides functions to convert between the
51+ ``cudaError_t ``. In these cases, HIP provides functions to convert between the
5252error code spaces:
5353
5454* :cpp:func: `hipErrorToCudaError `
@@ -326,7 +326,7 @@ or ``nvcc`` (on NVIDIA systems), passing the necessary options to the target
326326compiler. Tools that call ``hipcc `` must ensure the compiler options are appropriate
327327for the target compiler.
328328
329- ``hipconfig `` is a helpful tool in identifying the current systems platform,
329+ ``hipconfig `` is a helpful tool for identifying the current system's platform,
330330compiler and runtime. It can also help set options appropriately. As an example,
331331``hipconfig `` can provide a path to HIP, in Makefiles:
332332
@@ -336,7 +336,7 @@ compiler and runtime. It can also help set options appropriately. As an example,
336336
337337 .. note ::
338338 You can use ``amdclang++ `` to target NVIDIA systems, but you must manually specify
339- the required options for the compiler.
339+ the required compiler options .
340340
341341HIP Headers
342342-----------
@@ -345,7 +345,7 @@ The ``hip_runtime.h`` headers define all the necessary types, functions, macros,
345345etc., needed to compile a HIP program, this includes host as well as device
346346code. ``hip_runtime_api.h `` is a subset of ``hip_runtime.h ``.
347347
348- CUDA has slightly different contents for these two files. In some cases you may
348+ CUDA has slightly different contents for these two files. In some cases you might
349349need to convert hipified code to include the richer ``hip_runtime.h `` instead of
350350``hip_runtime_api.h ``.
351351
@@ -374,12 +374,12 @@ default headers, and instead you must explicitly include all required files.
374374 and it converts ``cuda_runtime_api.h `` to ``hip_runtime_api.h ``, but it may
375375 miss nested headers or macros.
376376
377- Compiler Defines for HIP and CUDA
377+ Compiler defines for HIP and CUDA
378378---------------------------------
379379
380380C++-macros can be used to write code that is specific to a platform. This
381- section lists macros that are defined by compilers and the HIP/CUDA APIs,
382- and what compiler/platform combinations they are defined for .
381+ section lists macros defined by compilers and the HIP/CUDA APIs,
382+ and the compiler/platform combinations that define them .
383383
384384The following table lists the macros that can be used when compiling HIP. Most
385385of these macros are not directly defined by the compilers, but in
@@ -464,8 +464,7 @@ The following table lists macros related to ``nvcc`` and CUDA as HIP backend.
464464Identifying the compilation target platform
465465-------------------------------------------
466466
467- Despite HIP's portability, it can be necessary to tailor code to a specific
468- platform, in order to provide platform-specific code, or aid in
467+ With HIP's portability, you might need to provide platform-specific code, or enable
469468platform-specific performance improvements.
470469
471470For this, the ``__HIP_PLATFORM_AMD__ `` and ``__HIP_PLATFORM_NVIDIA__ `` macros
@@ -498,13 +497,13 @@ To explicitly use the CUDA compilation path, use:
498497 export HIP_PLATFORM=nvidia
499498 hipcc main.cpp
500499
501- Identifying Host or Device Compilation Pass
500+ Identifying host or device compilation pass
502501-------------------------------------------
503502
504503``amdclang++ `` makes multiple passes over the code: one pass for the host code, and
505504for the device code one pass for each GPU architecture to be compiled for.
506- ``nvcc `` only makes two passes over the code: one for host code and one for device
507- code.
505+ ``nvcc `` only makes two passes over the code: one for the host code and one for the
506+ device code.
508507
509508The ``__HIP_DEVICE_COMPILE__ `` macro is defined when the compiler is compiling
510509for the device. This macro is a portable check that can replace the
@@ -564,7 +563,7 @@ Kernel launching
564563----------------
565564
566565HIP-Clang supports kernel launching using either the triple chevron (``<<<>>> ``) syntax,
567- :cpp:func: `hipLaunchKernel `, or :cpp:func: `hipLaunchKernelGGL `. The last option is a macro which
566+ :cpp:func: `hipLaunchKernel `, or :cpp:func: `hipLaunchKernelGGL `. The last option is a macro that
568567expands to the ``<<<>>> `` syntax by default. It can also be turned into a template by
569568defining ``HIP_TEMPLATE_KERNEL_LAUNCH ``.
570569
@@ -585,8 +584,8 @@ kernel associated with the stub function is launched.
585584NVCC implementation notes
586585=========================
587586
588- CUDA applications might want to mix CUDA code with HIP code (see the
589- example below). This table shows the equivalence between CUDA and HIP types
587+ CUDA applications can mix CUDA code with HIP code (see the
588+ example below). The table shows the equivalent CUDA and HIP types
590589required to implement this interaction.
591590
592591.. list-table :: Equivalence table between HIP and CUDA types
@@ -752,7 +751,7 @@ Identifying device architecture and features
752751
753752GPUs of different generations and architectures do not provide the same
754753level of :doc: `hardware feature support <../reference/hardware_features >`. To
755- guard device- code that uses architecture dependent features, the
754+ guard device code that uses architecture- dependent features, the
756755``__HIP_ARCH_<FEATURE>__ `` C++-macros can be used, as described below.
757756
758757Device code feature identification
@@ -785,7 +784,7 @@ Host code feature identification
785784The host code must not rely on the ``__HIP_ARCH_<FEATURE>__ `` macros, because the
786785GPUs available to a system are not known during compile time, and their
787786architectural features differ. Alternatively, the host code can query architecture
788- feature flags during runtime, by using :cpp:func: `hipGetDeviceProperties `
787+ feature flags during runtime by using :cpp:func: `hipGetDeviceProperties `
789788or :cpp:func: `hipDeviceGetAttribute `.
790789
791790.. code-block :: cpp
@@ -820,11 +819,11 @@ or :cpp:func:`hipDeviceGetAttribute`.
820819 std::cout << " shared int32 atomic operations" << std::endl;
821820 }
822821
823- Table of feature macros and properties
824- --------------------------------------
822+ Feature macros and properties
823+ -----------------------------
825824
826- The table below shows the full set of architectural properties that HIP
827- supports, together with the corresponding macros and device properties .
825+ The following table lists the feature macros that HIP supports,
826+ alongside corresponding device properties that can be queried from the host code .
828827
829828.. list-table ::
830829 :header-rows: 1
@@ -914,8 +913,8 @@ how to write portable warpSize-aware code.
914913Porting from CUDA __launch_bounds__
915914===================================
916915
917- CUDA defines a ``__launch_bounds__ `` qualifier which works similar to HIP's
918- implementation, however it uses different parameters:
916+ CUDA defines a ``__launch_bounds__ `` qualifier which works similarly to the HIP
917+ implementation, however, it uses different parameters:
919918
920919.. code-block :: cpp
921920
@@ -948,27 +947,27 @@ AMD GCN architecture has 4 execution units per multiprocessor.
948947maxregcount
949948-----------
950949
951- The ``nvcc `` compiler will try to guess the number of registers per thread based on the launch bounds.
950+ The ``nvcc `` compiler will predict the number of registers per thread based on the launch bounds calculation .
952951``--maxregcount X `` can be used to override the compiler's decision by enforcing a hard number of registers
953- (``X ``) that the compiler must not exceed. If the compiler is unable meet this requirement it will place
952+ (``X ``) that the compiler must not exceed. If the compiler is unable to meet this requirement, it will place
954953additional "registers" into memory instead of using hardware registers.
955954
956- Unlike ``nvcc ``, ``amdclang++ `` does not support the ``--maxregcount `` option. Users are encouraged to use
955+ Unlike ``nvcc ``, ``amdclang++ `` does not support the ``--maxregcount `` option. You are encouraged to use
957956the ``__launch_bounds__ `` directive since the parameters are more intuitive and portable than micro-architecture
958957details like registers. The directive allows per-kernel control.
959958
960959Driver entry point access
961960=========================
962961
963962The HIP runtime provides support for CUDA driver entry point access when using
964- CUDA 12.0 or later. This feature allows developers to directly interact with the
963+ CUDA 12.0 or later. This feature lets developers interact directly with the
965964CUDA driver API, providing more control over GPU operations.
966965
967966Driver entry point access provides several features:
968967
969968* Retrieving the address of a runtime function
970969* Requesting the default stream version on a per-thread basis
971- * Accessing new HIP features on older toolkits with a newer driver
970+ * Accessing HIP features on older toolkits with a newer driver
972971
973972For more information on driver entry point access, see :cpp:func: `hipGetProcAddress `.
974973
@@ -1077,14 +1076,14 @@ efficiency.
10771076 return 0;
10781077 }
10791078
1080- Accessing new HIP features with a newer driver
1081- ----------------------------------------------
1079+ Accessing HIP features with a newer driver
1080+ ------------------------------------------
10821081
1083- HIP is designed to be forward compatible, allowing newer features to be utilized
1082+ HIP is forward compatible, allowing newer features to be utilized
10841083with older toolkits, provided a compatible driver is present. Feature support
10851084can be verified through runtime API functions and version checks. This approach
10861085ensures that applications can benefit from new features and improvements in the
1087- HIP runtime without needing to be recompiled with a newer toolkit. The function
1086+ HIP runtime without requiring recompilation with a newer toolkit. The function
10881087:cpp:func: `hipGetProcAddress ` enables dynamic querying and the use of newer
10891088functions offered by the HIP runtime, even if the application was built with an
10901089older toolkit.
@@ -1115,8 +1114,8 @@ The HIP version number is defined as an integer:
11151114 CU_POINTER_ATTRIBUTE_MEMORY_TYPE
11161115================================
11171116
1118- To get the pointer's memory type in HIP, developers should use :cpp:func: `hipPointerGetAttributes `.
1119- First parameter of the function is `hipPointerAttribute_t `. Its ``type `` member variable indicates
1117+ To return the pointer's memory type in HIP, developers should use :cpp:func: `hipPointerGetAttributes `.
1118+ The first parameter of the function is `hipPointerAttribute_t `. Its ``type `` member variable indicates
11201119whether the memory pointed to is allocated on the device or the host. For example:
11211120
11221121.. code-block :: cpp
@@ -1138,7 +1137,7 @@ whether the memory pointed to is allocated on the device or the host. For exampl
11381137 Note that ``hipMemoryType `` enum values are different from the
11391138``cudaMemoryType `` enum values.
11401139
1141- For example, on AMD platform, `hipMemoryType ` is defined in `hip_runtime_api.h `,
1140+ For example, on AMD platform, `` hipMemoryType `` is defined in `` hip_runtime_api.h ``:
11421141
11431142.. code-block :: cpp
11441143
@@ -1150,7 +1149,7 @@ For example, on AMD platform, `hipMemoryType` is defined in `hip_runtime_api.h`,
11501149 hipMemoryTypeManaged = 4 ///< Managed memory, automaticallly managed by the unified memory system
11511150 } hipMemoryType;
11521151
1153- Looking into CUDA toolkit, it defines ` cudaMemoryType ` as following,
1152+ In the CUDA toolkit, the `` cudaMemoryType `` is defined as following:
11541153
11551154.. code-block :: cpp
11561155
@@ -1167,9 +1166,8 @@ Looking into CUDA toolkit, it defines `cudaMemoryType` as following,
11671166 due to HIP functionality backward compatibility.
11681167
11691168The memory type translation for ``hipPointerGetAttributes `` needs to
1170- be handled properly on NVIDIA platform to get the correct memory type in CUDA,
1169+ be handled properly on NVIDIA platform to return the correct memory type in CUDA,
11711170which is done in the file ``nvidia_hip_runtime_api.h ``.
11721171
1173- So in any HIP applications which use HIP APIs involving memory types, developers
1174- should use ``#ifdef `` in order to assign the correct enum values depending on
1175- NVIDIA or AMD platform.
1172+ In applications that use HIP memory type APIs, you should use ``#ifdef ``
1173+ to assign the correct enum values depending on NVIDIA or AMD platform.
0 commit comments