Skip to content

Commit f4b477f

Browse files
randyh62neon60
authored andcommitted
Update hip_porting_guide.rst
Implement Leo's feedback.
1 parent 80f6881 commit f4b477f

File tree

1 file changed

+47
-49
lines changed

1 file changed

+47
-49
lines changed

docs/how-to/hip_porting_guide.rst

Lines changed: 47 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -5,20 +5,20 @@
55
.. _porting_cuda_code:
66

77
*******************************************************************************
8-
Porting CUDA code to HIP
8+
Porting NVIDIA CUDA code to HIP
99
*******************************************************************************
1010

11-
HIP is designed to ease the porting of existing CUDA code into the HIP
12-
environment to let you run your application on AMD GPUs. This page describes
13-
the available tools and provides practical suggestions on how to port your CUDA
14-
code and work through common issues.
11+
HIP eases the porting of existing NVIDIA CUDA code into the HIP
12+
environment, enabling you to run your application on AMD GPUs. This topic describes
13+
the available tools and provides practical suggestions for porting your CUDA
14+
code and working through common issues.
1515

1616
CUDA provides separate driver and runtime APIs, while HIP uses a single API.
1717
The two CUDA APIs generally provide similar functionality and are mostly interchangeable.
18-
However, the CUDA driver API enables fine-grained control over the kernel-level
18+
However, the CUDA driver API provides fine-grained control over kernel-level
1919
initialization, contexts, and module management, while the runtime API automatically
20-
manages contexts and modules. The driver API is suitable for applications that need
21-
tight integration with other systems, or require advanced control over GPU resources.
20+
manages contexts and modules. The driver API is suitable for applications that require
21+
tight integration with other systems or advanced control over GPU resources.
2222

2323
* Driver API calls begin with the prefix ``cu``, while runtime API calls begin
2424
with the prefix ``cuda``. For example, the driver API contains
@@ -38,7 +38,7 @@ Porting a CUDA project
3838
======================
3939

4040
HIP projects can target either AMD or NVIDIA platforms. HIP is a marshalling language
41-
that provides a thin-layer mapping to functions in AMD's ROCm language, or to CUDA
41+
that provides a thin-layer mapping to functions in the AMD ROCm language, or to CUDA
4242
functions. To compile the HIP code, you can use ``amdclang++``, also called HIP-Clang,
4343
or you can use ``hipcc`` to enable compilation by ``nvcc`` to produce CUDA executables,
4444
as described in :ref:`compilation_platform`.
@@ -48,7 +48,7 @@ with CUDA code results in valid application code. This enables users to incremen
4848
a CUDA project to HIP, and still compile and test the code during the transition.
4949

5050
The only notable exception is ``hipError_t``, which is not just an alias to
51-
``cudaError_t``. In these cases HIP provides functions to convert between the
51+
``cudaError_t``. In these cases, HIP provides functions to convert between the
5252
error code spaces:
5353

5454
* :cpp:func:`hipErrorToCudaError`
@@ -326,7 +326,7 @@ or ``nvcc`` (on NVIDIA systems), passing the necessary options to the target
326326
compiler. Tools that call ``hipcc`` must ensure the compiler options are appropriate
327327
for the target compiler.
328328

329-
``hipconfig`` is a helpful tool in identifying the current systems platform,
329+
``hipconfig`` is a helpful tool for identifying the current system's platform,
330330
compiler and runtime. It can also help set options appropriately. As an example,
331331
``hipconfig`` can provide a path to HIP, in Makefiles:
332332

@@ -336,7 +336,7 @@ compiler and runtime. It can also help set options appropriately. As an example,
336336
337337
.. note::
338338
You can use ``amdclang++`` to target NVIDIA systems, but you must manually specify
339-
the required options for the compiler.
339+
the required compiler options.
340340

341341
HIP Headers
342342
-----------
@@ -345,7 +345,7 @@ The ``hip_runtime.h`` headers define all the necessary types, functions, macros,
345345
etc., needed to compile a HIP program, this includes host as well as device
346346
code. ``hip_runtime_api.h`` is a subset of ``hip_runtime.h``.
347347

348-
CUDA has slightly different contents for these two files. In some cases you may
348+
CUDA has slightly different contents for these two files. In some cases you might
349349
need to convert hipified code to include the richer ``hip_runtime.h`` instead of
350350
``hip_runtime_api.h``.
351351

@@ -374,12 +374,12 @@ default headers, and instead you must explicitly include all required files.
374374
and it converts ``cuda_runtime_api.h`` to ``hip_runtime_api.h``, but it may
375375
miss nested headers or macros.
376376

377-
Compiler Defines for HIP and CUDA
377+
Compiler defines for HIP and CUDA
378378
---------------------------------
379379

380380
C++-macros can be used to write code that is specific to a platform. This
381-
section lists macros that are defined by compilers and the HIP/CUDA APIs,
382-
and what compiler/platform combinations they are defined for.
381+
section lists macros defined by compilers and the HIP/CUDA APIs,
382+
and the compiler/platform combinations that define them.
383383

384384
The following table lists the macros that can be used when compiling HIP. Most
385385
of these macros are not directly defined by the compilers, but in
@@ -464,8 +464,7 @@ The following table lists macros related to ``nvcc`` and CUDA as HIP backend.
464464
Identifying the compilation target platform
465465
-------------------------------------------
466466

467-
Despite HIP's portability, it can be necessary to tailor code to a specific
468-
platform, in order to provide platform-specific code, or aid in
467+
With HIP's portability, you might need to provide platform-specific code, or enable
469468
platform-specific performance improvements.
470469

471470
For this, the ``__HIP_PLATFORM_AMD__`` and ``__HIP_PLATFORM_NVIDIA__`` macros
@@ -498,13 +497,13 @@ To explicitly use the CUDA compilation path, use:
498497
export HIP_PLATFORM=nvidia
499498
hipcc main.cpp
500499
501-
Identifying Host or Device Compilation Pass
500+
Identifying host or device compilation pass
502501
-------------------------------------------
503502

504503
``amdclang++`` makes multiple passes over the code: one pass for the host code, and
505504
for the device code one pass for each GPU architecture to be compiled for.
506-
``nvcc`` only makes two passes over the code: one for host code and one for device
507-
code.
505+
``nvcc`` only makes two passes over the code: one for the host code and one for the
506+
device code.
508507

509508
The ``__HIP_DEVICE_COMPILE__`` macro is defined when the compiler is compiling
510509
for the device. This macro is a portable check that can replace the
@@ -564,7 +563,7 @@ Kernel launching
564563
----------------
565564

566565
HIP-Clang supports kernel launching using either the triple chevron (``<<<>>>``) syntax,
567-
:cpp:func:`hipLaunchKernel`, or :cpp:func:`hipLaunchKernelGGL`. The last option is a macro which
566+
:cpp:func:`hipLaunchKernel`, or :cpp:func:`hipLaunchKernelGGL`. The last option is a macro that
568567
expands to the ``<<<>>>`` syntax by default. It can also be turned into a template by
569568
defining ``HIP_TEMPLATE_KERNEL_LAUNCH``.
570569

@@ -585,8 +584,8 @@ kernel associated with the stub function is launched.
585584
NVCC implementation notes
586585
=========================
587586

588-
CUDA applications might want to mix CUDA code with HIP code (see the
589-
example below). This table shows the equivalence between CUDA and HIP types
587+
CUDA applications can mix CUDA code with HIP code (see the
588+
example below). The table shows the equivalent CUDA and HIP types
590589
required to implement this interaction.
591590

592591
.. list-table:: Equivalence table between HIP and CUDA types
@@ -752,7 +751,7 @@ Identifying device architecture and features
752751

753752
GPUs of different generations and architectures do not provide the same
754753
level of :doc:`hardware feature support <../reference/hardware_features>`. To
755-
guard device-code that uses architecture dependent features, the
754+
guard device code that uses architecture-dependent features, the
756755
``__HIP_ARCH_<FEATURE>__`` C++-macros can be used, as described below.
757756

758757
Device code feature identification
@@ -785,7 +784,7 @@ Host code feature identification
785784
The host code must not rely on the ``__HIP_ARCH_<FEATURE>__`` macros, because the
786785
GPUs available to a system are not known during compile time, and their
787786
architectural features differ. Alternatively, the host code can query architecture
788-
feature flags during runtime, by using :cpp:func:`hipGetDeviceProperties`
787+
feature flags during runtime by using :cpp:func:`hipGetDeviceProperties`
789788
or :cpp:func:`hipDeviceGetAttribute`.
790789

791790
.. code-block:: cpp
@@ -820,11 +819,11 @@ or :cpp:func:`hipDeviceGetAttribute`.
820819
std::cout << " shared int32 atomic operations" << std::endl;
821820
}
822821
823-
Table of feature macros and properties
824-
--------------------------------------
822+
Feature macros and properties
823+
-----------------------------
825824

826-
The table below shows the full set of architectural properties that HIP
827-
supports, together with the corresponding macros and device properties.
825+
The following table lists the feature macros that HIP supports,
826+
alongside corresponding device properties that can be queried from the host code.
828827

829828
.. list-table::
830829
:header-rows: 1
@@ -914,8 +913,8 @@ how to write portable warpSize-aware code.
914913
Porting from CUDA __launch_bounds__
915914
===================================
916915

917-
CUDA defines a ``__launch_bounds__`` qualifier which works similar to HIP's
918-
implementation, however it uses different parameters:
916+
CUDA defines a ``__launch_bounds__`` qualifier which works similarly to the HIP
917+
implementation, however, it uses different parameters:
919918

920919
.. code-block:: cpp
921920
@@ -948,27 +947,27 @@ AMD GCN architecture has 4 execution units per multiprocessor.
948947
maxregcount
949948
-----------
950949

951-
The ``nvcc`` compiler will try to guess the number of registers per thread based on the launch bounds.
950+
The ``nvcc`` compiler will predict the number of registers per thread based on the launch bounds calculation.
952951
``--maxregcount X`` can be used to override the compiler's decision by enforcing a hard number of registers
953-
(``X``) that the compiler must not exceed. If the compiler is unable meet this requirement it will place
952+
(``X``) that the compiler must not exceed. If the compiler is unable to meet this requirement, it will place
954953
additional "registers" into memory instead of using hardware registers.
955954

956-
Unlike ``nvcc``, ``amdclang++`` does not support the ``--maxregcount`` option. Users are encouraged to use
955+
Unlike ``nvcc``, ``amdclang++`` does not support the ``--maxregcount`` option. You are encouraged to use
957956
the ``__launch_bounds__`` directive since the parameters are more intuitive and portable than micro-architecture
958957
details like registers. The directive allows per-kernel control.
959958

960959
Driver entry point access
961960
=========================
962961

963962
The HIP runtime provides support for CUDA driver entry point access when using
964-
CUDA 12.0 or later. This feature allows developers to directly interact with the
963+
CUDA 12.0 or later. This feature lets developers interact directly with the
965964
CUDA driver API, providing more control over GPU operations.
966965

967966
Driver entry point access provides several features:
968967

969968
* Retrieving the address of a runtime function
970969
* Requesting the default stream version on a per-thread basis
971-
* Accessing new HIP features on older toolkits with a newer driver
970+
* Accessing HIP features on older toolkits with a newer driver
972971

973972
For more information on driver entry point access, see :cpp:func:`hipGetProcAddress`.
974973

@@ -1077,14 +1076,14 @@ efficiency.
10771076
return 0;
10781077
}
10791078
1080-
Accessing new HIP features with a newer driver
1081-
----------------------------------------------
1079+
Accessing HIP features with a newer driver
1080+
------------------------------------------
10821081

1083-
HIP is designed to be forward compatible, allowing newer features to be utilized
1082+
HIP is forward compatible, allowing newer features to be utilized
10841083
with older toolkits, provided a compatible driver is present. Feature support
10851084
can be verified through runtime API functions and version checks. This approach
10861085
ensures that applications can benefit from new features and improvements in the
1087-
HIP runtime without needing to be recompiled with a newer toolkit. The function
1086+
HIP runtime without requiring recompilation with a newer toolkit. The function
10881087
:cpp:func:`hipGetProcAddress` enables dynamic querying and the use of newer
10891088
functions offered by the HIP runtime, even if the application was built with an
10901089
older toolkit.
@@ -1115,8 +1114,8 @@ The HIP version number is defined as an integer:
11151114
CU_POINTER_ATTRIBUTE_MEMORY_TYPE
11161115
================================
11171116

1118-
To get the pointer's memory type in HIP, developers should use :cpp:func:`hipPointerGetAttributes`.
1119-
First parameter of the function is `hipPointerAttribute_t`. Its ``type`` member variable indicates
1117+
To return the pointer's memory type in HIP, developers should use :cpp:func:`hipPointerGetAttributes`.
1118+
The first parameter of the function is `hipPointerAttribute_t`. Its ``type`` member variable indicates
11201119
whether the memory pointed to is allocated on the device or the host. For example:
11211120

11221121
.. code-block:: cpp
@@ -1138,7 +1137,7 @@ whether the memory pointed to is allocated on the device or the host. For exampl
11381137
Note that ``hipMemoryType`` enum values are different from the
11391138
``cudaMemoryType`` enum values.
11401139

1141-
For example, on AMD platform, `hipMemoryType` is defined in `hip_runtime_api.h`,
1140+
For example, on AMD platform, ``hipMemoryType`` is defined in ``hip_runtime_api.h``:
11421141

11431142
.. code-block:: cpp
11441143
@@ -1150,7 +1149,7 @@ For example, on AMD platform, `hipMemoryType` is defined in `hip_runtime_api.h`,
11501149
hipMemoryTypeManaged = 4 ///< Managed memory, automaticallly managed by the unified memory system
11511150
} hipMemoryType;
11521151
1153-
Looking into CUDA toolkit, it defines `cudaMemoryType` as following,
1152+
In the CUDA toolkit, the ``cudaMemoryType`` is defined as following:
11541153

11551154
.. code-block:: cpp
11561155
@@ -1167,9 +1166,8 @@ Looking into CUDA toolkit, it defines `cudaMemoryType` as following,
11671166
due to HIP functionality backward compatibility.
11681167

11691168
The memory type translation for ``hipPointerGetAttributes`` needs to
1170-
be handled properly on NVIDIA platform to get the correct memory type in CUDA,
1169+
be handled properly on NVIDIA platform to return the correct memory type in CUDA,
11711170
which is done in the file ``nvidia_hip_runtime_api.h``.
11721171

1173-
So in any HIP applications which use HIP APIs involving memory types, developers
1174-
should use ``#ifdef`` in order to assign the correct enum values depending on
1175-
NVIDIA or AMD platform.
1172+
In applications that use HIP memory type APIs, you should use ``#ifdef``
1173+
to assign the correct enum values depending on NVIDIA or AMD platform.

0 commit comments

Comments
 (0)