| Term | CUDA | HIP | HC | C++AMP | OpenCL |
|---|---|---|---|---|---|
| Device | `int deviceId` | `int deviceId` | `hc::accelerator` | `concurrency::accelerator` | `cl_device_id` |
| Queue | `cudaStream_t` | `hipStream_t` | `hc::accelerator_view` | `concurrency::accelerator_view` | `cl_command_queue` |
| Event | `cudaEvent_t` | `hipEvent_t` | `hc::completion_future` | `concurrency::completion_future` | `cl_event` |
| Memory | `void *` | `void *` | `void *`; `hc::array`; `hc::array_view` | `concurrency::array`; `concurrency::array_view` | `cl_mem` |
| Grid | grid | grid | extent | extent | NDRange |
| Block | block | block | tile | tile | work-group |
| Thread | thread | thread | thread | thread | work-item |
| Warp | warp | warp | wavefront | N/A | sub-group |
| Thread-index | `threadIdx.x` | `hipThreadIdx_x` | `t_idx.local[0]` | `t_idx.local[0]` | `get_local_id(0)` |
| Block-index | `blockIdx.x` | `hipBlockIdx_x` | `t_idx.tile[0]` | `t_idx.tile[0]` | `get_group_id(0)` |
| Block-dim | `blockDim.x` | `hipBlockDim_x` | `t_ext.tile_dim[0]` | `t_ext.tile_dim0` | `get_local_size(0)` |
| Grid-dim | `gridDim.x` | `hipGridDim_x` | `t_ext[0]` | `t_ext[0]` | `get_global_size(0)` |
| Device Kernel | `__global__` | `__global__` | lambda inside `hc::parallel_for_each` or `[[hc]]` | `restrict(amp)` | `__kernel` |
| Device Function | `__device__` | `__device__` | `[[hc]]` (detected automatically in many cases) | `restrict(amp)` | Implied in device compilation |
| Host Function | `__host__` (default) | `__host__` (default) | `[[cpu]]` (default) | `restrict(cpu)` (default) | Implied in host compilation |
| Host + Device Function | `__host__ __device__` | `__host__ __device__` | `[[hc]] [[cpu]]` | `restrict(amp,cpu)` | No equivalent |
| Kernel Launch | `<<< >>>` | `hipLaunchKernel` | `hc::parallel_for_each` | `concurrency::parallel_for_each` | `clEnqueueNDRangeKernel` |
| Global Memory | `__global__` | `__global__` | Unnecessary / Implied | Unnecessary / Implied | `__global` |
| Group Memory | `__shared__` | `__shared__` | `tile_static` | `tile_static` | `__local` |
| Constant | `__constant__` | `__constant__` | Unnecessary / Implied | Unnecessary / Implied | `__constant` |
| Barrier | `__syncthreads` | `__syncthreads` | `tile_static.barrier()` | `t_idx.barrier()` | `barrier(CLK_LOCAL_MEM_FENCE)` |
| Atomic Builtins | `atomicAdd` | `atomicAdd` | `hc::atomic_fetch_add` | `concurrency::atomic_fetch_add` | `atomic_add` |
| Precise Math | `cos(f)` | `cos(f)` | `hc::precise_math::cos(f)` | `concurrency::precise_math::cos(f)` | `cos(f)` |
| Fast Math | `__cosf(f)` | `__cosf(f)` | `hc::fast_math::cos(f)` | `concurrency::fast_math::cos(f)` | `native_cos(f)` |
| Vector | `float4` | `float4` | `hc::short_vector::float4` | `concurrency::graphics::float_4` | `float4` |
### Notes
- For HC and C++AMP, assume a captured tiled_ext named `t_ext` and a captured extent named `ext`. These languages pass information to the kernel through captured variables rather than special built-in functions, so the exact variable names may vary.
- The indexing functions (starting with Thread-index) show the terminology for a 1D grid. Some APIs use reverse order of xyz / 012 indexing for 3D grids.
- HC allows tile dimensions to be specified at runtime, while C++AMP requires that tile dimensions be specified at compile time. Thus the HC syntax for tile dims is `t_ext.tile_dim[0]`, while the C++AMP syntax is `t_ext.tile_dim0`.