and release it via `hipFree <https://rocm.docs.amd.com/projects/HIP/en/latest/.doxygen/docBin/html/group___memory.html#ga740d08da65cae1441ba32f8fedb863d1>`_.

Predefined Macros
=================

.. list-table::
   :header-rows: 1

   * - Macro
     - Description
   * - ``__HIPSTDPAR__``
     - Defined when Clang is compiling code in algorithm offload mode, enabled
       with the ``--hipstdpar`` compiler option.
   * - ``__HIPSTDPAR_INTERPOSE_ALLOC__``
     - Defined only when compiling in algorithm offload mode, when the user
       enables interposition mode with the ``--hipstdpar-interpose-alloc``
       compiler option, indicating that all dynamic memory allocation /
       deallocation functions should be replaced with accelerator aware
       variants.
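
As a minimal sketch of how user code might query these macros, the helper below
reports the mode a translation unit was compiled in. The name ``offload_mode``
and the returned strings are hypothetical and chosen only for illustration; only
the two macros come from the table above. The sketch assumes C++17 (for
``std::string_view``).

.. code-block:: c++

   #include <string_view>

   // Hypothetical helper (not part of Clang or HIP): reports which of the
   // predefined macros were visible when this translation unit was compiled.
   constexpr std::string_view offload_mode() {
   #if defined(__HIPSTDPAR_INTERPOSE_ALLOC__)
     // Only defined together with __HIPSTDPAR__, per the table above.
     return "offload, interposition mode";
   #elif defined(__HIPSTDPAR__)
     return "offload";
   #else
     // Neither macro is defined when --hipstdpar is not passed.
     return "host only";
   #endif
   }

Since neither macro is defined without ``--hipstdpar``, the helper falls through
to the host-only branch in an ordinary build; its result can be printed or used
to select a code path at compile time.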

Restrictions
============

We define two modes in which runtime execution can occur:

1. **HMM Mode** - this assumes that the
   `HMM <https://docs.kernel.org/mm/hmm.html>`_ subsystem of the Linux kernel
   is used to provide transparent on-demand paging, i.e. memory obtained from a
   system / OS allocator, such as via a call to ``malloc`` or ``operator new``,
   is directly accessible to the accelerator and follows the C++ memory model;
2. **Interposition Mode** - this is a fallback mode for cases where transparent
   on-demand paging is unavailable (e.g. on Windows), which means that
   memory must be allocated via an accelerator aware mechanism, and system
   allocated memory is inaccessible to the accelerator.

The following restrictions imposed on user code apply to both modes:

1. Pointers to functions, and all associated features, such as dynamic
   polymorphism, cannot be used (directly or transitively) by the user provided
   `HIP kernel language <https://rocm.docs.amd.com/projects/HIP/en/latest/reference/kernel_language.html>`_;
   whilst things like using ``__device__`` annotations might accidentally "work",
   they are not guaranteed to, and thus cannot be relied upon by user code;

   - A consequence of the above is that both bitcode linking and linking
     relocatable object files will "work", but this is neither guaranteed to
     remain working nor actively tested at the moment; this restriction might be
     relaxed in the future.

2. Combining explicit HIP, CUDA or OpenMP Offload compilation with
   ``--hipstdpar`` based offloading is not allowed or supported in any way.
3. There is no way to target different accelerators via a standard algorithm
   invocation (`this might be addressed in future C++ standards <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2500r1.html>`_);
   an unsafe (per the point above) way of achieving this is to spawn new threads
   and invoke the `hipSetDevice <https://rocm.docs.amd.com/projects/HIP/en/latest/.doxygen/docBin/html/group___device.html#ga43c1e7f15925eeb762195ccb5e063eae>`_
   interface, e.g.:

   .. code-block:: c++

      #include <hip/hip_runtime.h> // declares hipSetDevice

      #include <algorithm>
      #include <atomic>
      #include <execution>
      #include <iterator>
      #include <thread>
      #include <vector>

      int accelerator_0 = ...;
      int accelerator_1 = ...;

      bool multiple_accelerators(const std::vector<int>& u, const std::vector<int>& v) {
        std::atomic<unsigned int> r{0u};

        // Each thread selects its own accelerator before invoking the algorithm.
        std::thread t0{[&]() {
          hipSetDevice(accelerator_0);

          r += std::count(std::execution::par_unseq, std::cbegin(u), std::cend(u), 42);
        }};
        std::thread t1{[&]() {
          hipSetDevice(accelerator_1);

          r += std::count(std::execution::par_unseq, std::cbegin(v), std::cend(v), 314152);
        }};

        t0.join();
        t1.join();

        return r;
      }

   Note that this is a temporary, unsafe workaround for a deficiency in the C++
   Standard.

Open Questions / Future Developments
====================================

1. The restriction on the use of global / namespace scope / ``static`` /
   ``thread`` storage duration variables in offloaded algorithms will be lifted
   in the future, when running in **HMM Mode**;
2. The restriction on the use of dynamic memory allocation in offloaded
   algorithms will be lifted in the future;
3. The restriction on the use of pointers to functions, and associated features
   such as dynamic polymorphism, might be lifted in the future, when running in
   **HMM Mode**;
4. Offload support might be extended to cases where the ``parallel_policy`` is