Commit c78108e

Enable implicit host-to-device copy for EDProducers
1 parent d16e4a0 commit c78108e

7 files changed

+205
-27
lines changed

HeterogeneousCore/AlpakaCore/README.md

Lines changed: 36 additions & 26 deletions
@@ -58,53 +58,52 @@ Note that even if for Event data formats the examples above used `DataFormats` p
5858

5959
### Implicit data transfers
6060

61-
Both EDProducers and ESProducers make use of implicit data transfers.
61+
Both EDProducers and ESProducers make use of implicit data transfers. In CPU backends these data transfers are omitted, and the host-side and the "device-side" data products are the same.
6262

63-
#### EDProducer
63+
#### Data copy definitions
6464

65-
In EDProducers for each device-side data product a transfer from the device memory space to the host memory space is registered automatically. The data product is copied only if the job has another EDModule that consumes the host-side data product. The framework code to issue the transfer makes use of `cms::alpakatools::CopyToHost` class template that must be specialized along
65+
The implicit host-to-device and device-to-host copies rely on specialization of `cms::alpakatools::CopyToDevice` and `cms::alpakatools::CopyToHost` class templates, respectively. These have to be specialized along
6666
```cpp
67-
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"
67+
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"
6868

6969
namespace cms::alpakatools {
70-
template <>
71-
struct CopyToHost<TSrc> {
70+
template<>
71+
struct CopyToDevice<TSrc> {
7272
template <typename TQueue>
73-
static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
74-
// code to construct TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
73+
requires alpaka::isQueue<TQueue>
74+
static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
75+
// code to construct TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
7576
return ...;
7677
}
7778
};
7879
}
7980
```
80-
Note that the destination (host-side) type `TDst` can be different from or the same as the source (device-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.
81-
82-
The `CopyToHost` class template is partially specialized for all `PortableCollection` instantiations.
83-
84-
#### ESProducer
85-
86-
In ESProducers for each host-side data product a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. The framework code to issue makes use of `cms::alpakatools::CopyToDevice` class template that must be specialized along
81+
or
8782
```cpp
88-
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"
83+
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"
8984
9085
namespace cms::alpakatools {
91-
template<>
92-
struct CopyToDevice<TSrc> {
86+
template <>
87+
struct CopyToHost<TSrc> {
9388
template <typename TQueue>
94-
static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
95-
// code to construct TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
89+
requires alpaka::isQueue<TQueue>
90+
static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
91+
// code to construct TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
9692
return ...;
9793
}
9894
};
9995
}
10096
```
101-
Note that the destination (device-side) type `TDst` can be different from or the same as the source (host-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.
97+
respectively.
98+
99+
Note that the destination (device-side/host-side) type `TDst` can be different from or the same as the source (host-side/device-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.
100+
101+
Both `CopyToDevice` and `CopyToHost` class templates are partially specialized for all `PortableObject` and `PortableCollection` instantiations.
102102

103-
The `CopyToDevice` class template is partially specialized for all `PortableCollection` instantiations.
104103

105-
#### Data products with `memcpy()`ed pointers
104+
##### Data products with `memcpy()`ed pointers
106105

107-
If the data product in question contains pointers to memory elsewhere within the data product, after the `alpaka::memcpy()` calls in the `copyAsync()` those pointers still point to device memory, and need to be updated. **Such data products are generally discouraged.** Nevertheless, such pointers can be updated without any additional synchronization by implementing a `postCopy()` function in the `CopyToHost` specialization along (extending the `CopyToHost` example [above](#edproducer))
106+
If the data product in question contains pointers to memory elsewhere within the data product, after the `alpaka::memcpy()` calls in the `copyAsync()` those pointers still point to device memory, and need to be updated. **Such data products are generally discouraged.** Nevertheless, such pointers can be updated without any additional synchronization by implementing a `postCopy()` function in the `CopyToHost` specialization along (extending the `CopyToHost` example [above](#data-copy-definitions))
108107
```cpp
109108
namespace cms::alpakatools {
110109
template <>
@@ -121,7 +120,18 @@ namespace cms::alpakatools {
121120
```
122121
The `postCopy()` is called after the operations enqueued in the `copyAsync()` have finished. The code in `postCopy()` must be such that the call to `postCopy()` can be omitted on CPU backends.
123122

124-
Note that for `CopyToDevice` such `postCopy()` functionality is **not** provided. It should be possible to a issue kernel call (via an intermediate host-side function) from the `CopyToDevice::copyAsync()` function to achieve the same effect.
123+
Note that for `CopyToDevice` such `postCopy()` functionality is **not** provided. It should be possible to issue a kernel call from the `CopyToDevice::copyAsync()` function to achieve the same effect.
124+
125+
126+
#### EDProducer
127+
128+
In EDProducers for each device-side data product a transfer from the device memory space to the host memory space is registered automatically. The data product is copied only if the job has another EDModule that consumes the host-side data product. For each device-side data product a specialization of `cms::alpakatools::CopyToHost` is required to exist.
129+
130+
In addition, for each host-side data product a transfer from the host memory space to the device memory space is registered automatically **if** a `cms::alpakatools::CopyToDevice` specialization exists. The data product is copied only if the job has another EDModule that consumes the device-side data product.
131+
132+
#### ESProducer
133+
134+
In ESProducers for each host-side data product a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. For each host-side data product a specialization of `cms::alpakatools::CopyToDevice` is required to exist.
125135

126136
### `PortableCollection`
127137

@@ -157,7 +167,7 @@ Note that currently Alpaka-based ESSources are not supported. If you need to pro
157167

158168
The Alpaka-based modules have a notion of a _host memory space_ and _device memory space_ for the Event and EventSetup data products. The data products in the host memory space are accessible for non-Alpaka modules, whereas the data products in device memory space are available only for modules of the specific Alpaka backend. The host backend(s) use the host memory space directly.
159169

160-
The EDModules get `device::Event` and `device::EventSetup` from the framework, from which data products in both host memory space and device memory space can be accessed. Data products can also be produced to either memory space. For all data products produced in the device memory space an implicit data copy from the device memory space to the host memory space is registered as discussed above. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.
170+
The EDModules get `device::Event` and `device::EventSetup` from the framework, from which data products in both host memory space and device memory space can be accessed. Data products can also be produced to either memory space. As discussed [above](#edproducer), for each data product produced into the device memory space an implicit data copy from the device memory space to the host memory space is registered, and for each data product produced into the host memory space for which `cms::alpakatools::CopyToDevice` is specialized an implicit data copy from the host memory space to the device memory space is registered. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.
161171

162172
The ESProducer can have two different `produce()` function signatures
163173
* If the function has the usual `TRecord const&` parameter, the function can read an ESProduct from the host memory space, and produce another product into the host memory space. An implicit copy of the data product from the host memory space to the device memory space (of the backend of the ESProducer) is registered as discussed above.

HeterogeneousCore/AlpakaCore/interface/alpaka/ProducerBase.h

Lines changed: 30 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDMetadataAcquireSentry.h"
1313
#include "HeterogeneousCore/AlpakaCore/interface/modulePrevalidate.h"
1414
#include "HeterogeneousCore/AlpakaInterface/interface/Backend.h"
15+
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"
1516
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"
1617

1718
#include <memory>
@@ -90,7 +91,35 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
9091
// can think of it later if really needed
9192
template <typename TProduct, edm::Transition Tr>
9293
edm::EDPutTokenT<TProduct> produces(std::string instanceName) {
93-
return Base::template produces<TProduct, Tr>(std::move(instanceName));
94+
constexpr bool hasCopy = requires(Queue& queue, TProduct const& prod) {
95+
cms::alpakatools::CopyToDevice<TProduct>::copyAsync(queue, prod);
96+
};
97+
98+
if constexpr (detail::useProductDirectly or not hasCopy) {
99+
return Base::template produces<TProduct, Tr>(std::move(instanceName));
100+
} else {
101+
edm::EDPutTokenT<TProduct> hostToken = Base::template produces<TProduct, Tr>(instanceName);
102+
this->registerTransformAsync(
103+
hostToken,
104+
[synchronize = this->synchronize()](
105+
edm::StreamID streamID, TProduct const& hostProduct, edm::WaitingTaskWithArenaHolder holder) {
106+
detail::EDMetadataAcquireSentry sentry(streamID, std::move(holder), synchronize);
107+
using CopyT = cms::alpakatools::CopyToDevice<TProduct>;
108+
auto productOnDevice = CopyT::copyAsync(sentry.metadata()->queue(), hostProduct);
109+
// Need to keep the EDMetadata object from sentry.finish()
110+
// alive until the synchronization
111+
using TplType = std::tuple<std::shared_ptr<EDMetadata>, decltype(productOnDevice)>;
112+
// Wrap possibly move-only type into a copyable type
113+
return std::make_shared<TplType>(sentry.finish(), std::move(productOnDevice));
114+
},
115+
[](edm::StreamID, auto tplPtr) {
116+
using DeviceObject = std::tuple_element_t<1, std::remove_cvref_t<decltype(*tplPtr)>>;
117+
using DeviceProductType = detail::DeviceProductType<DeviceObject>;
118+
return DeviceProductType(std::move(std::get<0>(*tplPtr)), std::move(std::get<1>(*tplPtr)));
119+
},
120+
std::move(instanceName));
121+
return hostToken;
122+
}
94123
}
95124

96125
// Device products
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
#include "DataFormats/Portable/interface/PortableObject.h"
2+
#include "DataFormats/PortableTestObjects/interface/TestHostObject.h"
3+
#include "FWCore/ParameterSet/interface/ConfigurationDescriptions.h"
4+
#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
5+
#include "FWCore/Utilities/interface/EDPutToken.h"
6+
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
7+
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
8+
#include "HeterogeneousCore/AlpakaInterface/interface/host.h"
9+
10+
namespace ALPAKA_ACCELERATOR_NAMESPACE {
11+
/**
12+
* This class demonstrates a global EDProducer that
13+
* - produces a host-side EDProduct that is copied to device automatically
14+
*/
15+
class TestAlpakaGlobalProducerImplicitCopyToDevice : public global::EDProducer<> {
16+
public:
17+
TestAlpakaGlobalProducerImplicitCopyToDevice(edm::ParameterSet const& config)
18+
: EDProducer<>(config), putToken_{produces()}, putTokenInstance_{produces("instance")} {}
19+
20+
void produce(edm::StreamID, device::Event& iEvent, device::EventSetup const& iSetup) const override {
21+
portabletest::TestStruct test{6., 14., 15., 52};
22+
iEvent.emplace(putToken_, cms::alpakatools::host(), test);
23+
iEvent.emplace(putTokenInstance_, cms::alpakatools::host(), test);
24+
}
25+
26+
static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
27+
edm::ParameterSetDescription desc;
28+
descriptions.addWithDefaultLabel(desc);
29+
}
30+
31+
private:
32+
const edm::EDPutTokenT<portabletest::TestHostObject> putToken_;
33+
const edm::EDPutTokenT<portabletest::TestHostObject> putTokenInstance_;
34+
};
35+
36+
} // namespace ALPAKA_ACCELERATOR_NAMESPACE
37+
38+
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/MakerMacros.h"
39+
DEFINE_FWK_ALPAKA_MODULE(TestAlpakaGlobalProducerImplicitCopyToDevice);
Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
#include "DataFormats/PortableTestObjects/interface/alpaka/TestDeviceObject.h"
2+
#include "FWCore/ParameterSet/interface/ConfigurationDescriptions.h"
3+
#include "FWCore/ParameterSet/interface/ParameterSet.h"
4+
#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
5+
#include "FWCore/Utilities/interface/EDPutToken.h"
6+
#include "FWCore/Utilities/interface/Exception.h"
7+
#include "FWCore/Utilities/interface/InputTag.h"
8+
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDGetToken.h"
9+
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h"
10+
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
11+
#include "HeterogeneousCore/AlpakaInterface/interface/host.h"
12+
#include "HeterogeneousCore/AlpakaInterface/interface/memory.h"
13+
14+
#include "verifyDeviceObjectAsync.h"
15+
16+
namespace ALPAKA_ACCELERATOR_NAMESPACE {
17+
class TestAlpakaVerifyObjectOnDevice : public stream::SynchronizingEDProducer<> {
18+
public:
19+
TestAlpakaVerifyObjectOnDevice(edm::ParameterSet const& config)
20+
: SynchronizingEDProducer<>(config),
21+
getToken_{consumes(config.getParameter<edm::InputTag>("source"))},
22+
putToken_{produces()} {}
23+
24+
void acquire(device::Event const& iEvent, device::EventSetup const& iSetup) override {
25+
auto const& deviceObject = iEvent.get(getToken_);
26+
succeeded_ = verifyDeviceObjectAsync(iEvent.queue(), deviceObject);
27+
}
28+
29+
void produce(device::Event& iEvent, device::EventSetup const& iSetup) override {
30+
if (not **succeeded_) {
31+
throw cms::Exception("Assert") << "Device object verification failed";
32+
}
33+
iEvent.emplace(putToken_, true);
34+
}
35+
36+
static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
37+
edm::ParameterSetDescription desc;
38+
desc.add<edm::InputTag>("source");
39+
descriptions.addWithDefaultLabel(desc);
40+
}
41+
42+
private:
43+
const device::EDGetToken<portabletest::TestDeviceObject> getToken_;
44+
const edm::EDPutTokenT<bool> putToken_;
45+
std::optional<cms::alpakatools::host_buffer<bool>> succeeded_;
46+
};
47+
48+
} // namespace ALPAKA_ACCELERATOR_NAMESPACE
49+
50+
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/MakerMacros.h"
51+
DEFINE_FWK_ALPAKA_MODULE(TestAlpakaVerifyObjectOnDevice);
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
#include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h"
2+
3+
#include "verifyDeviceObjectAsync.h"
4+
5+
namespace ALPAKA_ACCELERATOR_NAMESPACE {
6+
cms::alpakatools::host_buffer<bool> verifyDeviceObjectAsync(Queue& queue,
7+
portabletest::TestDeviceObject const& deviceObject) {
8+
auto tmp = cms::alpakatools::make_device_buffer<bool>(queue);
9+
alpaka::exec<Acc1D>(
10+
queue,
11+
cms::alpakatools::make_workdiv<Acc1D>(1, 1),
12+
[] ALPAKA_FN_ACC(Acc1D const& acc, portabletest::TestStruct const* obj, bool* result) {
13+
if (cms::alpakatools::once_per_grid(acc)) {
14+
*result = (obj->x == 6. and obj->y == 14. and obj->z == 15. and obj->id == 52);
15+
}
16+
},
17+
deviceObject.data(),
18+
tmp.data());
19+
auto ret = cms::alpakatools::make_host_buffer<bool>(queue);
20+
alpaka::memcpy(queue, ret, tmp);
21+
return ret;
22+
}
23+
} // namespace ALPAKA_ACCELERATOR_NAMESPACE
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
#ifndef HeterogeneousCore_AlpakaTest_plugins_alpaka_verifyDeviceObjectAsync_h
2+
#define HeterogeneousCore_AlpakaTest_plugins_alpaka_verifyDeviceObjectAsync_h
3+
4+
#include "DataFormats/PortableTestObjects/interface/alpaka/TestDeviceObject.h"
5+
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
6+
#include "HeterogeneousCore/AlpakaInterface/interface/memory.h"
7+
8+
namespace ALPAKA_ACCELERATOR_NAMESPACE {
9+
cms::alpakatools::host_buffer<bool> verifyDeviceObjectAsync(Queue& queue,
10+
portabletest::TestDeviceObject const& deviceObject);
11+
}
12+
13+
#endif
