This library provides data structures to ease programming in CUDA (version 12 or higher). For a tutorial and further information, please read this manual.
The following quick example transfers a `std::vector` on the CPU to a `battery::vector` on the GPU. Note that no manual memory allocation or deallocation is needed:

```cpp
#include <cstddef>
#include <vector>
#include "battery/vector.hpp"
#include "battery/unique_ptr.hpp"
#include "battery/allocator.hpp"

// Vector whose storage lives in CUDA managed memory (accessible from CPU and GPU).
using mvector = battery::vector<int, battery::managed_allocator>;

__global__ void kernel(mvector* v_ptr) {
  mvector& v = *v_ptr;
  // ... Compute on `v` in parallel.
}

int main(int argc, char** argv) {
  std::vector<int> v(10000, 42);
  // Transfer from CPU vector to GPU vector.
  auto gpu_v = battery::make_unique<mvector, battery::managed_allocator>(v);
  kernel<<<256, 256>>>(gpu_v.get());
  // Wait for the kernel to finish before reading the results on the CPU.
  CUDAEX(cudaDeviceSynchronize());
  // Transfer the new data back to the initial vector.
  for(std::size_t i = 0; i < v.size(); ++i) {
    v[i] = (*gpu_v)[i];
  }
  return 0;
}
```

- How to transfer data from the CPU to the GPU?
- How to create a CMake project for CUDA project?
- How to allocate a vector shared by all threads of a block inside a kernel?
- How to allocate a vector shared by all blocks inside a kernel?
- CUDA runtime error: an illegal memory access was encountered
- How to allocate a vector in shared memory?
- Namespace: `battery::*`.
- The documentation is not exhaustive (which is why we provide a link to the standard C++ STL documentation), but we document most of the main differences and the features without a standard counterpart.
- The table below is a quick reference to the most useful features, but it is not exhaustive.
- The structures provided here are not thread-safe; ensuring safe concurrent access is the responsibility of the user of this library.
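
Since thread safety is left to the user, one common way to take on that responsibility is to give each thread its own set of elements and to use atomic operations for anything that is genuinely shared. The sketch below is a host-side analogy that uses only the C++ standard library (`std::thread`, `std::atomic`), not this library's types. The function `parallel_increment` is a hypothetical name introduced for illustration. The strided loop mirrors the grid-stride pattern commonly used in CUDA kernels.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the library): adds 1 to every element of
// `data` using `num_threads` worker threads and returns the sum of the
// updated elements.
long parallel_increment(std::vector<int>& data, unsigned num_threads) {
  // Shared accumulator: every thread writes to it, so it must be atomic.
  std::atomic<long> sum{0};
  std::vector<std::thread> workers;
  for (unsigned id = 0; id < num_threads; ++id) {
    workers.emplace_back([&data, &sum, id, num_threads] {
      // Worker `id` only touches indices id, id + num_threads, ...
      // No two workers ever write the same element, so the plain
      // (non-atomic) update of data[i] is free of data races.
      for (std::size_t i = id; i < data.size(); i += num_threads) {
        data[i] += 1;
        sum.fetch_add(data[i], std::memory_order_relaxed);
      }
    });
  }
  // Joining the workers makes all their writes visible to the caller.
  for (auto& w : workers) {
    w.join();
  }
  return sum.load();
}
```

For example, calling `parallel_increment(v, 4)` on a vector of 1000 elements equal to 42 leaves every element at 43 and returns 43000. In a kernel, the same two ideas apply: index by thread so that writes are disjoint, and reserve atomics for shared accumulators.
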
| Category | Main features | | | |
|---|---|---|---|---|
| Allocator | standard_allocator | global_allocator | managed_allocator | pool_allocator |
| Pointers | shared_ptr (std) | make_shared (std) | allocate_shared (std) | |
| | unique_ptr (std) | make_unique (std) | make_unique_block | make_unique_grid |
| Containers | vector (std) | string (std) | dynamic_bitset | |
| | tuple | variant (std) | bitset (std) | |
| Utility | CUDA | INLINE | CUDAE | CUDAEX |
| | limits | ru_cast | rd_cast | |
| | popcount (std) | countl_zero (std) | countl_one (std) | countr_zero (std) |
| | countr_one (std) | signum | ipow | |
| | add_up | add_down | sub_up | sub_down |
| | mul_up | mul_down | div_up | div_down |
| Memory | local_memory | read_only_memory | atomic_memory | |
| | atomic_scoped_memory | atomic_memory_block | atomic_memory_grid | |
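
The `_up` and `_down` entries in the Utility rows are not described on this page. Assuming the suffixes denote rounding toward positive and negative infinity respectively (the names suggest this, but the reference documentation is authoritative), the difference is easiest to see on integer division, where C++'s built-in `/` truncates toward zero. The functions `ceil_div` and `floor_div` below are hypothetical stand-ins written in plain C++ for illustration. They are not the library's implementation.

```cpp
#include <cassert>

// Hypothetical illustration: integer division rounded toward negative infinity.
// Precondition: b != 0.
constexpr int floor_div(int a, int b) {
  int q = a / b;  // Built-in division truncates toward zero.
  int r = a % b;  // The remainder has the sign of `a` (or is zero).
  // When the division is inexact and the operands have different signs,
  // truncation rounded up, so step one down.
  return (r != 0 && ((r < 0) != (b < 0))) ? q - 1 : q;
}

// Hypothetical illustration: integer division rounded toward positive infinity.
// Precondition: b != 0.
constexpr int ceil_div(int a, int b) {
  int q = a / b;
  int r = a % b;
  // When the division is inexact and the operands have the same sign,
  // truncation rounded down, so step one up.
  return (r != 0 && ((r < 0) == (b < 0))) ? q + 1 : q;
}

// Compile-time checks: 7 / 2 = 3.5 and -7 / 2 = -3.5.
static_assert(floor_div(7, 2) == 3, "3.5 rounds down to 3");
static_assert(ceil_div(7, 2) == 4, "3.5 rounds up to 4");
static_assert(floor_div(-7, 2) == -4, "-3.5 rounds down to -4");
static_assert(ceil_div(-7, 2) == -3, "-3.5 rounds up to -3");
static_assert(floor_div(6, 2) == 3 && ceil_div(6, 2) == 3, "exact division is unchanged");
```

Having both directions available is what allows a computation to keep a sound lower and upper bound on a result, instead of a single value with an unknown rounding error.
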