Home
Luca Parisi edited this page Aug 15, 2024 · 9 revisions
- Parallelisation over SMs ( teams )
- Parallelisation over warps ( parallel )
- Data Transfers ( map directives )
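The three topics above can be combined in one small kernel. The following is a minimal sketch, not the project's own example: `saxpy_offload` and its parameters are hypothetical names chosen for illustration. It assumes the usual mapping of `teams` onto SMs/CUs and of `parallel` threads onto warps, which is implementation dependent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes y = a * x + y on the default device.
// If no device is available, OpenMP falls back to running on the host.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();

    // map(to: ...)     copies x from host to device before the kernel runs.
    // map(tofrom: ...) copies y to the device and back when the region ends.
    // teams distribute splits the iterations across teams;
    // parallel for splits each team's share across the threads of that team.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Calling `saxpy_offload(2.0f, x, y)` updates `y` in place. Dropping `parallel for` from the directive gives the "teams only" variant, where each team runs its iterations on a single thread.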
#pragma omp declare
- Custom mappers ( #1 )
- mapping of a class containing pointers
- Custom memory allocators ( #4 )
- Memory allocation per thread ( firstprivate ) vs shared variables ( may live in shared memory/global memory/local memory vs being passed as a kernel argument )
- Pinned memory allocation on CPU
- shared memory allocations
- Concurrency
- submit kernels from multiple threads (#3). Demonstrates using different cudaStreams/hipStreams.
- use openmp tasking (#7). Demonstrates overlapping memory transfers with execution, or running multiple small kernels which might be difficult to merge into one bigger kernel.
- Interoperability ( #2 )
- Demonstrate how to use cuda with a variable mapped from openmp and how to use a variable allocated from cuda in openmp.
- example of using cuFFT ( or any cuda/rocm numerical library ) together with openmp
- OpenMP generic ( CPU ) mode
- Occupancy
- Memory bandwidth ( global/shared ), roofline plot
- Coalesced access ( global memory )
- Bank Conflicts ( shared memory )
- Naive implementation [ teams only, teams + parallel ]
- Add mapping directives to control memory transfers
- Use a custom mapper for transferring the whole class
- Split up in subdomains for overlapping transfer and computation ( use streams )
- Use shared memory to improve effective memory bandwidth ( custom memory allocators ? )
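The sub-domain step in the list above can be sketched with plain OpenMP. The list mentions streams; this sketch swaps in `nowait` target tasks instead, which leave it to the runtime to choose streams or queues. `scale_in_chunks` and `chunk` are hypothetical names, and whether transfers and kernels really overlap depends on the compiler and runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: doubles every element, one sub-domain at a time.
void scale_in_chunks(std::vector<double>& data, std::size_t chunk) {
    assert(chunk > 0);
    const std::size_t n = data.size();
    double* p = data.data();

    for (std::size_t start = 0; start < n; start += chunk) {
        const std::size_t len = std::min(chunk, n - start);
        // Each sub-domain becomes its own target task: nowait lets the host
        // continue and submit the next sub-domain while this one is still
        // transferring or computing. Only the sub-domain's section is mapped.
        #pragma omp target teams distribute parallel for nowait \
            map(tofrom: p[start:len])
        for (std::size_t i = start; i < start + len; ++i) {
            p[i] *= 2.0;
        }
    }

    // Wait for all outstanding target tasks before the host reads the data.
    #pragma omp taskwait
}
```

The sub-domains here are independent, so no `depend` clauses are needed. A stencil with halo exchange between neighbouring sub-domains would need `depend` clauses or extra synchronisation.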