[+] Feat: Implement gdrcopy support for GPU maps #536
Merged
Implement #500.
Adds GDRCopy support for GPU maps, enabled at build time with `-DBPFTIME_ENABLE_GDRCOPY=ON` and at runtime with `agent_config.enable_gpu_gdrcopy`, with transparent fallback to `cuMemcpyDtoH` on failure.

Features
- For `BPF_MAP_TYPE_GPU_ARRAY_MAP` and `BPF_MAP_TYPE_PERGPUTD_ARRAY_MAP`, `elem_lookup` tries `copy_from_device_to_host_with_gdrcopy(...)` first, otherwise falls back to `cuMemcpyDtoH`.
- `agent_config.gpu_gdrcopy_max_per_key_bytes` (default `DEFAULT_GPU_GDRCOPY_MAX_PER_KEY_BYTES=4096`); skip GDRCopy when per-key bytes exceed the threshold.
- `libgdrapi.so.2` + `dlsym`; a missing `libgdrapi.so` or `/dev/gdrdrv` does not break functionality.

How to test
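For reference while testing, the path selection described under Features can be sketched as below. This is a minimal illustration under stated assumptions, not the code in this PR: `GdrcopyPolicy`, `CopyPath`, `CopyFn`, and `lookup_copy` are hypothetical names, and the two copy callbacks stand in for `copy_from_device_to_host_with_gdrcopy(...)` and `cuMemcpyDtoH` so the sketch builds without CUDA or GDRCopy installed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical mirror of the two agent_config fields named in this PR.
struct GdrcopyPolicy {
    bool enabled = false;                  // stands in for enable_gpu_gdrcopy
    std::size_t max_per_key_bytes = 4096;  // stands in for gpu_gdrcopy_max_per_key_bytes
};

// Which copy path served the lookup.
enum class CopyPath { Gdrcopy, DtoH };

// Stand-in for a device-to-host copy; returns true on success.
using CopyFn = std::function<bool(void *dst, const void *src, std::size_t n)>;

// Try the GDRCopy path only when it is enabled and the per-key size is
// within the threshold; on any failure, use the fallback copy instead.
// Error handling for the fallback itself is omitted in this sketch.
inline CopyPath lookup_copy(const GdrcopyPolicy &policy, void *dst,
                            const void *src, std::size_t per_key_bytes,
                            const CopyFn &gdrcopy_copy,
                            const CopyFn &dtoh_copy) {
    assert(dtoh_copy && "a fallback copy must always be provided");
    const bool eligible = policy.enabled &&
                          per_key_bytes <= policy.max_per_key_bytes &&
                          static_cast<bool>(gdrcopy_copy);
    if (eligible && gdrcopy_copy(dst, src, per_key_bytes)) {
        return CopyPath::Gdrcopy;
    }
    dtoh_copy(dst, src, per_key_bytes);
    return CopyPath::DtoH;
}
```

With this shape, a missing library or device node only needs to make the first callback return `false` (or be empty) for the lookup to take the fallback path, which matches the "does not break functionality" behavior listed above.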
Build

    cmake -S . -B build -G Ninja -DBPFTIME_ENABLE_CUDA_ATTACH=ON -DBPFTIME_CUDA_ROOT=/usr/local/cuda -DBPFTIME_ENABLE_GDRCOPY=ON
    cmake --build build -j --target gpu_array_map_host_perf gpu_per_thread_array_map_host_perf

Run comparisons
Baseline:
GDRCopy:
PERGPUTD baseline:
PERGPUTD GDRCopy: