
@Sy0307 (Collaborator) commented Jan 3, 2026

Implement #500.

  • Build-time switch: -DBPFTIME_ENABLE_GDRCOPY=ON
  • Runtime switch: agent_config.enable_gpu_gdrcopy. If a GDRCopy read fails, the lookup transparently falls back to cuMemcpyDtoH.

Features

  • Hybrid lookup policy: for BPF_MAP_TYPE_GPU_ARRAY_MAP and BPF_MAP_TYPE_PERGPUTD_ARRAY_MAP, elem_lookup first tries copy_from_device_to_host_with_gdrcopy(...) and falls back to cuMemcpyDtoH if GDRCopy is disabled, unavailable, or fails.
  • Per-key size threshold: agent_config.gpu_gdrcopy_max_per_key_bytes (default DEFAULT_GPU_GDRCOPY_MAX_PER_KEY_BYTES = 4096). GDRCopy is skipped when the number of bytes copied for one key exceeds this threshold.
  • Dynamic loading with fallback: libgdrapi is loaded at runtime with dlopen (libgdrapi.so, then libgdrapi.so.2) and its symbols are resolved with dlsym. A missing libgdrapi.so or /dev/gdrdrv does not break functionality; lookups simply use the cuMemcpyDtoH path.
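The lookup policy above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `GdrcopyPolicy`, `CopyFn`, `CopyPath` and `hybrid_lookup_copy` are hypothetical names, and the two callbacks stand in for `copy_from_device_to_host_with_gdrcopy(...)` and `cuMemcpyDtoH` so the sketch builds without CUDA or libgdrapi.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for either copy path: returns true on success.
// In bpftime these would wrap copy_from_device_to_host_with_gdrcopy(...)
// and cuMemcpyDtoH respectively.
using CopyFn =
	std::function<bool(void *dst, std::uint64_t dev_addr, std::size_t bytes)>;

// Hypothetical mirror of the two agent_config knobs described in this PR.
struct GdrcopyPolicy {
	bool enable_gpu_gdrcopy;
	std::size_t gpu_gdrcopy_max_per_key_bytes; // PR default: 4096
};

// Which path served the lookup (handy for tests or counters).
enum class CopyPath { Gdrcopy, Fallback, Failed };

// Try GDRCopy first only when it is enabled, available, and the key fits
// under the threshold; otherwise, or if it fails, use the fallback copy.
CopyPath hybrid_lookup_copy(const GdrcopyPolicy &policy, const CopyFn &gdrcopy,
			    const CopyFn &fallback, void *dst,
			    std::uint64_t dev_addr, std::size_t per_key_bytes)
{
	assert(dst != nullptr);
	const bool try_gdrcopy =
		policy.enable_gpu_gdrcopy && gdrcopy &&
		per_key_bytes <= policy.gpu_gdrcopy_max_per_key_bytes;
	if (try_gdrcopy && gdrcopy(dst, dev_addr, per_key_bytes))
		return CopyPath::Gdrcopy;
	if (fallback(dst, dev_addr, per_key_bytes))
		return CopyPath::Fallback;
	return CopyPath::Failed;
}
```

Checking the threshold before attempting GDRCopy means oversized keys go straight to the fallback instead of first paying for a fast-path attempt.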

How to test

Build

cmake -S . -B build -G Ninja -DBPFTIME_ENABLE_CUDA_ATTACH=ON -DBPFTIME_CUDA_ROOT=/usr/local/cuda -DBPFTIME_ENABLE_GDRCOPY=ON
cmake --build build -j --target gpu_array_map_host_perf gpu_per_thread_array_map_host_perf

Run comparisons

Baseline:

./build/benchmark/gpu/host/gpu_array_map_host_perf --iters 50000 --max-entries 1024 --value-size 8 --gdrcopy 0

GDRCopy:

./build/benchmark/gpu/host/gpu_array_map_host_perf --iters 50000 --max-entries 1024 --value-size 8 --gdrcopy 1 --gdrcopy-max-per-key-bytes 4096

PERGPUTD baseline:

./build/benchmark/gpu/host/gpu_per_thread_array_map_host_perf --iters 50000 --max-entries 1024 --value-size 8 --thread-count 32 --gdrcopy 0

PERGPUTD GDRCopy:

./build/benchmark/gpu/host/gpu_per_thread_array_map_host_perf --iters 50000 --max-entries 1024 --value-size 8 --thread-count 32 --gdrcopy 1 --gdrcopy-max-per-key-bytes 4096
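For orientation only: the benchmark sources are not part of this description, but a comparison like the ones above reduces to timing many host-side lookups and dividing by `--iters`. `avg_ns_per_op` is a hypothetical helper, not code from `gpu_array_map_host_perf`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: average wall-clock nanoseconds per call of `op`
// over `iters` iterations, measured with a monotonic clock.
double avg_ns_per_op(std::size_t iters,
		     const std::function<void(std::size_t)> &op)
{
	assert(iters > 0);
	const auto start = std::chrono::steady_clock::now();
	for (std::size_t i = 0; i < iters; ++i)
		op(i); // e.g. one elem_lookup on key (i % max_entries)
	const auto end = std::chrono::steady_clock::now();
	const std::chrono::duration<double, std::nano> elapsed = end - start;
	return elapsed.count() / static_cast<double>(iters);
}
```

Running it once with the GDRCopy path disabled and once with it enabled, on the same map and key pattern, yields the two numbers to compare.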

namespace
{
constexpr const char *kGdrcopyLibraryNames[] = { "libgdrapi.so",
"libgdrapi.so.2", nullptr };
Contributor


Is the nullptr sentinel necessary? We could obtain the length of the array with std::size instead.
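One way to act on this suggestion, as a sketch: drop the sentinel, let a range-based `for` walk the array, and use `std::size` wherever the count is needed. `open_first_library` is a hypothetical helper; the opener is passed in so the example runs without libgdrapi, whereas the real loader would wrap `dlopen`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>

namespace
{
// No nullptr sentinel: the array's extent is known at compile time.
constexpr const char *kGdrcopyLibraryNames[] = { "libgdrapi.so",
						  "libgdrapi.so.2" };
static_assert(std::size(kGdrcopyLibraryNames) == 2,
	      "std::size yields the element count without a sentinel");
} // namespace

// Tries each candidate name in order and returns the first non-null handle
// produced by open_fn, or nullptr if none could be opened. In a real loader
// open_fn would wrap dlopen(name, RTLD_NOW | RTLD_LOCAL); it is a parameter
// here only so the sketch runs without libgdrapi installed.
void *open_first_library(const std::function<void *(const char *)> &open_fn)
{
	assert(static_cast<bool>(open_fn));
	for (const char *name : kGdrcopyLibraryNames) {
		if (void *handle = open_fn(name))
			return handle;
	}
	return nullptr;
}
```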

#else
// Minimal GDRCopy type declarations to allow dynamic loading via dlopen
// without requiring the gdrapi.h header at build time.
struct gdr;
Contributor


These declarations should be put in a separate header file, I think.
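A sketch of what such a header might contain, assuming a hypothetical file name `gdrcopy_dyn.h` and hypothetical `GdrApi`/`gdr_copy_checked` names. The `gdr_*` signatures are written from memory of GDRCopy 2.x `gdrapi.h` and should be checked against the installed header before relying on them.

```cpp
// gdrcopy_dyn.h (hypothetical name): minimal GDRCopy declarations so the
// dlopen/dlsym loader builds without gdrapi.h. Signatures assumed to match
// GDRCopy 2.x; verify against the installed gdrapi.h.
#ifndef BPFTIME_GDRCOPY_DYN_H
#define BPFTIME_GDRCOPY_DYN_H

#include <cassert>
#include <cstddef>

struct gdr;
typedef struct gdr *gdr_t;
typedef struct gdr_mh_s {
	unsigned long h;
} gdr_mh_t;

// Function-pointer types for the entry points resolved with dlsym.
using gdr_open_fn = gdr_t (*)();
using gdr_close_fn = int (*)(gdr_t);
using gdr_map_fn = int (*)(gdr_t, gdr_mh_t, void **, std::size_t);
using gdr_copy_from_mapping_fn = int (*)(gdr_mh_t, void *, const void *,
					 std::size_t);

// Table filled in by the loader; usable only if every entry was resolved.
struct GdrApi {
	gdr_open_fn open = nullptr;
	gdr_close_fn close = nullptr;
	gdr_map_fn map = nullptr;
	gdr_copy_from_mapping_fn copy_from_mapping = nullptr;

	bool complete() const
	{
		return open && close && map && copy_from_mapping;
	}
};

// Copies from a mapped device buffer; callers must check complete() first.
inline int gdr_copy_checked(const GdrApi &api, gdr_mh_t mh, void *dst,
			    const void *mapped_src, std::size_t bytes)
{
	assert(api.complete());
	return api.copy_from_mapping(mh, dst, mapped_src, bytes);
}

#endif // BPFTIME_GDRCOPY_DYN_H
```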

@Officeyutong merged commit f1a3387 into eunomia-bpf:master on Jan 22, 2026
151 of 152 checks passed