
Manual Pinned Memory Management #9

@wkaisertexas

Description


Your code notes in the CUDA preprocessing kernel that pinned memory is used for faster `memcpy` performance:

```cpp
// Host function to perform CUDA-based preprocessing
void cuda_preprocess(
    uint8_t* src,        // Source image data on host
    int src_width,       // Source image width
    int src_height,      // Source image height
    float* dst,          // Destination buffer on device
    int dst_width,       // Destination image width
    int dst_height,      // Destination image height
    cudaStream_t stream  // CUDA stream for asynchronous execution
) {
  // Calculate the size of the image in bytes (3 channels: BGR)
  int img_size = src_width * src_height * 3;
  // Copy source image data to pinned host memory for faster transfer
  memcpy(img_buffer_host, src, img_size);
  // Asynchronously copy image data from host to device memory
  CUDA_CHECK(cudaMemcpyAsync(
      img_buffer_device,
      img_buffer_host,
      img_size,
      cudaMemcpyHostToDevice,
      stream
  ));
  // Define affine transformation matrices
  AffineMatrix s2d, d2s;  // Source to destination and vice versa
  // Calculate the scaling factor to maintain aspect ratio
  float scale = std::min(
```

Pinned (page-locked) memory is required for the GPU to DMA data from the host. When you copy from pageable ("normal") memory, the CUDA driver transparently stages the transfer for you: in the background it copies the data from the pageable region into an internal pinned buffer, and then from that pinned buffer to the device.

This code performs that staging step manually, which points to a general misunderstanding of how H2D transfers work in CUDA. I wanted to point this out.
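To illustrate the distinction, here is a minimal sketch (a hypothetical `transfer_example` helper, not code from this repo) contrasting the two transfer paths. `cudaMallocHost` allocates page-locked memory; a `cudaMemcpyAsync` from a pageable buffer is silently staged through the driver's pinned buffer and may block the calling thread, while a copy from a pinned buffer can DMA directly and truly overlap with other work on the stream:

```cpp
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdlib>

void transfer_example(size_t img_size, cudaStream_t stream) {
  uint8_t* pageable = static_cast<uint8_t*>(malloc(img_size));  // ordinary pageable host memory
  uint8_t* pinned = nullptr;
  cudaMallocHost(&pinned, img_size);  // page-locked (pinned) host memory
  uint8_t* device = nullptr;
  cudaMalloc(&device, img_size);

  // Pageable source: the driver stages the data through an internal
  // pinned buffer before the DMA, so this "async" copy is effectively
  // synchronous with respect to the host thread.
  cudaMemcpyAsync(device, pageable, img_size, cudaMemcpyHostToDevice, stream);

  // Pinned source: the GPU can DMA directly from host memory, and the
  // copy genuinely overlaps with host work queued on the stream.
  cudaMemcpyAsync(device, pinned, img_size, cudaMemcpyHostToDevice, stream);

  cudaStreamSynchronize(stream);
  cudaFree(device);
  cudaFreeHost(pinned);
  free(pageable);
}
```

Given this, the extra `memcpy(img_buffer_host, src, img_size)` only pays off if `src` could instead be produced directly into the pinned buffer; otherwise it duplicates the staging copy the driver would have done anyway.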
