Your code notes in the cuda preprocessing kernel that pinned memory is used for faster memcpy performance.
Yolo-V11-cpp-TensorRT/src/preprocess.cu
Lines 130 to 160 in 988adf1
```cpp
// Host function to perform CUDA-based preprocessing
void cuda_preprocess(
    uint8_t* src,        // Source image data on host
    int src_width,       // Source image width
    int src_height,      // Source image height
    float* dst,          // Destination buffer on device
    int dst_width,       // Destination image width
    int dst_height,      // Destination image height
    cudaStream_t stream  // CUDA stream for asynchronous execution
) {
    // Calculate the size of the image in bytes (3 channels: BGR)
    int img_size = src_width * src_height * 3;
    // Copy source image data to pinned host memory for faster transfer
    memcpy(img_buffer_host, src, img_size);
    // Asynchronously copy image data from host to device memory
    CUDA_CHECK(cudaMemcpyAsync(
        img_buffer_device,
        img_buffer_host,
        img_size,
        cudaMemcpyHostToDevice,
        stream
    ));
    // Define affine transformation matrices
    AffineMatrix s2d, d2s;  // Source to destination and vice versa
    // Calculate the scaling factor to maintain aspect ratio
    float scale = std::min(
```
Pinned (page-locked) memory is what the GPU's DMA engine actually transfers from. When you call a host-to-device copy on ordinary pageable memory, the CUDA driver transparently stages the data: it first copies it from pageable memory into an internal pinned buffer, then DMAs it from that pinned buffer to the device.

This code performs that staging step manually: the `memcpy` from `src` into `img_buffer_host` reproduces the pageable-to-pinned copy the driver would otherwise do itself, so it does not by itself make the transfer faster. The real benefit of pinned memory comes from writing the data directly into the pinned buffer (or keeping it pinned end-to-end), which eliminates the extra copy entirely. I wanted to point this out, since the comment suggests a general misunderstanding of how H2D transfers work in CUDA.
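As a sketch of the alternative: if the stage that produces the frame writes its output straight into the pinned buffer, the staging `memcpy` disappears and `cudaMemcpyAsync` can genuinely overlap with other work on the stream. This is a minimal, hypothetical example (buffer names mirror the repo's, but `init_buffers`/`upload` and the sizes are illustrative, not the project's actual API):

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <cstddef>

static uint8_t* img_buffer_host = nullptr;    // pinned (page-locked) host buffer
static uint8_t* img_buffer_device = nullptr;  // device buffer

// Allocate once at startup, sized for the largest expected frame.
void init_buffers(size_t max_img_size) {
    // cudaMallocHost returns page-locked memory the DMA engine can read directly.
    cudaMallocHost(reinterpret_cast<void**>(&img_buffer_host), max_img_size);
    cudaMalloc(reinterpret_cast<void**>(&img_buffer_device), max_img_size);
}

// If the decode/capture stage fills img_buffer_host directly -- e.g. wrapping it
// as cv::Mat frame(h, w, CV_8UC3, img_buffer_host) and decoding into that --
// there is no pageable-to-pinned memcpy left to do before the async copy.
void upload(size_t img_size, cudaStream_t stream) {
    cudaMemcpyAsync(img_buffer_device, img_buffer_host, img_size,
                    cudaMemcpyHostToDevice, stream);
}
```

The trade-off is that pinned memory is a limited resource (it cannot be paged out), so it should be allocated once and reused rather than allocated per frame.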