Reuse existing cuda context if possible when creating decoders #263
Changes from all commits
```diff
@@ -77,17 +77,40 @@ AVBufferRef* getFromCache(const torch::Device& device) {
   return nullptr;
 }
 
-AVBufferRef* getCudaContext(const torch::Device& device) {
-  enum AVHWDeviceType type = av_hwdevice_find_type_by_name("cuda");
-  TORCH_CHECK(type != AV_HWDEVICE_TYPE_NONE, "Failed to find cuda device");
-  torch::DeviceIndex deviceIndex = getFFMPEGCompatibleDeviceIndex(device);
-
-  AVBufferRef* hw_device_ctx = getFromCache(device);
-  if (hw_device_ctx != nullptr) {
-    return hw_device_ctx;
+AVBufferRef* getFFMPEGContextFromExistingCudaContext(
+    const torch::Device& device,
+    torch::DeviceIndex nonNegativeDeviceIndex,
+    enum AVHWDeviceType type) {
+  c10::cuda::CUDAGuard deviceGuard(device);
+  // Valid values for the argument to cudaSetDevice are 0 to maxDevices - 1:
+  // https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g159587909ffa0791bbe4b40187a4c6bb
+  // So we ensure the deviceIndex is not negative.
```
Contributor:
Sorry for the noob Q - where are we ensuring this?

Contributor (Author):
The caller makes sure of that: it calls `std::max` on this device index. I'll rename this variable to `ffmpegCompatibleDeviceIndex` so it's clear the max was already done.
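For the reviewer's question above, here is a minimal sketch of the clamping the author describes. The body shown for `getFFMPEGCompatibleDeviceIndex` is an assumption based on this thread, not the PR's actual implementation:

```cpp
#include <algorithm>
#include <torch/torch.h>

// Hypothetical sketch: FFmpeg and cudaSetDevice() expect a non-negative device
// ordinal, while torch::Device may report -1 for an unset CUDA device index,
// so the caller clamps the index before passing it down.
torch::DeviceIndex getFFMPEGCompatibleDeviceIndex(const torch::Device& device) {
  torch::DeviceIndex deviceIndex = device.index();
  // std::max maps the "unset" index (-1) to device 0.
  return std::max<torch::DeviceIndex>(deviceIndex, 0);
}
```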
```diff
+  // We set the device because we may be called from a different thread than
+  // the one that initialized the cuda context.
+  cudaSetDevice(nonNegativeDeviceIndex);
+  AVBufferRef* hw_device_ctx = nullptr;
+  std::string deviceOrdinal = std::to_string(nonNegativeDeviceIndex);
+  int err = av_hwdevice_ctx_create(
+      &hw_device_ctx,
+      type,
+      deviceOrdinal.c_str(),
+      nullptr,
+      AV_CUDA_USE_CURRENT_CONTEXT);
+  if (err < 0) {
+    TORCH_CHECK(
+        false,
+        "Failed to create specified HW device",
+        getFFMPEGErrorStringFromErrorCode(err));
   }
+  return hw_device_ctx;
+}
 
-  std::string deviceOrdinal = std::to_string(deviceIndex);
+AVBufferRef* getFFMPEGContextFromNewCudaContext(
+    const torch::Device& device,
+    torch::DeviceIndex nonNegativeDeviceIndex,
+    enum AVHWDeviceType type) {
+  AVBufferRef* hw_device_ctx = nullptr;
+  std::string deviceOrdinal = std::to_string(nonNegativeDeviceIndex);
   int err = av_hwdevice_ctx_create(
       &hw_device_ctx, type, deviceOrdinal.c_str(), nullptr, 0);
   if (err < 0) {
```
```diff
@@ -99,6 +122,32 @@ AVBufferRef* getCudaContext(const torch::Device& device) {
   return hw_device_ctx;
 }
 
+AVBufferRef* getCudaContext(const torch::Device& device) {
+  enum AVHWDeviceType type = av_hwdevice_find_type_by_name("cuda");
+  TORCH_CHECK(type != AV_HWDEVICE_TYPE_NONE, "Failed to find cuda device");
+  torch::DeviceIndex nonNegativeDeviceIndex =
+      getFFMPEGCompatibleDeviceIndex(device);
+
+  AVBufferRef* hw_device_ctx = getFromCache(device);
+  if (hw_device_ctx != nullptr) {
+    return hw_device_ctx;
+  }
+
+  // 58.26.100 introduced the concept of reusing the existing cuda context
```
Contributor:
Can we clarify in the comment which major FFmpeg version this corresponds to?

Contributor (Author):
I was hesitant to put that in here because it could get stale: different av* libraries get linked into different FFmpeg releases, and there are minor releases too. But I added it here; it could potentially get stale.
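The version check discussed in this thread is resolved at build time against the libavutil headers. As a standalone illustration (not part of this PR), the following compares the header version with the libavutil actually loaded at runtime, to confirm whether the context-reuse path was compiled in:

```cpp
#include <cstdio>

extern "C" {
#include <libavutil/avutil.h>
}

int main() {
  // Version baked in at compile time (from the headers) vs. the shared
  // library actually loaded at runtime; the two can differ.
  unsigned compiled = LIBAVUTIL_VERSION_INT;
  unsigned linked = avutil_version();
  std::printf(
      "compiled against libavutil %u.%u.%u, running %u.%u.%u\n",
      AV_VERSION_MAJOR(compiled), AV_VERSION_MINOR(compiled), AV_VERSION_MICRO(compiled),
      AV_VERSION_MAJOR(linked), AV_VERSION_MINOR(linked), AV_VERSION_MICRO(linked));
  std::printf(
      "cuda context reuse path (58.26.100+): %s\n",
      compiled >= AV_VERSION_INT(58, 26, 100) ? "compiled in" : "not compiled in");
  return 0;
}
```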
```diff
+  // which is much faster and lower memory than creating a new cuda context.
+  // So we try to use that if it is available.
+  // FFMPEG 6.1.2 appears to be the earliest release that contains version
+  // 58.26.100 of avutil.
+  // https://github.com/FFmpeg/FFmpeg/blob/4acb9b7d1046944345ae506165fb55883d04d8a6/doc/APIchanges#L265
+#if LIBAVUTIL_VERSION_INT >= AV_VERSION_INT(58, 26, 100)
+  return getFFMPEGContextFromExistingCudaContext(
+      device, nonNegativeDeviceIndex, type);
+#else
+  return getFFMPEGContextFromNewCudaContext(
+      device, nonNegativeDeviceIndex, type);
+#endif
+}
+
 torch::Tensor allocateDeviceTensor(
     at::IntArrayRef shape,
     torch::Device device,
```
Contributor:
For my own understanding, are there existing docs (from FFmpeg or NVIDIA) that explain why `deviceGuard()` and `cudaSetDevice()` are needed?

Contributor (Author):
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g159587909ffa0791bbe4b40187a4c6bb tells you about `cudaSetDevice()`. As to why it's needed: a context isn't available in a secondary thread, so we make it available there before trying to reuse it.
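To illustrate the author's answer, here is a small standalone sketch (an assumption-based illustration, not the PR's code) of reusing the current CUDA context from a worker thread: the thread selects the device first, then asks FFmpeg to wrap whatever context is current via `AV_CUDA_USE_CURRENT_CONTEXT`.

```cpp
#include <cuda_runtime.h>
#include <string>
#include <thread>

extern "C" {
#include <libavutil/hwcontext.h>
#include <libavutil/hwcontext_cuda.h> // defines AV_CUDA_USE_CURRENT_CONTEXT
}

AVBufferRef* createCudaHwCtxOnWorkerThread(int deviceIndex) {
  AVBufferRef* hwDeviceCtx = nullptr;
  std::thread worker([&hwDeviceCtx, deviceIndex]() {
    // A freshly spawned thread has no CUDA context current. Selecting the
    // device here makes one available in this thread before FFmpeg looks
    // for it (error handling omitted for brevity).
    cudaSetDevice(deviceIndex);
    std::string ordinal = std::to_string(deviceIndex);
    // With AV_CUDA_USE_CURRENT_CONTEXT, FFmpeg reuses the context that is
    // current on this thread instead of creating a brand-new one.
    av_hwdevice_ctx_create(
        &hwDeviceCtx,
        AV_HWDEVICE_TYPE_CUDA,
        ordinal.c_str(),
        nullptr,
        AV_CUDA_USE_CURRENT_CONTEXT);
  });
  worker.join();
  return hwDeviceCtx;
}
```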