Transition VMAF to use nv-codec-headers instead of driver linking #1436

Merged
kylophone merged 1 commit into Netflix:master from gedoensmax:nv_codec_headers
Oct 20, 2025

@gedoensmax
Contributor

This still requires NVCC to be installed for now, but I believe it would at least allow shipping a prebuilt CUDA-enabled library with ffmpeg by default, since the CUDA driver is loaded dynamically.

@kylophone Would it be possible to build this setup in the current CI?

@BtbN To dynamically load the CUDA driver I relied on your nv-codec-headers, but they are missing some driver functions that we use in VMAF. Would it be possible to add these?

cuCtxSynchronize
cuCtxGetStreamPriorityRange
cuStreamCreateWithPriority
cuMemHostAlloc
cuMemFreeHost
cuMemAllocAsync // not used yet, but it is the counterpart of cuMemFreeAsync
cuMemFreeAsync
cuLaunchHostFunc

While there is no graph usage so far it would be great to have CUDA graph support as well, but these seem to be quite a few functions.
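
For readers unfamiliar with the approach, here is a minimal sketch of what loading the driver at runtime looks like, assuming a Linux system with `dlopen`. It is not the nv-codec-headers implementation (which provides its own loader in `ffnvcodec/dynlink_loader.h`); the `DriverApi` struct, the `SketchCUresult` typedef and the `driver_api_*` helpers are hypothetical names made up for illustration. Only the four driver entry points and their signatures come from the CUDA driver API.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for CUresult so the sketch builds without cuda.h (assumption:
 * CUDA_SUCCESS is 0, as in the real driver API). */
typedef int SketchCUresult;

/* Hypothetical table holding a few of the driver functions listed above. */
typedef struct DriverApi {
    void *lib; /* handle returned by dlopen(), NULL when not loaded */
    SketchCUresult (*cuInit)(unsigned int flags);
    SketchCUresult (*cuCtxSynchronize)(void);
    SketchCUresult (*cuMemHostAlloc)(void **pp, size_t bytes, unsigned int flags);
    SketchCUresult (*cuMemFreeHost)(void *p);
} DriverApi;

/* Resolve one symbol and report it on stderr if it is missing. */
static void *load_sym(void *lib, const char *name)
{
    void *sym = dlsym(lib, name);
    if (!sym)
        fprintf(stderr, "missing driver symbol: %s\n", name);
    return sym;
}

/* Close the library (if open) and reset the table to all-NULL. */
void driver_api_unload(DriverApi *api)
{
    if (api->lib)
        dlclose(api->lib);
    memset(api, 0, sizeof(*api));
}

/* Returns 0 on success, -1 if the library or any symbol is unavailable.
 * On failure the table is left zeroed so callers can fall back to CPU. */
int driver_api_load(DriverApi *api, const char *libname)
{
    memset(api, 0, sizeof(*api));
    api->lib = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (!api->lib)
        return -1;

    /* POSIX idiom for storing a dlsym() result in a function pointer. */
    *(void **)&api->cuInit = load_sym(api->lib, "cuInit");
    *(void **)&api->cuCtxSynchronize = load_sym(api->lib, "cuCtxSynchronize");
    *(void **)&api->cuMemHostAlloc = load_sym(api->lib, "cuMemHostAlloc");
    *(void **)&api->cuMemFreeHost = load_sym(api->lib, "cuMemFreeHost");

    if (!api->cuInit || !api->cuCtxSynchronize ||
        !api->cuMemHostAlloc || !api->cuMemFreeHost) {
        driver_api_unload(api);
        return -1;
    }
    return 0;
}
```

A caller would try `driver_api_load(&api, "libcuda.so.1")` and, if it returns 0, call `api.cuInit(0)` before anything else. Because nothing links against the driver at build time, the same binary still starts on a machine without an NVIDIA driver; the load simply fails and the caller can use the CPU path. On older glibc versions the sketch needs `-ldl` at link time.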

@gedoensmax changed the title from "Transition vamf to use nv-codec-headers instead of driver linking" to "Transition VMAF to use nv-codec-headers instead of driver linking" on Sep 2, 2025
@BtbN

BtbN commented Sep 2, 2025

You can PR them to https://code.ffmpeg.org/FFmpeg/nv-codec-headers if you like, adding new driver functions is never a huge deal.

You can most likely also get rid of the nvcc dependency by just using clang, like FFmpeg does. Practically every distribution of clang comes with nvptx support. You'll need to build a small header that defines all the interfaces you use though, but usually it's not all that many.

@gedoensmax
Contributor Author

@BtbN Would it be possible for you to help me with these changes? For me this contribution has to go through legal etc.; I am working with the Aachen DevTech team :)

@BtbN

BtbN commented Sep 2, 2025

Something I just noticed: You are still using cudart, aren't you?
That in itself will always pull in CUDA directly, I think, so you dynamically loading the driver API does not remove the runtime dependency. Or am I missing something there?

@gedoensmax
Contributor Author

I am only querying for cudart to get the path to nvcc. But yes, you are right, I forgot to delete -lcuda. That being said, it was not used anyway.
At least ldd tells me that no CUDA library is required, but I have not tested on a CPU-only machine yet.

@BtbN

BtbN commented Sep 2, 2025

Interesting, I'd have thought that adding cuda_rt_api_dependency to cuda_dependency, which ultimately ends up in the dependencies of the lib, would link against it. Might have misread the meson script then.

@gedoensmax
Contributor Author

I do not understand meson very well :D I just relied on ldd to be honest.

@kylophone
Collaborator

@BtbN

You can PR them to https://code.ffmpeg.org/FFmpeg/nv-codec-headers if you like, adding new driver functions is never a huge deal.

Made a PR here: https://code.ffmpeg.org/FFmpeg/nv-codec-headers/pulls/1. Let's review over on Forgejo.

@gedoensmax Could you please test your libvmaf changes with the updated nv-codec-headers and let me know if everything is working?

@gedoensmax
Contributor Author

@kylophone I adopted the remaining functions from your patch - thanks for the quick turnaround.

@BtbN Thanks also for noting that there was still a dependency on the CUDA runtime. It was indeed introduced accidentally, and the build was failing without it: debugging showed that the .cu files were still being added to the sources.

@gedoensmax
Contributor Author

I managed to use clang to compile all CUDA code, but I did not manage to compile without the CUDA toolkit present, since a lot of device intrinsics are not defined in the ffmpeg cuda_runtime headers. NVCC compilation is much more reliable and gives a better user experience, since no PTX compilation is needed at runtime.

I also updated the Dockerfiles so that they work as shown below. To explicitly disable the GPU, one can now also use --gpumask.

(.venv) maximilianm@ub24-maximilianm:~/vmaf$ docker run -it --rm -v /mnt/share/vmaf/data/:/data vmaf  -r /data/reference_1080p_yuv420p.yuv -d /data/distorted_1080p_yuv420p.yuv -w 1920 -h 1080 -b 8 -p 420
VMAF version 3.0.0
libvmaf ERROR problem during CUDA initialization
problem during vmaf_cuda_state_init, using CPU
128 frames ⠋⠉ 20.35 FPS
vmaf_v0.6.1: 99.867883
(.venv) maximilianm@ub24-maximilianm:~/vmaf$ docker run --gpus all -it --rm -v /mnt/share/vmaf/data/:/data vmaf  -r /data/reference_1080p_yuv420p.yuv -d /data/distorted_1080p_yuv420p.yuv -w 1920 -h 1080 -b 8 -p 420
VMAF version 3.0.0
128 frames ⠋⠉ 156.23 FPS
vmaf_v0.6.1: 99.867883
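
The first run above shows the fallback: CUDA initialization fails without `--gpus all`, and the tool continues on the CPU with the same score. A minimal sketch of that pattern follows, assuming the GPU initializer is passed in as a callback. `Backend`, `GpuInitFn`, `select_backend` and the two stub initializers are hypothetical names for illustration and are not libvmaf API; the `gpu_allowed` flag only stands in for an explicit opt-out and does not model the actual `--gpumask` semantics.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum { BACKEND_CPU, BACKEND_CUDA } Backend;

/* Hypothetical initializer signature: returns 0 on success, nonzero on
 * failure (for example when no driver or no device is present). */
typedef int (*GpuInitFn)(void *opaque);

/* Try the GPU only when allowed; any failure degrades to the CPU path
 * instead of aborting the whole run. */
Backend select_backend(GpuInitFn gpu_init, void *opaque, int gpu_allowed)
{
    if (!gpu_allowed || !gpu_init)
        return BACKEND_CPU;
    if (gpu_init(opaque) != 0) {
        fprintf(stderr, "problem during GPU initialization, using CPU\n");
        return BACKEND_CPU;
    }
    return BACKEND_CUDA;
}

/* Stand-in initializers used only to exercise both paths of the sketch. */
int stub_init_fail(void *opaque) { (void)opaque; return -1; }
int stub_init_ok(void *opaque)   { (void)opaque; return 0; }
```

With a dynamically loaded driver, the real initializer would be the place where the library load and the first driver call happen, so a container started without GPU access takes the `BACKEND_CPU` branch.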

@BtbN

BtbN commented Sep 17, 2025

The main advantage of using clang to compile to PTX code is that you can build for a really old SM if you don't need any more modern features, and the resulting PTX code will work with a wide range of drivers. The latest nvcc, by contrast, can only compile for sm75 and up, locking out all GPUs older than the RTX 2000 series, even though older ones could easily run the kernels.
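
The compatibility argument can be written down as a small check, sketched here under simplifying assumptions: PTX built for a given virtual architecture can be JIT-compiled by the driver for any device whose compute capability is at least that target. The function name `ptx_can_jit_on` is hypothetical, and the sketch deliberately ignores the separate requirement that the installed driver understands the PTX ISA version in use.

```c
/* Compute capability is encoded as major * 10 + minor, e.g. 75 for sm_75.
 * Returns 1 if PTX targeting ptx_target_sm can be JIT-compiled for the
 * device, 0 otherwise. Simplification: the PTX ISA version is not checked. */
int ptx_can_jit_on(int ptx_target_sm, int dev_major, int dev_minor)
{
    int dev_sm = dev_major * 10 + dev_minor;
    return dev_sm >= ptx_target_sm;
}
```

For example, PTX built for an old target such as 30 passes the check on a 7.5 device, while code whose minimum target is 75 does not pass on a 6.1 device. That is the trade-off described above: a lower target reaches more GPUs, at the cost of compiling the PTX when the program starts.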

@gedoensmax
Contributor Author

@BtbN I get the reasoning behind clang, which is why I made it easy to switch to clang compilation. Due to the PTX compilation time at startup, I decided to keep NVCC as the default. Otherwise that is an issue, especially inside a container.

@kylophone
Collaborator

@gedoensmax If this is ready to merge, please squash and I can push.

@gedoensmax
Contributor Author

All squashed and ready to be merged.

@gedoensmax force-pushed the nv_codec_headers branch 2 times, most recently from ab24ad7 to ee7b952 on September 25, 2025 09:36
@gedoensmax
Contributor Author

Sorry for the few force pushes; I had introduced a problem with the nvtx build. It is now possible to compile with nvtx enabled but CUDA disabled. That could help with measuring CPU time, for example, or with mixed CUDA and CPU profiling.
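
As an illustration of why NVTX is useful without CUDA, here is a minimal sketch of range markers around CPU-only work that compile to no-ops unless NVTX is enabled. The `SKETCH_HAVE_NVTX` switch and the `PROFILE_PUSH`/`PROFILE_POP` macros are hypothetical names and not the ones used in libvmaf; `nvtxRangePushA` and `nvtxRangePop` are the standard NVTX C calls.

```c
#ifdef SKETCH_HAVE_NVTX           /* hypothetical build-time switch */
#include <nvtx3/nvToolsExt.h>     /* header-only NVTX v3 */
#define PROFILE_PUSH(name) nvtxRangePushA(name)
#define PROFILE_POP()      nvtxRangePop()
#else
/* Without NVTX the markers vanish and the code builds with no extra deps. */
#define PROFILE_PUSH(name) ((void)(name))
#define PROFILE_POP()      ((void)0)
#endif

/* CPU-only work wrapped in a named range so it shows up on a profiler
 * timeline next to any GPU activity. */
long sum_of_squares(int n)
{
    long acc = 0;
    PROFILE_PUSH("sum_of_squares");
    for (int i = 1; i <= n; i++)
        acc += (long)i * i;
    PROFILE_POP();
    return acc;
}
```

Since the markers do not depend on the CUDA driver or runtime, a CPU-only build can still emit ranges that a profiler such as Nsight Systems displays, which is what makes the nvtx-enabled, CUDA-disabled configuration practical.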

@kylophone
Collaborator

@gedoensmax Let me know when this is ready. No rush, I will wait for your ping.

@gedoensmax
Contributor Author

@kylophone It is ready. I just noticed some minor things on the same day and fixed them, hence the force pushes. Otherwise I am happy to merge this.

… clang compilation

Details:
- updated docker docs to no longer require separate CUDA container
- enable CUDA compilation using clang (CUDA Toolkit libs and headers still required)
- use nv-codec-headers for runtime loading (thanks @BtbN)
- remove redundant CUDA event recreation
@kylophone kylophone merged commit 1b08bb4 into Netflix:master Oct 20, 2025
9 checks passed