Transition VMAF to use nv-codec-headers instead of driver linking by gedoensmax · Pull Request #1436 · Netflix/vmaf

gedoensmax · 2025-09-02T15:40:55Z

This still requires NVCC to be installed for now, but I believe it would at least allow shipping a prebuilt CUDA-enabled library with ffmpeg by default, since the CUDA driver is loaded dynamically.

@kylophone Would it be possible to build this setup in the current CI?

@BtbN To dynamically load the CUDA driver I relied on your nv-codec-headers, but they are missing some driver functions that we use in VMAF. Would it be possible to add these?

cuCtxSynchronize
cuCtxGetStreamPriorityRange
cuStreamCreateWithPriority
cuMemHostAlloc
cuMemFreeHost
cuMemAllocAsync // not used yet, but it is the counterpart of cuMemFreeAsync
cuMemFreeAsync
cuLaunchHostFunc

While there is no graph usage so far, it would be great to have CUDA graph support as well, but that seems to involve quite a few functions.
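For readers unfamiliar with runtime loading of the driver, here is a minimal Linux-only sketch of the general technique, assuming `dlopen`/`dlsym` and the usual `libcuda.so.1` library name. The `DriverApi` struct and the `driver_api_load`/`driver_api_unload` helpers are hypothetical names made up for this illustration; nv-codec-headers ships its own loader and function table, which this does not reproduce. `CUresult` is replaced by a plain `int` so the sketch compiles without any CUDA headers.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* CUresult is an enum in the real cuda.h; a plain int stands in for it here. */
typedef int (*cu_init_fn)(unsigned int flags);
typedef int (*cu_ctx_synchronize_fn)(void);

/* Hypothetical function table for this sketch (not the nv-codec-headers one). */
typedef struct {
    void *lib;
    cu_init_fn cuInit;
    cu_ctx_synchronize_fn cuCtxSynchronize;
} DriverApi;

/* Close the library (if open) and clear every pointer in the table. */
static void driver_api_unload(DriverApi *api)
{
    if (api->lib)
        dlclose(api->lib);
    api->lib = NULL;
    api->cuInit = NULL;
    api->cuCtxSynchronize = NULL;
}

/* Returns 0 on success, -1 if the driver library or a symbol is missing.
 * On failure the table is left fully cleared. */
static int driver_api_load(DriverApi *api)
{
    api->lib = NULL;
    api->cuInit = NULL;
    api->cuCtxSynchronize = NULL;

    /* The binary has no link-time dependency on libcuda; it is looked up here. */
    api->lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
    if (!api->lib) {
        fprintf(stderr, "CUDA driver not found, continuing without GPU\n");
        return -1;
    }

    api->cuInit = (cu_init_fn)dlsym(api->lib, "cuInit");
    api->cuCtxSynchronize =
        (cu_ctx_synchronize_fn)dlsym(api->lib, "cuCtxSynchronize");

    if (!api->cuInit || !api->cuCtxSynchronize) {
        driver_api_unload(api);
        return -1;
    }
    return 0;
}
```

Each additional driver function from the list above would need its own typedef and `dlsym` lookup in such a table. On older glibc versions the program has to be linked with `-ldl`.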

BtbN · 2025-09-02T15:54:13Z

You can PR them to https://code.ffmpeg.org/FFmpeg/nv-codec-headers if you like, adding new driver functions is never a huge deal.

You can most likely also get rid of the nvcc dependency by just using clang, like FFmpeg does. Practically every distribution of clang comes with nvptx support. You'll need to build a small header that defines all the interfaces you use, but usually there aren't all that many.

gedoensmax · 2025-09-02T16:13:56Z

@BtbN Would it be possible for you to help me with these changes? For me this contribution has to go through legal etc., I am working with the Aachen DevTech team :)

BtbN · 2025-09-02T16:18:04Z

Something I just noticed: You are still using cudart, aren't you?
That in itself will always pull in CUDA directly, I think, so you dynamically loading the driver API does not remove the runtime dependency. Or am I missing something there?

gedoensmax · 2025-09-02T16:34:40Z

I am only querying for cudart to get the path to nvcc. But yes, you are right, I forgot to delete -lcuda. That being said, it was not used anyway.
ldd at least tells me that CUDA is not required. But I have not tested on a CPU-only machine yet.

BtbN · 2025-09-02T16:40:38Z

Interesting, I'd have thought that adding cuda_rt_api_dependency to cuda_dependency, which ultimately ends up in the dependencies of the lib, would link against it. Might have misread the meson script then.

gedoensmax · 2025-09-02T16:50:47Z

I do not understand meson very well :D I just relied on ldd to be honest.

kylophone · 2025-09-02T18:15:59Z

@BtbN

> You can PR them to https://code.ffmpeg.org/FFmpeg/nv-codec-headers if you like, adding new driver functions is never a huge deal.

Made a PR here: https://code.ffmpeg.org/FFmpeg/nv-codec-headers/pulls/1. Let's review over on Forgejo.

@gedoensmax Could you please test your libvmaf changes with the updated nv-codec-headers and let me know if everything is working?

gedoensmax · 2025-09-03T14:50:30Z

@kylophone I adopted the remaining functions from your patch - thanks for the quick turnaround.

@BtbN Thanks also for noting that there was still a dependency on the CUDA runtime. It was indeed introduced accidentally, and the build failed without it because, as debugging showed, the .cu files were still being added to the sources.

gedoensmax · 2025-09-17T13:54:07Z

I managed to compile all CUDA code with clang, but not without the CUDA toolkit present, since a lot of device intrinsics are not defined in the ffmpeg cuda_runtime headers. NVCC compilation is much more reliable and gives a better user experience since no PTX compilation is needed at runtime.

I also updated the Dockerfiles so that they work as shown below. To explicitly disable the GPU, one can now also use --gpumask.

(.venv) maximilianm@ub24-maximilianm:~/vmaf$ docker run -it --rm -v /mnt/share/vmaf/data/:/data vmaf  -r /data/reference_1080p_yuv420p.yuv -d /data/distorted_1080p_yuv420p.yuv -w 1920 -h 1080 -b 8 -p 420
VMAF version 3.0.0
libvmaf ERROR problem during CUDA initialization
problem during vmaf_cuda_state_init, using CPU
128 frames ⠋⠉ 20.35 FPS
vmaf_v0.6.1: 99.867883
(.venv) maximilianm@ub24-maximilianm:~/vmaf$ docker run --gpus all -it --rm -v /mnt/share/vmaf/data/:/data vmaf  -r /data/reference_1080p_yuv420p.yuv -d /data/distorted_1080p_yuv420p.yuv -w 1920 -h 1080 -b 8 -p 420
VMAF version 3.0.0
128 frames ⠋⠉ 156.23 FPS
vmaf_v0.6.1: 99.867883
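The first run above shows the tool falling back to the CPU when CUDA initialization fails. A minimal sketch of that fallback pattern follows; it is not the actual libvmaf code. `Backend`, `select_backend`, the `gpu_disabled` flag, and the two stub init functions are hypothetical names for illustration, and `gpu_disabled` only stands in for an explicit opt-out such as the one `--gpumask` provides, without modeling that option's real semantics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { BACKEND_CPU, BACKEND_CUDA } Backend;

/* Hypothetical init callback: returns 0 on success, nonzero on failure. */
typedef int (*gpu_init_fn)(void);

/* Pick the GPU only if it is not disabled and its initialization succeeds;
 * otherwise log a message and continue on the CPU instead of aborting. */
static Backend select_backend(gpu_init_fn try_gpu_init, bool gpu_disabled)
{
    if (gpu_disabled || try_gpu_init == NULL)
        return BACKEND_CPU;

    if (try_gpu_init() != 0) {
        fprintf(stderr, "problem during GPU initialization, using CPU\n");
        return BACKEND_CPU;
    }
    return BACKEND_CUDA;
}

/* Stubs that mimic a machine without and with a usable GPU. */
static int gpu_init_always_fails(void) { return -1; }
static int gpu_init_always_succeeds(void) { return 0; }
```

With these stubs, `select_backend(gpu_init_always_fails, false)` yields `BACKEND_CPU`, which corresponds to the run without `--gpus all`, while `select_backend(gpu_init_always_succeeds, false)` yields `BACKEND_CUDA`.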

BtbN · 2025-09-17T14:02:03Z

The main advantage of using clang to compile to PTX code is that you can build for a really old SM if you don't need any more modern features, and the resulting PTX code will work with a wide range of drivers. The latest nvcc, by contrast, can only compile for sm75 and up, locking out all GPUs older than the RTX 2000 series, even though older ones could easily run the kernels.

libvmaf/src/cuda/cuda_helper.cuh

gedoensmax · 2025-09-17T14:49:50Z

@BtbN I get the reasoning behind clang, which is why I made it easy to switch to clang compilation. Because of the PTX compilation time at startup, I decided to keep NVCC as the default. That startup cost is especially an issue inside a container.
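The compatibility side of this trade-off can be modeled in a few lines. The sketch below assumes the general CUDA rule that PTX generated for compute capability X.Y can be JIT-compiled by the driver for devices of capability X.Y or newer. It ignores architecture-specific targets and the PTX ISA version the installed driver supports, so it is a rough model rather than a full compatibility check. `ComputeCapability` and `ptx_can_jit_on` are hypothetical names.

```c
#include <stdbool.h>

/* Compute capability as major.minor, e.g. 7.5 for sm75. */
typedef struct {
    int major;
    int minor;
} ComputeCapability;

/* True if PTX built for `ptx_target` can be JIT-compiled for `device`,
 * i.e. the device capability is greater than or equal to the PTX target. */
static bool ptx_can_jit_on(ComputeCapability ptx_target,
                           ComputeCapability device)
{
    if (device.major != ptx_target.major)
        return device.major > ptx_target.major;
    return device.minor >= ptx_target.minor;
}
```

Under this model, PTX built for 5.2 is usable on a 6.1 device, while code built for 7.5 is not, which is the lock-out described above. The price is the JIT step at startup that motivates keeping NVCC as the default.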

kylophone · 2025-09-24T16:59:51Z

@gedoensmax If this is ready to merge, please squash and I can push.

gedoensmax · 2025-09-25T08:24:07Z

All squashed and ready to be merged.

gedoensmax · 2025-09-25T09:43:18Z

Sorry for the few force pushes - I had introduced a problem with the NVTX build. It is now possible to compile with NVTX enabled but CUDA disabled. That could help with measuring CPU time, for example, or with mixed CUDA and CPU profiling.

kylophone · 2025-10-06T18:18:20Z

@gedoensmax Let me know when this is ready. No rush, I will wait for your ping.

gedoensmax · 2025-10-15T08:38:16Z

@kylophone It is ready. I noticed some minor things on the same day and fixed them, hence the force pushes. Otherwise I am happy to merge this.

@BtbN

… clang compilation

Details:
- updated docker docs to no longer require separate CUDA container
- enable CUDA compilation using clang (CUDA Toolkit libs and headers still required)
- use nv-codec-headers for runtime loading (thanks @BtbN)
- remove redundant CUDA event recreation

gedoensmax changed the title ~~Transition vamf to use nv-codec-headers instead of driver linking~~ Transition VMAF to use nv-codec-headers instead of driver linking Sep 2, 2025

BtbN reviewed Sep 17, 2025

libvmaf/src/cuda/cuda_helper.cuh (outdated, resolved)

gedoensmax force-pushed the nv_codec_headers branch from 6acf6d2 to 565ac41 Compare September 25, 2025 08:23

gedoensmax force-pushed the nv_codec_headers branch 2 times, most recently from ab24ad7 to ee7b952 Compare September 25, 2025 09:36

gedoensmax force-pushed the nv_codec_headers branch from ee7b952 to 7a016d6 Compare September 25, 2025 17:11

kylophone force-pushed the nv_codec_headers branch from 685124b to 9b0ad54 Compare October 20, 2025 21:24

kylophone merged commit 1b08bb4 into Netflix:master Oct 20, 2025
9 checks passed

3 participants
