Description
Which component impacted?
Video Processing
Is it regression? Good in old configuration?
None
What happened?
On:
- Battlemage G21 (`0xe20b`)
- Ubuntu 25.04, kernel `6.14.0-28-generic`
- Driver stack installed per the instructions at https://dgpu-docs.intel.com/driver/client/overview.html from the Kobuk team PPA repository, versions:
  - intel-media-va-driver-non-free: `25.3.2-0ubuntu1~25.04~ppa2`
- FFmpeg n6.1.2
Usage scenario:
- Video decoding (say, h264 with NV12 output) + color space conversion to RGB24. RGB24 is chosen because it is the default format in AI training, fine-tuning and inference. Specifically, it is the format used by https://github.com/pytorch/torchcodec.
- As Intel GPUs currently do not support RGB24, we convert to one of the supported RGB32 color formats and then just copy the relevant data. The issue described below does not actually depend on that, but we give examples in RGB24 as this format allows comparing results with CUDA.
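
The "copy relevant data" step can be sketched in a few lines. This is only an illustration, assuming a tightly packed RGBA byte buffer with no row padding; the helper name `rgba_to_rgb24` is hypothetical and is not taken from torchcodec.

```python
def rgba_to_rgb24(buf: bytes) -> bytes:
    """Drop every 4th byte (alpha) from a packed RGBA buffer.

    Assumption: rows are tightly packed, i.e. there is no pitch padding.
    """
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    out = bytearray(len(buf) // 4 * 3)
    out[0::3] = buf[0::4]  # R
    out[1::3] = buf[1::4]  # G
    out[2::3] = buf[2::4]  # B
    return bytes(out)


# One red and one blue pixel, both with alpha = 255.
rgba = bytes([255, 0, 0, 255, 0, 0, 255, 255])
rgb = rgba_to_rgb24(rgba)
```

For a BGRA-ordered surface the source offsets for R and B would need to be swapped.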
Results:
- Average PSNR compared to the CPU-processed reference:
Backend | Pipeline | Avg. PSNR |
---|---|---|
CUDA | torchcodec (ffmpeg-nvdec + self-written NPP color conversion) | 50.872619 |
QSV | ffmpeg (dec + scale_qsv) | 41.695772 |
VAAPI | torchcodec PR-558 (ffmpeg-vaapi dec + self-written VAAPI color conversion) | 41.695772 |
VAAPI | ffmpeg-vaapi (dec + scale_vaapi) or torchcodec PR-832 | 24.701474 |
- Overall there are visible differences between Intel and CPU/CUDA results:
  - For the ffmpeg-vaapi `scale_vaapi` case there is a visible color difference. The current assumption is that it is due to wrong color standard settings (see the Analysis section below for details)
  - For the ffmpeg-qsv case there are visible differences in object positions (for the nasa clip, this is clearly seen when comparing frames at indexes around 199)
- NOTE: For Intel GPUs torchcodec needs to be patched. There are 2 patch versions:
  - "Enable Intel GPU support in torchcodec on Linux (xpu device)", meta-pytorch/torchcodec#558: ffmpeg-vaapi decoding + direct VAAPI color conversion
  - "Full (dec+filters) ffmpeg-vaapi pipeline for Intel GPU (xpu pytorch backend)", meta-pytorch/torchcodec#832: ffmpeg-vaapi for both decoding and conversion (with ffmpeg-vaapi filters)
- For more details see the "Detailed results" section below
Expectation:
- Intel pipeline quality to be on par with CUDA (as of now the best-case scenario, 41.70 dB vs 50.87 dB average PSNR, is about 9 dB behind)
- ffmpeg-vaapi pipeline quality to be on par with ffmpeg-qsv (as of now 24.70 dB vs 41.70 dB average PSNR, about 17 dB behind)
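
The gaps between pipelines can be recomputed from the `average` fields of the psnr summary lines in the "Detailed results" section. A minimal sketch follows; the dictionary keys and the helper name `parse_psnr` are illustrative only.

```python
import re

# Summary lines copied from the ffmpeg psnr filter output in "Detailed results".
LOGS = {
    "cuda": "PSNR r:53.017530 g:48.658710 b:52.270208 average:50.872619 min:49.593809 max:51.807851",
    "qsv": "PSNR r:40.279087 g:44.622594 b:41.263727 average:41.695772 min:39.498336 max:43.352288",
    "vaapi": "PSNR r:24.489627 g:24.480980 b:25.169051 average:24.701474 min:24.309702 max:26.582759",
}


def parse_psnr(line: str) -> dict[str, float]:
    """Turn the 'key:value' pairs of a psnr summary line into a dict of floats."""
    return {key: float(value) for key, value in re.findall(r"(\w+):([0-9.]+)", line)}


avg = {name: parse_psnr(line)["average"] for name, line in LOGS.items()}
gap_cuda_qsv = avg["cuda"] - avg["qsv"]
gap_qsv_vaapi = avg["qsv"] - avg["vaapi"]
print(f"CUDA - QSV:   {gap_cuda_qsv:.2f} dB")
print(f"QSV  - VAAPI: {gap_qsv_vaapi:.2f} dB")
```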
What's the usage scenario when you are seeing the problem?
Video Analytics
What impacted?
No response
Debug Information
Analysis:
- Self-written VAAPI color conversion sets `VAProcPipelineParameterBuffer::surface_color_standard` to `VAProcColorStandardBT709`; other fields of `VAProcPipelineParameterBuffer` (except width/height) were zeroed
  - As an experiment, setting `VAProcPipelineParameterBuffer::output_color_standard` to `VAProcColorStandardBT601` does not change the output stream
  - I did not check how ffmpeg-qsv and libvpl set these parameters, but ffmpeg-qsv quality fully matches the self-written VAAPI color conversion pipeline
- ffmpeg-vaapi `scale_vaapi` filter sets `VAProcColorStandardExplicit` for both `VAProcPipelineParameterBuffer::surface_color_standard` and `VAProcPipelineParameterBuffer::output_color_standard`, plus the respective `input_color_properties` and `output_color_properties`. These settings seem to be correct on the ffmpeg side, but give significantly lower quality
  - As an experiment, changing `VAProcPipelineParameterBuffer::surface_color_standard` or `VAProcPipelineParameterBuffer::output_color_standard` (see the BT709/BT601 variants in the sha1sums below) "fixes" the `scale_vaapi` filter, which then matches the quality level of ffmpeg-qsv or the self-written VAAPI conversion
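
To illustrate why a color standard mismatch would show up as a color shift rather than as noise, the sketch below converts one limited-range Y'CbCr sample to RGB with BT.709 and with BT.601 luma coefficients. It is a textbook conversion written for illustration, assuming 8-bit limited-range input and full-range output. It is not the driver's or ffmpeg's code path, and it does not by itself confirm the assumption about the `scale_vaapi` case.

```python
def yuv_to_rgb(y, cb, cr, kr, kb):
    """Limited-range 8-bit Y'CbCr -> full-range 8-bit RGB for given luma coefficients."""
    kg = 1.0 - kr - kb
    yn = (y - 16) / 219.0      # normalized luma, 0..1
    pb = (cb - 128) / 224.0    # normalized blue-difference chroma
    pr = (cr - 128) / 224.0    # normalized red-difference chroma
    r = yn + 2.0 * (1.0 - kr) * pr
    b = yn + 2.0 * (1.0 - kb) * pb
    g = (yn - kr * r - kb * b) / kg
    # Scale to 8 bits and clip out-of-gamut values.
    return tuple(min(255, max(0, round(c * 255.0))) for c in (r, g, b))


BT709 = (0.2126, 0.0722)  # (Kr, Kb)
BT601 = (0.299, 0.114)    # (Kr, Kb)

pixel = (120, 90, 200)    # arbitrary saturated Y'CbCr sample
rgb_709 = yuv_to_rgb(*pixel, *BT709)
rgb_601 = yuv_to_rgb(*pixel, *BT601)
print("BT.709:", rgb_709, "BT.601:", rgb_601)
```

Neutral gray samples (Cb = Cr = 128) come out identical under both sets of coefficients, so a mismatch of this kind only affects colored areas.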
Detailed results:
- Below are sha1 checksums and PSNR values for a number of cases for above scenario: CPU processing, CUDA processing (on A10), ffmpeg VAAPI and ffmpeg QSV.
- Input stream can be found in torchcodec repo: https://github.com/pytorch/torchcodec/blob/main/test/resources/nasa_13013.mp4
- `dump.py` is a small script which uses torchcodec to dump processed videos. It can be found in https://github.com/dvrogozh/notebook/blob/master/pytorch/run-torchcodec-on-intel-gpu.md
- Torchcodec results are given to show that they match the ffmpeg cmdlines and to give a CUDA reference
- CPU ffmpeg output is the reference for all PSNR calculations
- Cmdlines:
python3 dump.py -i test/resources/nasa_13013.mp4 -o nasa_13013_torchcodec_cpu.rgb -d cpu -s 0:390
python3 dump.py -i test/resources/nasa_13013.mp4 -o nasa_13013_torchcodec_cuda.rgb -d cuda:0 -s 0:390
python3 dump.py -i test/resources/nasa_13013.mp4 -o nasa_13013_torchcodec_vaapi.rgb -d xpu:0 -s 0:390  # nasa_13013_torchcodec_ffmpeg_vaapi_filters.rgb when built with PR-832
ffmpeg -i test/resources/nasa_13013.mp4 -vf "scale=480:270:sws_flags=bilinear,format=rgb24" -y nasa_13013_ffmpeg_cpu.rgb
ffmpeg -i test/resources/nasa_13013.mp4 -vf "format=rgb24" -y nasa_13013_ffmpeg_cpu2.rgb
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -hwaccel_output_format vaapi -i test/resources/nasa_13013.mp4 -vf "scale_vaapi=480:270:format=rgba,hwdownload,format=rgba,format=rgb24" -y nasa_13013_ffmpeg_vaapi.rgb
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 -c:v h264_qsv -i test/resources/nasa_13013.mp4 -vf "scale_qsv=480:270:format=rgb32,hwdownload,format=rgb32,format=rgb24" -y nasa_13013_ffmpeg_qsv.rgb
- NOTE: results marked as "patched ffmpeg-vaapi" correspond to the ffmpeg-vaapi modification to change input/output color standard at this place: https://github.com/FFmpeg/FFmpeg/blob/b1a4534186ca51b0457579fc05a5739eb2cc45cd/libavfilter/vaapi_vpp.c#L497
- sha1sums:
$ sha1sum *.rgb
7d307c4cfcf2680e413c943646894499aa641b2e nasa_13013_ffmpeg_cpu.rgb # n6.1.2
7d307c4cfcf2680e413c943646894499aa641b2e nasa_13013_ffmpeg_cpu2.rgb # n6.1.2
6dcf7083da51717e06902b81dcef11adaa7ae071 nasa_13013_torchcodec_cuda.rgb
718e35799ebfdd78d91914b474748cbb3095636b nasa_13013_ffmpeg_qsv.rgb # n6.1.2
718e35799ebfdd78d91914b474748cbb3095636b nasa_13013_ffmpeg_vaapi_in=BT709_out=BT709.rgb # patched ffmpeg-vaapi
718e35799ebfdd78d91914b474748cbb3095636b nasa_13013_ffmpeg_vaapi_out=BT601.rgb # patched ffmpeg-vaapi
718e35799ebfdd78d91914b474748cbb3095636b nasa_13013_ffmpeg_vaapi_out=BT709.rgb # patched ffmpeg-vaapi
7d307c4cfcf2680e413c943646894499aa641b2e nasa_13013_torchcodec_cpu.rgb
2e036b96d4b10501f3b3b30b9d3e3313214fc6bc nasa_13013_ffmpeg_vaapi.rgb # n6.1.2
2e036b96d4b10501f3b3b30b9d3e3313214fc6bc nasa_13013_torchcodec_ffmpeg_vaapi_filters.rgb
718e35799ebfdd78d91914b474748cbb3095636b nasa_13013_torchcodec_vaapi.rgb
- PSNR values:
ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_torchcodec_cuda.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x561f70f6d2c0] PSNR r:53.017530 g:48.658710 b:52.270208 average:50.872619 min:49.593809 max:51.807851
ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_torchcodec_ffmpeg_vaapi_filters.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x5588df31d180] PSNR r:24.489627 g:24.480980 b:25.169051 average:24.701474 min:24.309702 max:26.582759
ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_vaapi.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x64513338d8c0] PSNR r:24.489627 g:24.480980 b:25.169051 average:24.701474 min:24.309702 max:26.582759
ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_torchcodec_vaapi.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x56d294e56940] PSNR r:40.279087 g:44.622594 b:41.263727 average:41.695772 min:39.498336 max:43.352288
ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_qsv.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x61a5d080b940] PSNR r:40.279087 g:44.622594 b:41.263727 average:41.695772 min:39.498336 max:43.352288
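
As a cross-check independent of ffmpeg, the same comparison can be sketched with NumPy on the raw rgb24 dumps. This is a minimal sketch, not ffmpeg's implementation: it assumes both files hold the same number of 480x270 rgb24 frames, and it takes the overall value as the PSNR of the mean of the three channel MSEs, which is consistent with the `average` fields in the logs above. The function name `psnr_rgb24` and the tiny synthetic files are illustrative.

```python
import os
import tempfile

import numpy as np


def psnr_rgb24(test_path, ref_path, width=480, height=270):
    """Per-channel and average PSNR between two raw rgb24 files."""
    a = np.fromfile(test_path, dtype=np.uint8).reshape(-1, height, width, 3)
    b = np.fromfile(ref_path, dtype=np.uint8).reshape(-1, height, width, 3)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    sse = np.zeros(3, dtype=np.float64)
    for frame_a, frame_b in zip(a, b):
        # Accumulate the squared error per channel, one frame at a time.
        d = frame_a.astype(np.int32) - frame_b.astype(np.int32)
        sse += (d * d).sum(axis=(0, 1))
    mse = sse / (a.shape[0] * height * width)

    def to_db(m):
        return float("inf") if m == 0 else float(10.0 * np.log10(255.0 ** 2 / m))

    return {"r": to_db(mse[0]), "g": to_db(mse[1]), "b": to_db(mse[2]),
            "average": to_db(mse.mean())}


# Synthetic demo: two 2x2 frames where every sample differs by exactly 1.
with tempfile.TemporaryDirectory() as tmp:
    ref = os.path.join(tmp, "ref.rgb")
    test = os.path.join(tmp, "test.rgb")
    np.zeros((2, 2, 2, 3), dtype=np.uint8).tofile(ref)
    np.ones((2, 2, 2, 3), dtype=np.uint8).tofile(test)
    result = psnr_rgb24(test, ref, width=2, height=2)
print(result)
```

With the real dumps the call would be, for example, `psnr_rgb24("nasa_13013_ffmpeg_qsv.rgb", "nasa_13013_ffmpeg_cpu.rgb")`.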
Do you want to contribute a patch to fix the issue?
None