[Bug]: Low PSNR (~9 dB in best case/~26 dB in worst case below CUDA) on NV12->RGB color conversion output #1954

### Which component impacted?

Video Processing

### Is it regression? Good in old configuration?

None

### What happened?

On:
* Battlemage G21 (`0xe20b`)
* Ubuntu 25.04, kernel `6.14.0-28-generic`
* Driver stack installed per the instructions at https://dgpu-docs.intel.com/driver/client/overview.html from the Kobuk team PPA repository, versions:
  * intel-media-va-driver-non-free: `25.3.2-0ubuntu1~25.04~ppa2`
* FFmpeg n6.1.2

Usage scenario:
* Video decoding (say, h264 with NV12 output) + color space conversion to RGB24. The reason behind RGB24 is that this is the default format in AI training, fine tuning and inference. Specifically, that's the format used in https://github.com/pytorch/torchcodec.
* As Intel GPUs currently do not support RGB24, we convert to one of the supported RGB32 color formats and then just copy the relevant data. The issue described below does not actually depend on that, but we give examples in RGB24 as this format allows comparing results with CUDA.
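The "convert to RGB32, then copy relevant data" step can be sketched as below. This is only an illustration, not the torchcodec code: `rgb32_to_rgb24` and its `order` parameter are hypothetical names, and the sketch assumes a tightly packed buffer (no row pitch padding), which a mapped VAAPI surface does not guarantee.

```python
def rgb32_to_rgb24(src: bytes, order: str = "rgba") -> bytes:
    """Drop the 4th byte of every packed 32-bit pixel and return RGB24.

    `order` names the in-memory byte layout of one pixel, for example
    "rgba", "bgra" or "bgrx" (hypothetical parameter for this sketch).
    """
    if len(src) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    out = bytearray(len(src) // 4 * 3)
    for dst_idx, channel in enumerate("rgb"):
        src_idx = order.index(channel)
        # Strided copy: every 4th source byte goes to every 3rd output byte.
        out[dst_idx::3] = src[src_idx::4]
    return bytes(out)


if __name__ == "__main__":
    two_pixels = bytes([10, 20, 30, 255, 40, 50, 60, 255])
    print(list(rgb32_to_rgb24(two_pixels)))
```

The copy itself is lossless, which is consistent with the statement that the issue does not depend on the RGB32 to RGB24 step.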

Results:
* Average PSNRs comparing to CPU processed reference:

| Backend | Pipeline | Avg. PSNR |
| --- | --- | --- |
| CUDA | torchcodec (ffmpeg-nvdec + self-written NPP color conversion) | 50.872619 |
| QSV  | ffmpeg (dec + scale_qsv) | 41.695772 |
| VAAPI | torchcodec PR-558 (ffmpeg-vaapi dec + self-written VAAPI color conversion) | 41.695772 |
| VAAPI | ffmpeg-vaapi (dec + `scale_vaapi`) or torchcodec PR-832 | 24.701474 |

* Overall there are **visible differences** between Intel and CPU/CUDA results:
  * For the ffmpeg-vaapi `scale_vaapi` case there is a visible color difference. The current assumption is that it is caused by wrong color standard settings (see the Analysis section below for details)
  * For the ffmpeg-qsv case there are visible differences in object positions (for the nasa clip, this is easy to see when comparing frames at indexes around 199)

* NOTE: For Intel GPUs torchcodec needs to be patched. There are 2 patch versions:
  * https://github.com/pytorch/torchcodec/pull/558 - that's ffmpeg-vaapi decoding + direct vaapi color conversion
  * https://github.com/pytorch/torchcodec/pull/832 - that's ffmpeg-vaapi for decoding + conversion (with ffmpeg-vaapi filters)

* For more details see the "Detailed results" section below
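To find the frames where the outputs diverge most (for example around index 199 in the ffmpeg-qsv case), a per-frame PSNR over the raw dumps helps. The sketch below is an assumption-laden illustration: it takes the 480x270 RGB24 layout from the cmdlines in this report, the helper names are hypothetical, and it computes a plain per-frame PSNR rather than reproducing the ffmpeg `psnr` filter exactly.

```python
import math

# Frame geometry taken from the cmdlines in this report (480x270, rgb24).
FRAME_SIZE = 480 * 270 * 3


def frame_psnr(a: bytes, b: bytes) -> float:
    """PSNR in dB between two equally sized 8-bit frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and of equal size")
    sse = sum((x - y) ** 2 for x, y in zip(a, b))
    if sse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(255.0 ** 2 / (sse / len(a)))


def per_frame_psnr(test: bytes, ref: bytes, frame_size: int = FRAME_SIZE) -> list:
    """PSNR for every complete frame present in both raw streams."""
    count = min(len(test), len(ref)) // frame_size
    return [
        frame_psnr(test[i * frame_size:(i + 1) * frame_size],
                   ref[i * frame_size:(i + 1) * frame_size])
        for i in range(count)
    ]


def worst_frames(psnrs: list, count: int = 5) -> list:
    """Indexes of the frames with the lowest PSNR, worst first."""
    return sorted(range(len(psnrs)), key=psnrs.__getitem__)[:count]
```

Usage would be along the lines of `worst_frames(per_frame_psnr(open("nasa_13013_ffmpeg_qsv.rgb", "rb").read(), open("nasa_13013_ffmpeg_cpu.rgb", "rb").read()))`. Pure Python is slow on full dumps; a NumPy version would be the practical choice.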

**Expectation:**
1. Intel pipeline quality to be on par with CUDA (as of now the best case scenario is about 9 dB behind)
1. ffmpeg-vaapi pipeline quality to be on par with ffmpeg-qsv (as of now about 17 dB behind)


### What's the usage scenario when you are seeing the problem?

Video Analytics

### What impacted?

_No response_

### Debug Information

Analysis:
* The self-written VAAPI color conversion sets `VAProcPipelineParameterBuffer::surface_color_standard` to `VAProcColorStandardBT709`; all other `VAProcPipelineParameterBuffer` fields (except width/height) are zeroed
* As an experiment, setting `VAProcPipelineParameterBuffer::output_color_standard` to `VAProcColorStandardBT601` does not change the output stream
* I did not check how ffmpeg-qsv and libvpl set these parameters, but the ffmpeg-qsv quality fully matches the self-written VAAPI color conversion pipeline
* The ffmpeg-vaapi `scale_vaapi` filter sets `VAProcColorStandardExplicit` for both `VAProcPipelineParameterBuffer::surface_color_standard` and `VAProcPipelineParameterBuffer::output_color_standard`, plus the respective `input_color_properties` and `output_color_properties`. These settings seem to be correct on the ffmpeg side, but give significantly lower quality
* As an experiment, changing `VAProcPipelineParameterBuffer::surface_color_standard` or `VAProcPipelineParameterBuffer::output_color_standard` to a named standard (BT709 or BT601) "fixes" the `scale_vaapi` filter: its output then matches ffmpeg-qsv and the self-written VAAPI conversion (identical sha1 sums, see below)
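For context on why a color standard mismatch would show up as a color shift, here is a minimal sketch of limited-range YUV to RGB conversion with the BT.601 and BT.709 luma coefficients. It is not what the driver or ffmpeg does internally; `yuv_to_rgb` and the constant names are hypothetical, and only the published Kr/Kb coefficients are taken as given.

```python
# (Kr, Kb) luma coefficients of the two standards.
BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)


def yuv_to_rgb(y: int, u: int, v: int, standard: tuple) -> tuple:
    """Convert one 8-bit limited-range YUV sample to full-range RGB."""
    kr, kb = standard
    kg = 1.0 - kr - kb
    luma = (y - 16) * 255.0 / 219.0   # Y range 16..235 -> 0..255
    cb = (u - 128) * 255.0 / 224.0    # chroma range 16..240, centered at 128
    cr = (v - 128) * 255.0 / 224.0
    r = luma + 2.0 * (1.0 - kr) * cr
    b = luma + 2.0 * (1.0 - kb) * cb
    g = (luma - kr * r - kb * b) / kg
    return tuple(max(0, min(255, round(c))) for c in (r, g, b))


if __name__ == "__main__":
    sample = (81, 90, 240)  # a saturated reddish sample
    print("BT.601:", yuv_to_rgb(*sample, BT601))
    print("BT.709:", yuv_to_rgb(*sample, BT709))
```

Neutral samples (U = V = 128) convert identically under both matrices, so a mismatch only affects colored regions, which fits a visible color difference rather than a brightness change.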


Detailed results:
* Below are sha1 checksums and PSNR values for a number of cases for the above scenario: CPU processing, CUDA processing (on A10), ffmpeg VAAPI and ffmpeg QSV.
* Input stream can be found in torchcodec repo: https://github.com/pytorch/torchcodec/blob/main/test/resources/nasa_13013.mp4
* `dump.py` is a small script which uses torchcodec to dump processed videos. It can be found in https://github.com/dvrogozh/notebook/blob/master/pytorch/run-torchcodec-on-intel-gpu.md
* Torchcodec results are given to show that they match the ffmpeg cmdlines and to give a CUDA reference
* **CPU ffmpeg output is the reference for all PSNR calculations**
* Cmdlines:
```
python3 dump.py -i test/resources/nasa_13013.mp4 -o nasa_13013_torchcodec_cpu.rgb -d cpu -s 0:390
python3 dump.py -i test/resources/nasa_13013.mp4 -o nasa_13013_torchcodec_cuda.rgb -d cuda:0 -s 0:390
python3 dump.py -i test/resources/nasa_13013.mp4 -o nasa_13013_torchcodec_vaapi.rgb -d xpu:0 -s 0:390

ffmpeg -i test/resources/nasa_13013.mp4 -vf "scale=480:270:sws_flags=bilinear,format=rgb24" -y nasa_13013_ffmpeg_cpu.rgb
ffmpeg -i test/resources/nasa_13013.mp4 -vf "format=rgb24" -y nasa_13013_ffmpeg_cpu2.rgb
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -hwaccel_output_format vaapi -i test/resources/nasa_13013.mp4 -vf "scale_vaapi=480:270:format=rgba,hwdownload,format=rgba,format=rgb24" -y nasa_13013_ffmpeg_vaapi.rgb
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 -c:v h264_qsv -i test/resources/nasa_13013.mp4 -vf "scale_qsv=480:270:format=rgb32,hwdownload,format=rgb32,format=rgb24" -y nasa_13013_ffmpeg_qsv.rgb
```
* NOTE: results marked as "patched ffmpeg-vaapi" correspond to the ffmpeg-vaapi modification to change input/output color standard at this place: https://github.com/FFmpeg/FFmpeg/blob/b1a4534186ca51b0457579fc05a5739eb2cc45cd/libavfilter/vaapi_vpp.c#L497
* sha1sums:
```
$ sha1sum *.rgb
7d307c4cfcf2680e413c943646894499aa641b2e  nasa_13013_ffmpeg_cpu.rgb                       # n6.1.2
7d307c4cfcf2680e413c943646894499aa641b2e  nasa_13013_ffmpeg_cpu2.rgb                      # n6.1.2
6dcf7083da51717e06902b81dcef11adaa7ae071  nasa_13013_torchcodec_cuda.rgb
718e35799ebfdd78d91914b474748cbb3095636b  nasa_13013_ffmpeg_qsv.rgb                       # n6.1.2
718e35799ebfdd78d91914b474748cbb3095636b  nasa_13013_ffmpeg_vaapi_in=BT709_out=BT709.rgb  # patched ffmpeg-vaapi
718e35799ebfdd78d91914b474748cbb3095636b  nasa_13013_ffmpeg_vaapi_out=BT601.rgb           # patched ffmpeg-vaapi
718e35799ebfdd78d91914b474748cbb3095636b  nasa_13013_ffmpeg_vaapi_out=BT709.rgb           # patched ffmpeg-vaapi
7d307c4cfcf2680e413c943646894499aa641b2e  nasa_13013_torchcodec_cpu.rgb
2e036b96d4b10501f3b3b30b9d3e3313214fc6bc  nasa_13013_ffmpeg_vaapi.rgb                     # n6.1.2
2e036b96d4b10501f3b3b30b9d3e3313214fc6bc  nasa_13013_torchcodec_ffmpeg_vaapi_filters.rgb
718e35799ebfdd78d91914b474748cbb3095636b  nasa_13013_torchcodec_vaapi.rgb
```
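The checksum list is easier to read when files are grouped by digest, since each group is one set of bit-identical outputs. A small sketch with a hypothetical helper `group_by_sha1`, which takes file contents already read into memory:

```python
import hashlib
from collections import defaultdict


def group_by_sha1(blobs: dict) -> dict:
    """Map each sha1 hex digest to the sorted names that share it."""
    groups = defaultdict(list)
    for name, data in blobs.items():
        groups[hashlib.sha1(data).hexdigest()].append(name)
    return {digest: sorted(names) for digest, names in groups.items()}


if __name__ == "__main__":
    import glob

    # Reads every *.rgb dump in the current directory (may be large).
    blobs = {}
    for path in glob.glob("*.rgb"):
        with open(path, "rb") as handle:
            blobs[path] = handle.read()
    for digest, names in group_by_sha1(blobs).items():
        print(digest, names)
```

Applied to the list above, this yields four groups: the CPU outputs, the CUDA output, the QSV / self-written VAAPI / patched ffmpeg-vaapi outputs, and the unpatched ffmpeg-vaapi outputs.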
* PSNR values:
```
ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_torchcodec_cuda.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x561f70f6d2c0] PSNR r:53.017530 g:48.658710 b:52.270208 average:50.872619 min:49.593809 max:51.807851

ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_torchcodec_ffmpeg_vaapi_filters.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x5588df31d180] PSNR r:24.489627 g:24.480980 b:25.169051 average:24.701474 min:24.309702 max:26.582759

ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_vaapi.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x64513338d8c0] PSNR r:24.489627 g:24.480980 b:25.169051 average:24.701474 min:24.309702 max:26.582759

ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_torchcodec_vaapi.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x56d294e56940] PSNR r:40.279087 g:44.622594 b:41.263727 average:41.695772 min:39.498336 max:43.352288

ffmpeg -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_qsv.rgb -f rawvideo -pix_fmt rgb24 -s:v 480x270 -i nasa_13013_ffmpeg_cpu.rgb -filter_complex "psnr" -f null /dev/null
[Parsed_psnr_0 @ 0x61a5d080b940] PSNR r:40.279087 g:44.622594 b:41.263727 average:41.695772 min:39.498336 max:43.352288
```
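The `average` field in these logs is not the arithmetic mean of the `r`, `g` and `b` values. Assuming the three channels carry equal weight, it corresponds to the PSNR of the mean per-channel MSE. The sketch below (hypothetical helper names) recombines the logged channel values that way and derives the gaps between pipelines:

```python
import math


def psnr_to_mse(psnr_db: float, peak: float = 255.0) -> float:
    """Invert PSNR = 10 * log10(peak^2 / MSE)."""
    return peak ** 2 / 10.0 ** (psnr_db / 10.0)


def average_psnr(channel_psnrs: list, peak: float = 255.0) -> float:
    """PSNR of the mean MSE over equally weighted channels."""
    mean_mse = sum(psnr_to_mse(p, peak) for p in channel_psnrs) / len(channel_psnrs)
    return 10.0 * math.log10(peak ** 2 / mean_mse)


# Per-channel r/g/b values copied from the logs above.
cuda = average_psnr([53.017530, 48.658710, 52.270208])
qsv = average_psnr([40.279087, 44.622594, 41.263727])
vaapi = average_psnr([24.489627, 24.480980, 25.169051])

if __name__ == "__main__":
    print(f"CUDA {cuda:.3f}  QSV {qsv:.3f}  scale_vaapi {vaapi:.3f}")
    print(f"CUDA - QSV: {cuda - qsv:.2f} dB, QSV - scale_vaapi: {qsv - vaapi:.2f} dB")
```

The recombined values agree with the logged `average` fields to within rounding, so the per-pipeline averages are 50.87, 41.70 and 24.70 dB.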

### Do you want to contribute a patch to fix the issue?

None
