Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 48 additions & 5 deletions src/torchcodec/_core/CUDACommon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -57,8 +57,7 @@ void initializeCudaContextWithPytorch(const torch::Device& device) {
// Color space and color range are independent concepts, so we can have a BT.709
// with full range, and another one with limited range. Same for BT.601.
//
// In the first version of this note we'll focus on the full color range. It
// will later be updated to account for the limited range.
// First, we'll consider the conversion in the full color range.
//
// Color conversion matrix
// -----------------------
Expand Down Expand Up @@ -110,6 +109,50 @@ void initializeCudaContextWithPytorch(const torch::Device& device) {
//
// Which matches https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.709_conversion
//
//
// Next, lets consider encoding RGB -> YUV in the limited color range.
// Y is in the range [16-235] and U,V are in [16-240].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this might be slightly confusing because the first part of the note above started with

Conventionally Y is in [0, 1] range, and U and V are in [-0.5, 0.5]

And that's also how the matrices below are defined. I think we can get around this by saying something like:

In limited range, the [0, 255] range is mapped into [16-235] for Y, and into [16-240] for U and V.

// The reduced range provides margins for errors during processing.
// https://en.wikipedia.org/wiki/YCbCr#Y%E2%80%B2PbPr_to_Y%E2%80%B2CbCr
//
// To encode RGB -> YUV in limited range, we start with the full range conversion
// matrix derived above, then scale it to compress into the limited ranges:
// - RGB [0,255] -> Y [16,235]: compress by (235-16)/255 ≈ 219/255
// - RGB [0,255] -> U,V [16,240]: compress by (240-16)/255 ≈ 224/255
Comment on lines +120 to +121
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be "≈" or should it be "=" ?

//
// ```py
// import torch
// kr, kg, kb = 0.2126, 0.7152, 0.0722 # BT.709 luma coefficients
// u_scale = 2 * (1 - kb)
// v_scale = 2 * (1 - kr)
//
// rgb_to_yuv_full = torch.tensor([
// [kr, kg, kb],
// [-kr/u_scale, -kg/u_scale, (1-kb)/u_scale],
// [(1-kr)/v_scale, -kg/v_scale, -kb/v_scale]
// ])
//
// full_to_limited_y_scale = 219.0 / 255.0
// full_to_limited_uv_scale = 224.0 / 255.0
//
// rgb_to_yuv_limited = rgb_to_yuv_full * torch.tensor([
// [full_to_limited_y_scale],
// [full_to_limited_uv_scale],
// [full_to_limited_uv_scale]
// ])
//
// print("RGB->YUV matrix (Limited Range BT.709):")
// print(rgb_to_yuv_limited)
// ```
//
// This yields:
// tensor([[ 0.1826, 0.6142, 0.0620],
// [-0.1006, -0.3386, 0.4392],
// [ 0.4392, -0.3989, -0.0403]])
Comment on lines +148 to +151
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, could we find an authoritative source from which we can validate that these values are correct? Something similar to what we have for the full range:

// Which matches https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.709_conversion

It doesn't have to be for 709 specifically, but validating it for one limited range config would be great.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to find a source that gets our exact values. There is one section on that wiki page that gets close, but results in different values due to an optimization: https://en.wikipedia.org/wiki/YCbCr#Approximate_8-bit_matrices_for_BT.601

//
// Which is the matrix we store in CudaDeviceInterface.
// TODO: land PR that adds this matrix in CudaDeviceInterface
//
// Color conversion in NPP
// -----------------------
// https://docs.nvidia.com/cuda/npp/image_color_conversion.html.
Expand Down Expand Up @@ -137,16 +180,16 @@ void initializeCudaContextWithPytorch(const torch::Device& device) {
// the decoder.
// - But *internally*, `nppiNV12ToRGB_8u_ColorTwist32f_P2C3R_Ctx` needs U and V to
// be centered around 0, i.e. in [-128, 127]. So we need to apply a -128
// offset to U and V. Y doesn't need to be offset. The offset can be applied
// by adding a 4th column to the matrix.
// offset to U and V. Y needs an offset of -16, only when using limited range.
// The offsets can be applied by adding a 4th column to the matrix.
//
//
// So our conversion matrix becomes the following, with new offset column:
// tensor([[ 1.0000e+00, -3.3142e-09, 1.5748e+00, 0]
// [ 1.0000e+00, -1.8732e-01, -4.6812e-01, -128]
// [ 1.0000e+00, 1.8556e+00, 4.6231e-09 , -128]])
//
// And that's what we need to pass for BT701, full range.
// And that's what we need to pass for BT709, full range.
/* clang-format on */

// BT.709 full range color conversion matrix for YUV to RGB conversion.
Expand Down
Loading