Extend color range note to explain limited range #1154
@@ -57,8 +57,7 @@ void initializeCudaContextWithPytorch(const torch::Device& device) {
 // Color space and color range are independent concepts, so we can have a BT.709
 // with full range, and another one with limited range. Same for BT.601.
 //
-// In the first version of this note we'll focus on the full color range. It
-// will later be updated to account for the limited range.
+// First, we'll consider the conversion in the full color range.
 //
 // Color conversion matrix
 // -----------------------
@@ -110,6 +109,50 @@ void initializeCudaContextWithPytorch(const torch::Device& device) {
 //
 // Which matches https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.709_conversion
 //
+//
+// Next, let's consider encoding RGB -> YUV in the limited color range.
+// Y is in the range [16-235] and U,V are in [16-240].
+// The reduced range provides margins for errors during processing.
+// https://en.wikipedia.org/wiki/YCbCr#Y%E2%80%B2PbPr_to_Y%E2%80%B2CbCr
+//
+// To encode RGB -> YUV in limited range, we start with the full range conversion
+// matrix derived above, then scale it to compress into the limited ranges:
+// - RGB [0,255] -> Y [16,235]: compress by (235-16)/255 ≈ 219/255
+// - RGB [0,255] -> U,V [16,240]: compress by (240-16)/255 ≈ 224/255
Comment on lines +120 to +121 (Contributor):
Is this meant to be "≈" or should it be "=" ?
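For reference, a quick arithmetic check (a small sketch, not part of the PR itself) shows that both scale factors are exact fractions, since 235 - 16 = 219 and 240 - 16 = 224:

```py
# Both compression factors are exact, not approximations.
assert (235 - 16) / 255 == 219 / 255  # Y scale, ~0.8588
assert (240 - 16) / 255 == 224 / 255  # U,V scale, ~0.8784
```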
+//
+// ```py
+// import torch
+// kr, kg, kb = 0.2126, 0.7152, 0.0722  # BT.709 luma coefficients
+// u_scale = 2 * (1 - kb)
+// v_scale = 2 * (1 - kr)
+//
+// rgb_to_yuv_full = torch.tensor([
+//     [kr, kg, kb],
+//     [-kr/u_scale, -kg/u_scale, (1-kb)/u_scale],
+//     [(1-kr)/v_scale, -kg/v_scale, -kb/v_scale]
+// ])
+//
+// full_to_limited_y_scale = 219.0 / 255.0
+// full_to_limited_uv_scale = 224.0 / 255.0
+//
+// rgb_to_yuv_limited = rgb_to_yuv_full * torch.tensor([
+//     [full_to_limited_y_scale],
+//     [full_to_limited_uv_scale],
+//     [full_to_limited_uv_scale]
+// ])
+//
+// print("RGB->YUV matrix (Limited Range BT.709):")
+// print(rgb_to_yuv_limited)
+// ```
+//
+// This yields:
+// tensor([[ 0.1826,  0.6142,  0.0620],
+//         [-0.1006, -0.3386,  0.4392],
+//         [ 0.4392, -0.3989, -0.0403]])
Comment on lines +148 to +151 (Contributor):
Ideally, could we find an authoritative source from which we can validate that these values are correct? Something similar to what we have for the full range:
It doesn't have to be for 709 specifically, but validating it for one limited range config would be great.

Contributor (Author):
I'll try to find a source that gets our exact values. There is one section on that wiki page that gets close, but results in different values due to an optimization: https://en.wikipedia.org/wiki/YCbCr#Approximate_8-bit_matrices_for_BT.601
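As a quick sanity check (a small sketch of my own, not taken from the PR), applying the limited-range matrix printed above to full-range white and black, then adding the +16 Y offset and +128 U,V centering described later in the note, should land on the nominal limits, roughly Y=235/16 and U=V=128:

```py
import torch

# Values copied from the limited-range BT.709 matrix printed above.
rgb_to_yuv_limited = torch.tensor([
    [ 0.1826,  0.6142,  0.0620],
    [-0.1006, -0.3386,  0.4392],
    [ 0.4392, -0.3989, -0.0403],
])

offsets = torch.tensor([16.0, 128.0, 128.0])  # limited-range Y offset, U/V centering

white = torch.tensor([255.0, 255.0, 255.0])
black = torch.tensor([0.0, 0.0, 0.0])
print(rgb_to_yuv_limited @ white + offsets)  # roughly [235, 128, 128]
print(rgb_to_yuv_limited @ black + offsets)  # [16, 128, 128]
```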
+//
+// Which is the matrix we store in CudaDeviceInterface.
+// TODO: land PR that adds this matrix in CudaDeviceInterface
+//
 // Color conversion in NPP
 // -----------------------
 // https://docs.nvidia.com/cuda/npp/image_color_conversion.html.
@@ -137,16 +180,16 @@ void initializeCudaContextWithPytorch(const torch::Device& device) {
 // the decoder.
 // - But *internally*, `nppiNV12ToRGB_8u_ColorTwist32f_P2C3R_Ctx` needs U and V to
 //   be centered around 0, i.e. in [-128, 127]. So we need to apply a -128
-//   offset to U and V. Y doesn't need to be offset. The offset can be applied
-//   by adding a 4th column to the matrix.
+//   offset to U and V. Y needs a -16 offset, but only when using limited range.
+//   The offsets can be applied by adding a 4th column to the matrix.
 //
 //
 // So our conversion matrix becomes the following, with new offset column:
 // tensor([[ 1.0000e+00, -3.3142e-09,  1.5748e+00,    0],
 //         [ 1.0000e+00, -1.8732e-01, -4.6812e-01, -128],
 //         [ 1.0000e+00,  1.8556e+00,  4.6231e-09, -128]])
 //
-// And that's what we need to pass for BT701, full range.
+// And that's what we need to pass for BT709, full range.
 /* clang-format on */
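To make the "4th column" step concrete, here is a minimal sketch (my own, not from the PR) that appends the offset column to the 3x3 full-range matrix; the coefficient values are copied from the tensor above, with the e-09 floating point noise rounded to 0:

```py
import torch

# 3x3 full-range BT.709 YUV -> RGB matrix, values as in the note above.
yuv_to_rgb_full = torch.tensor([
    [1.0,  0.0,      1.5748],
    [1.0, -0.18732, -0.46812],
    [1.0,  1.8556,   0.0],
])

# Offset column: Y gets no offset in full range (it would be -16 in limited
# range), and U, V get -128 so they are centered around 0.
offsets = torch.tensor([[0.0], [-128.0], [-128.0]])

# Appending the column yields the 3x4 matrix described in the note.
conversion_matrix = torch.cat([yuv_to_rgb_full, offsets], dim=1)
print(conversion_matrix)
```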
 // BT.709 full range color conversion matrix for YUV to RGB conversion.
Reading this might be slightly confusing because the first part of the note above started with
And that's also how the matrices below are defined. I think we can get around this by saying something like: