Description
Describe what you are looking for
Feature request: Specialized Normalize — image, video, volume
Why this matters
This may look like a very narrow use case, but normalize is the single most widely used transform in all of computer vision augmentations. Almost every training pipeline applies it (ImageNet mean/std, YOLO /255, min-max, etc.). So performance here has outsized impact.
Summary
We are requesting a small set of specialized normalize operations from NumKong. Normalization (subtract mean / divide by std, or min-max scale) is the most common preprocessing step in image and video pipelines.
- Work for image `(H, W, C)`, video `(N, H, W, C)` or `(T, H, W, C)`, and volume `(D, H, W, C)` or `(N, D, H, W, C)`.
- Support external mean/std or min/max (YOLO, ImageNet, Inception, etc.) or compute them from the data (per-image or per-channel).
- Performance: We’d love the implementation to be competitive with or faster than (1) direct NumPy, (2) OpenCV’s specialized functions where they exist, and (3) the “build LUT + apply LUT” approach. The section below gives reference code to compare against.
(uint8 → float32 LUT is a separate request; see ashvardanian/StringZilla#302.)
How we do it now (Albucore)
AlbumentationsX calls Albucore for normalize. Current behavior: https://github.com/albumentations-team/albucore/blob/main/albucore/functions.py#L336-L348
- Standard (external mean/std): We pass precomputed `mean_np` and `denominator` (i.e. `mean` and `1 / (std * max_pixel_value)` in pixel space). Formula: `out = (img - mean_np) * denominator` → float32. For uint8 input, Albucore builds a float32 LUT of length 256 and applies it (e.g. via `cv2.LUT`) to get float32 output in one pass, instead of converting to float and then doing the math.
- Per-image / min-max: `normalize_per_image(img, mode)` with modes like `"min_max"`, `"image"`, `"image_per_channel"`, `"min_max_per_channel"`: compute stats from the image, then apply.
So today we have: (a) a direct float32 path (NumPy-like or fused), and (b) a uint8 path via LUT. Beating or matching these would make NumKong a natural choice for us.
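For reference, the four per-image modes can be sketched in plain NumPy. This is an illustrative reimplementation of the behavior described above, not Albucore's actual code, and the `eps` value is arbitrary:

```python
import numpy as np

def normalize_per_image_sketch(img: np.ndarray, mode: str, eps: float = 1e-4) -> np.ndarray:
    # Illustrative sketch of the four modes; stats are computed from the image itself.
    x = img.astype(np.float32)
    if mode == "image":                      # global mean/std
        return (x - x.mean()) / (x.std() + eps)
    if mode == "image_per_channel":          # per-channel mean/std over spatial dims
        mean = x.mean(axis=(0, 1), keepdims=True)
        std = x.std(axis=(0, 1), keepdims=True)
        return (x - mean) / (std + eps)
    if mode == "min_max":                    # global min-max to [0, 1]
        mn, mx = x.min(), x.max()
        return (x - mn) / ((mx - mn) + eps)
    if mode == "min_max_per_channel":        # per-channel min-max to [0, 1]
        mn = x.min(axis=(0, 1), keepdims=True)
        mx = x.max(axis=(0, 1), keepdims=True)
        return (x - mn) / ((mx - mn) + eps)
    raise ValueError(f"unknown mode: {mode}")
```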
Reference implementations to compare against
If you’re not deep in CV: the snippets below are reference implementations we use today (NumPy, OpenCV, LUT). Same input (e.g. uint8 image (H, W, 3) or float32), same output (float32). Timing these gives a baseline; we’d adopt NumKong if it’s competitive or faster.
1. Direct NumPy — standard (external mean/std)
```python
import numpy as np

# Input: float32 (H, W, C); mean, std shape (C,) in [0,1]; max_pixel_value = 255
def normalize_numpy_standard(img: np.ndarray, mean: np.ndarray, std: np.ndarray, max_val: float = 255.0) -> np.ndarray:
    x = img.astype(np.float32) / max_val
    return (x - mean) / (std + 1e-7)
```
2. Direct NumPy — min-max (global, to [0,1])
```python
def normalize_numpy_minmax_global(img: np.ndarray) -> np.ndarray:
    img_f = img.astype(np.float32)
    mn, mx = img_f.min(), img_f.max()
    return (img_f - mn) / ((mx - mn) + 1e-7)
```
3. OpenCV — min-max (global)
OpenCV has a dedicated function for this; a natural comparison point.
```python
import cv2

def normalize_cv2_minmax(img: np.ndarray) -> np.ndarray:
    # img uint8 or float32; output float32 [0, 1]
    return cv2.normalize(img, None, 0, 1, cv2.NORM_MINMAX, dtype=cv2.CV_32F)
```
4. OpenCV — per-channel mean/std (compute then apply)
No single cv2 call for “subtract mean, divide by std”; we use cv2.meanStdDev then NumPy.
```python
def normalize_cv2_meanstd_per_channel(img: np.ndarray) -> np.ndarray:
    # img float32 (H, W, C)
    mean, std = cv2.meanStdDev(img)  # (C, 1) each
    mean = mean.flatten()
    std = std.flatten()
    return (img - mean) / (std + 1e-7)
```
5. LUT path (uint8 → float32) — standard
How Albucore does it for uint8: build a float32 LUT, then one LUT apply per channel (or fused). Another useful baseline.
```python
import cv2
import numpy as np

def build_lut_standard(mean: float, inv_std: float) -> np.ndarray:
    # For one channel: out[i] = (i/255 - mean) * inv_std, i.e. (i/255 - mean) / std
    lut = np.arange(256, dtype=np.float32) / 255.0
    lut = (lut - mean) * inv_std
    return lut

def normalize_uint8_via_lut(img: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # img uint8 (H, W, C); mean, std (C,)
    out = np.empty(img.shape, dtype=np.float32)
    for c in range(img.shape[-1]):
        lut = build_lut_standard(mean[c], 1.0 / (std[c] + 1e-7))
        out[..., c] = cv2.LUT(img[..., c], lut)
    return out
```
Operations we need
1. Standard normalization (mean + std)
- Per-channel: Compute mean and std per channel, then `out = (x - mean) / std` (mean/std shape `(C,)`).
- Global: Single scalar mean and std for all channels: `out = (x - mean) / std`.
External values: Caller may pass precomputed mean and std (e.g. ImageNet, YOLO, Inception — see External presets below).
cv2 analogue: None as a single call; we do (x - mean) / std manually. cv2.meanStdDev() can compute per-channel mean/std.
NumPy equivalents:
```python
import numpy as np

# ---- Input: x shape (H, W, C), float32 ----

# Per-channel: compute mean/std from data (over spatial dims), then normalize
def standard_per_channel(x: np.ndarray, axis=(0, 1)) -> np.ndarray:
    mean = np.mean(x, axis=axis, keepdims=True)  # (1, 1, C)
    std = np.std(x, axis=axis, keepdims=True)    # (1, 1, C)
    return (x - mean) / (std + 1e-7)

# Per-channel: external mean/std (e.g. ImageNet in [0,1]); mean, std shape (C,) or (1,1,C)
def standard_external_per_channel(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # mean, std e.g. (3,) for RGB; broadcast to (H, W, C)
    return (x - mean) / (std + 1e-7)

# Global: single scalar mean and std
def standard_global(x: np.ndarray, mean: float, std: float) -> np.ndarray:
    return (x - mean) / (std + 1e-7)

# Example: ImageNet-style (x in [0,255], mean/std in [0,1])
max_val = 255.0
mean_01 = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std_01 = np.array([0.229, 0.224, 0.225], dtype=np.float32)
# out = (x / max_val - mean_01) / std_01
```
For video (N, H, W, C): same ops; per-channel mean/std would use axis=(0, 1, 2) to compute over N, H, W. For volume (D, H, W, C): axis=(0, 1, 2); for a batch of volumes (N, D, H, W, C): axis=(0, 1, 2, 3).
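Since the channel axis is always last, the per-channel variant generalizes to video and volume by reducing over every leading axis. A sketch (the helper name is ours, not an existing API):

```python
import numpy as np

def standard_per_channel_nd(x: np.ndarray) -> np.ndarray:
    # Per-channel normalize for any channels-last layout:
    # reduce over every axis except the last one.
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True)
    return (x - mean) / (std + 1e-7)

video = np.random.rand(8, 32, 32, 3).astype(np.float32)    # (N, H, W, C)
volume = np.random.rand(16, 32, 32, 1).astype(np.float32)  # (D, H, W, C)
out_v = standard_per_channel_nd(video)   # per-channel stats over N, H, W
out_d = standard_per_channel_nd(volume)  # per-channel stats over D, H, W
```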
2. Min-max normalization
- Per-channel: Compute min and max per channel, then scale to a target range (e.g. [0, 1]).
- Global: Single min/max over all channels; same formula.
External values: Caller may pass precomputed min/max or only the output range [out_min, out_max] (we compute data min/max).
cv2 analogue: cv2.normalize(src, dst, alpha, beta, cv2.NORM_MINMAX, dtype=cv2.CV_32F) — rescales so min→alpha, max→beta.
NumPy equivalents:
```python
import numpy as np

# ---- Input: x shape (H, W, C), float32 ----

# Per-channel: compute min/max from data, scale to [0, 1]
def minmax_per_channel(x: np.ndarray, axis=(0, 1), out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x_min = np.min(x, axis=axis, keepdims=True)  # (1, 1, C)
    x_max = np.max(x, axis=axis, keepdims=True)  # (1, 1, C)
    scale = (x_max - x_min) + eps
    normalized = (x - x_min) / scale
    return normalized * (out_hi - out_lo) + out_lo

# Global: single min/max over entire array
def minmax_global(x: np.ndarray, out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x_min, x_max = np.min(x), np.max(x)
    scale = (x_max - x_min) + eps
    normalized = (x - x_min) / scale
    return normalized * (out_hi - out_lo) + out_lo

# External min/max: user provides data range (e.g. 0 and 255 for uint8)
def minmax_external(x: np.ndarray, in_lo: float, in_hi: float, out_lo=0.0, out_hi=1.0) -> np.ndarray:
    scale = (in_hi - in_lo) + 1e-7
    return (x - in_lo) / scale * (out_hi - out_lo) + out_lo

# e.g. uint8 -> [0,1]: minmax_external(x, 0, 255, 0, 1)
```
For video (N, H, W, C): per-channel min/max over axis=(0, 1, 2). For volume (D, H, W, C): axis=(0, 1, 2); for a batch of volumes (N, D, H, W, C): axis=(0, 1, 2, 3).
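The min-max variant likewise generalizes across these layouts by reducing over every axis except the channel axis. A shape-agnostic sketch (channels-last assumed; the helper name is hypothetical):

```python
import numpy as np

def minmax_per_channel_nd(x: np.ndarray, out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    # Channels-last, any rank: reduce min/max over every axis except the last.
    axes = tuple(range(x.ndim - 1))
    mn = x.min(axis=axes, keepdims=True)
    mx = x.max(axis=axes, keepdims=True)
    normalized = (x - mn) / ((mx - mn) + eps)
    return normalized * (out_hi - out_lo) + out_lo

volume = np.random.rand(4, 8, 8, 3).astype(np.float32)  # (D, H, W, C)
out = minmax_per_channel_nd(volume)  # per-channel min/max over D, H, W
```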
Input shapes
All operations should accept:
- Image: `(H, W, C)`
- Video / batch of images: `(N, H, W, C)` or `(T, H, W, C)`
- Volume: `(D, H, W, C)`
- Batch of volumes: `(N, D, H, W, C)`
So the “channel” dimension is always the last; spatial and batch/sequence dimensions can vary.
External presets (examples)
We need to support external mean/std or scale factors so users can plug in standard schemes without recomputing:
- YOLO: scale pixel range to [0, 1]: effectively `mean=(0, 0, 0)`, `std=(1, 1, 1)` after dividing by 255 (or `mean=0`, `std=255` in pixel space, then output in [0,1]).
- ImageNet (standard): `mean=(0.485, 0.456, 0.406)`, `std=(0.229, 0.224, 0.225)` (in [0,1]); in pixel space with `max_pixel_value=255`, mean and std are arrays of shape `(3,)`.
- Inception: often `mean=(0.5, 0.5, 0.5)`, scale to [-1, 1] or [0, 1] with a fixed scale.
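These presets could be captured as a small table of constants. The values below are the widely used conventions; the dict layout and function name are just an illustration of how callers might plug them in:

```python
import numpy as np

# mean/std in [0,1] space, applied after dividing by max_pixel_value (255 for uint8)
PRESETS = {
    "yolo":      {"mean": (0.0, 0.0, 0.0),       "std": (1.0, 1.0, 1.0)},
    "imagenet":  {"mean": (0.485, 0.456, 0.406), "std": (0.229, 0.224, 0.225)},
    "inception": {"mean": (0.5, 0.5, 0.5),       "std": (0.5, 0.5, 0.5)},  # maps [0,1] -> [-1,1]
}

def apply_preset(img: np.ndarray, name: str, max_pixel_value: float = 255.0) -> np.ndarray:
    # Divide to [0,1], then subtract mean and divide by std from the preset.
    p = PRESETS[name]
    mean = np.asarray(p["mean"], dtype=np.float32)
    std = np.asarray(p["std"], dtype=np.float32)
    return (img.astype(np.float32) / max_pixel_value - mean) / std
```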
So the API should accept:
- mean: scalar or array of shape `(C,)` (or broadcast-compatible).
- std: scalar or array of shape `(C,)` (or broadcast-compatible).
- min / max: optional; for the min-max path, scalar or per-channel.
- max_pixel_value: optional (e.g. 255 for uint8); used to convert from [0,255] to [0,1] before applying mean/std in normalized space, or to interpret mean/std given in [0,1] space.
Same idea for divide (and subtract): allow caller to pass the exact values or arrays they want to subtract and divide by (e.g. from a config or from another framework’s normalization constants).
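Putting the parameters above together, the requested entry point might look roughly like this. The name `normalize` and all defaults are hypothetical placeholders for whatever NumKong exposes; the body is a NumPy sketch of the intended semantics, not a performance target:

```python
import numpy as np

def normalize(x, mean=None, std=None, max_pixel_value=None, eps=1e-7):
    # Hypothetical API sketch: channels-last input of any rank
    # ((H,W,C), (N,H,W,C), (D,H,W,C), (N,D,H,W,C)); output float32.
    out = x.astype(np.float32)
    if max_pixel_value is not None:
        out = out / max_pixel_value          # e.g. 255.0: rescale to [0,1] first
    axes = tuple(range(out.ndim - 1))        # reduce over everything but channels
    # mean/std: scalar or (C,) from the caller, or computed per-channel from data
    mean = out.mean(axis=axes, keepdims=True) if mean is None else np.asarray(mean, dtype=np.float32)
    std = out.std(axis=axes, keepdims=True) if std is None else np.asarray(std, dtype=np.float32)
    return (out - mean) / (std + eps)
```

For example, `normalize(img, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)` would be the ImageNet path, and calling it with no mean/std would be the per-channel "compute from data" path.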
Requested from NumKong
We request these normalize ops from NumKong. AlbumentationsX would call NumKong for normalize (image, video, volume shapes).
Summary table
| Operation | Per-channel | Global | External mean/std | cv2 analogue |
|---|---|---|---|---|
| Standard (mean/std) | Yes | Yes | Yes | No single call |
| Min-max | Yes | Yes | Yes (or from data) | cv2.normalize NORM_MINMAX |
Input shapes: (H,W,C), (N,H,W,C), (D,H,W,C), (N,D,H,W,C). Output: float32. Presets: YOLO, ImageNet, Inception, or user-defined arrays.
Can you contribute to the implementation?
- I can contribute
Is your feature request specific to a certain interface?
It applies to everything
Contact Details
No response
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct