Feature: Specialized Normalize — image, video, volume #315

@ternaus

Description

Describe what you are looking for

Feature request: Specialized Normalize — image, video, volume

Why this matters

This may look like a very narrow use case, but normalize is the single most widely used transform in all of computer vision augmentations. Almost every training pipeline applies it (ImageNet mean/std, YOLO /255, min-max, etc.). So performance here has outsized impact.


Summary

We are requesting a small set of specialized normalize operations from NumKong. Normalization (subtract mean / divide by std, or min-max scaling) is the most common preprocessing step in image and video pipelines. The requested operations should:

  • Work for image (H, W, C), video (N, H, W, C) or (T, H, W, C), and volume (D, H, W, C) or (N, D, H, W, C).
  • Support external mean/std or min/max (YOLO, ImageNet, Inception, etc.) or compute them from the data (per-image or per-channel).
  • Performance: We’d love the implementation to be competitive with or faster than (1) direct NumPy, (2) OpenCV’s specialized functions where they exist, and (3) the “build LUT + apply LUT” approach. The section below gives reference code to compare against.

(uint8 → float32 LUT is a separate request; see ashvardanian/StringZilla#302.)


How we do it now (Albucore)

AlbumentationsX calls Albucore for normalize. Current behavior (https://github.com/albumentations-team/albucore/blob/main/albucore/functions.py#L336-L348):

  • Standard (external mean/std): We pass precomputed mean_np and denominator (i.e. mean and 1/(std * max_pixel_value) in pixel space). Formula: out = (img - mean_np) * denominator → float32. For uint8 input, Albucore builds a float32 LUT of length 256 and applies it (e.g. via cv2.LUT) to get float32 output in one pass instead of converting to float then doing the math.
  • Per-image / min-max: normalize_per_image(img, mode) with modes like "min_max", "image", "image_per_channel", "min_max_per_channel" — compute stats from the image then apply.

So today we have: (a) a direct float32 path (NumPy-style or fused), and (b) a uint8 path via LUT. Beating or matching these would make NumKong a natural choice for us.


Reference implementations to compare against

If you’re not deep in CV: the snippets below are reference implementations we use today (NumPy, OpenCV, LUT). Same input (e.g. uint8 image (H, W, 3) or float32), same output (float32). Timing these gives a baseline; we’d adopt NumKong if it’s competitive or faster.

1. Direct NumPy — standard (external mean/std)

import numpy as np

# Input: uint8 or float32 (H, W, C); mean, std shape (C,) in [0,1]; max_pixel_value = 255
def normalize_numpy_standard(img: np.ndarray, mean: np.ndarray, std: np.ndarray, max_val: float = 255.0) -> np.ndarray:
    x = img.astype(np.float32) / max_val
    return (x - mean) / (std + 1e-7)

2. Direct NumPy — min-max (global, to [0,1])

def normalize_numpy_minmax_global(img: np.ndarray) -> np.ndarray:
    img_f = img.astype(np.float32)
    mn, mx = img_f.min(), img_f.max()
    return (img_f - mn) / ((mx - mn) + 1e-7)

3. OpenCV — min-max (global)

OpenCV has a dedicated function for this; a natural comparison point.

import cv2

def normalize_cv2_minmax(img: np.ndarray) -> np.ndarray:
    # img uint8 or float32; output float32 [0, 1]
    return cv2.normalize(img, None, 0, 1, cv2.NORM_MINMAX, dtype=cv2.CV_32F)

4. OpenCV — per-channel mean/std (compute then apply)

No single cv2 call for “subtract mean, divide by std”; we use cv2.meanStdDev then NumPy.

def normalize_cv2_meanstd_per_channel(img: np.ndarray) -> np.ndarray:
    # img float32 (H, W, C)
    mean, std = cv2.meanStdDev(img)   # (C, 1) each
    mean = mean.flatten()
    std = std.flatten()
    return (img - mean) / (std + 1e-7)

5. LUT path (uint8 → float32) — standard

How Albucore does it for uint8: build a float32 LUT, then one LUT apply per channel (or fused). Another useful baseline.

import cv2

def build_lut_standard(mean: float, inv_std: float) -> np.ndarray:
    # For one channel: out[i] = (i/255 - mean) * inv_std  =>  (i/255 - mean) / std
    lut = np.arange(256, dtype=np.float32) / 255.0
    lut = (lut - mean) * inv_std
    return lut

def normalize_uint8_via_lut(img: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # img uint8 (H, W, C); mean, std shape (C,)
    out = np.empty(img.shape, dtype=np.float32)
    for c in range(img.shape[-1]):
        lut = build_lut_standard(mean[c], 1.0 / (std[c] + 1e-7))
        # Channel slices are non-contiguous views; copy to a contiguous array for cv2.LUT.
        out[..., c] = cv2.LUT(np.ascontiguousarray(img[..., c]), lut)
    return out
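To get the baseline numbers mentioned above, a minimal `timeit` harness along these lines can time any of the snippets (the function and variable names here are illustrative, not part of any existing API):

```python
import timeit

import numpy as np

# Minimal timing harness for the reference implementations above.
# fn is any normalize function taking the image as its first argument.
def bench(fn, img, repeat=5, number=20, **kwargs):
    timer = timeit.Timer(lambda: fn(img, **kwargs))
    best = min(timer.repeat(repeat=repeat, number=number)) / number
    return best * 1e3  # best-of-repeat milliseconds per call

img_u8 = np.random.randint(0, 256, (1024, 1024, 3), dtype=np.uint8)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
# e.g. bench(normalize_numpy_standard, img_u8, mean=mean, std=std)
```

Reporting best-of-repeat (rather than the mean) reduces noise from OS scheduling when comparing implementations.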

Operations we need

1. Standard normalization (mean + std)

  • Per-channel: Compute mean and std per channel, then out = (x - mean) / std (mean/std shape (C,)).
  • Global: Single scalar mean and std for all channels: out = (x - mean) / std.

External values: Caller may pass precomputed mean and std (e.g. ImageNet, YOLO, Inception — see External presets below).

cv2 analogue: None as a single call; we do (x - mean) / std manually. cv2.meanStdDev() can compute per-channel mean/std.

NumPy equivalents:

import numpy as np

# ---- Input: x shape (H, W, C), float32 ----

# Per-channel: compute mean/std from data (over spatial dims), then normalize
def standard_per_channel(x: np.ndarray, axis=(0, 1)) -> np.ndarray:
    mean = np.mean(x, axis=axis, keepdims=True)   # (1, 1, C)
    std = np.std(x, axis=axis, keepdims=True)     # (1, 1, C)
    return (x - mean) / (std + 1e-7)

# Per-channel: external mean/std (e.g. ImageNet in [0,1]); mean, std shape (C,) or (1,1,C)
def standard_external_per_channel(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # mean, std e.g. (3,) for RGB; broadcast to (H, W, C)
    return (x - mean) / (std + 1e-7)

# Global: single scalar mean and std
def standard_global(x: np.ndarray, mean: float, std: float) -> np.ndarray:
    return (x - mean) / (std + 1e-7)

# Example: ImageNet-style (x in [0,255], mean/std in [0,1])
max_val = 255.0
mean_01 = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std_01 = np.array([0.229, 0.224, 0.225], dtype=np.float32)
# out = (x / max_val - mean_01) / std_01

For video (N, H, W, C): same ops; per-channel mean/std would use axis=(0, 1, 2) to compute over N, H, W. For volume (D, H, W, C): axis=(0, 1, 2, 3) for per-channel.
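As a concrete check, the same per-channel formula handles video and volume inputs just by widening the reduction axes (the shapes below are illustrative):

```python
import numpy as np

# Same per-channel formula as above, applied to video and volume
# shapes by passing wider reduction axes.
def standard_per_channel(x, axis):
    mean = np.mean(x, axis=axis, keepdims=True)
    std = np.std(x, axis=axis, keepdims=True)
    return (x - mean) / (std + 1e-7)

video = np.random.rand(8, 64, 64, 3).astype(np.float32)       # (N, H, W, C)
volume = np.random.rand(4, 16, 32, 32, 3).astype(np.float32)  # (N, D, H, W, C)

out_v = standard_per_channel(video, axis=(0, 1, 2))      # stats over N, H, W
out_d = standard_per_channel(volume, axis=(0, 1, 2, 3))  # stats over N, D, H, W
```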

2. Min-max normalization

  • Per-channel: Compute min and max per channel, then scale to a target range (e.g. [0, 1]).
  • Global: Single min/max over all channels; same formula.

External values: Caller may pass precomputed min/max or only the output range [out_min, out_max] (we compute data min/max).

cv2 analogue: cv2.normalize(src, dst, alpha, beta, cv2.NORM_MINMAX, dtype=cv2.CV_32F) — rescales so min→alpha, max→beta.

NumPy equivalents:

import numpy as np

# ---- Input: x shape (H, W, C), float32 ----

# Per-channel: compute min/max from data, scale to [0, 1]
def minmax_per_channel(x: np.ndarray, axis=(0, 1), out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x_min = np.min(x, axis=axis, keepdims=True)   # (1, 1, C)
    x_max = np.max(x, axis=axis, keepdims=True)   # (1, 1, C)
    scale = (x_max - x_min) + eps
    normalized = (x - x_min) / scale
    return normalized * (out_hi - out_lo) + out_lo

# Global: single min/max over entire array
def minmax_global(x: np.ndarray, out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x_min, x_max = np.min(x), np.max(x)
    scale = (x_max - x_min) + eps
    normalized = (x - x_min) / scale
    return normalized * (out_hi - out_lo) + out_lo

# External min/max: user provides data range (e.g. 0 and 255 for uint8)
def minmax_external(x: np.ndarray, in_lo: float, in_hi: float, out_lo=0.0, out_hi=1.0) -> np.ndarray:
    scale = (in_hi - in_lo) + 1e-7
    return (x - in_lo) / scale * (out_hi - out_lo) + out_lo
# e.g. uint8 -> [0,1]: minmax_external(x, 0, 255, 0, 1)

For video (N, H, W, C): per-channel min/max over axis=(0, 1, 2). For volume (D, H, W, C): axis=(0, 1, 2, 3).


Input shapes

All operations should accept:

  • Image: (H, W, C)
  • Video / batch of images: (N, H, W, C) or (T, H, W, C)
  • Volume: (D, H, W, C)
  • Batch of volumes: (N, D, H, W, C)

So the “channel” dimension is always the last; spatial and batch/sequence dimensions can vary.
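Because channels are always last, a single shape-agnostic implementation can cover all four layouts; a sketch (the helper name is ours, not an existing function):

```python
import numpy as np

def standard_per_channel_any_shape(x: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    # Channels are the last axis by convention, so reduce over all other axes.
    reduce_axes = tuple(range(x.ndim - 1))
    mean = np.mean(x, axis=reduce_axes, keepdims=True)
    std = np.std(x, axis=reduce_axes, keepdims=True)
    return (x - mean) / (std + eps)
```

This works unchanged for (H, W, C), (N, H, W, C), (D, H, W, C), and (N, D, H, W, C) inputs.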


External presets (examples)

We need to support external mean/std or scale factors so users can plug in standard schemes without recomputing:

  • YOLO: scale pixel values to [0, 1], i.e. divide by 255; equivalently mean=(0, 0, 0), std=(1, 1, 1) in [0,1] space, or mean=0, std=255 in pixel space.
  • ImageNet (standard): mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225) in [0,1]; in pixel space with max_pixel_value=255, the same shape-(3,) arrays scaled by 255.
  • Inception: mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5) in [0,1], mapping the output to [-1, 1].

So the API should accept:

  • mean: scalar or array of shape (C,) (or broadcast-compatible).
  • std: scalar or array of shape (C,) (or broadcast-compatible).
  • min / max: optional; for min-max path, scalar or per-channel.
  • max_pixel_value: optional (e.g. 255 for uint8); used to convert from [0,255] to [0,1] before applying mean/std in normalized space, or to interpret mean/std given in [0,1] space.

Same idea for divide (and subtract): allow caller to pass the exact values or arrays they want to subtract and divide by (e.g. from a config or from another framework’s normalization constants).
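To make the preset idea concrete, here is one possible call shape; the names (`PRESETS`, `normalize`) and the signature are purely illustrative, not an existing NumKong API:

```python
import numpy as np

# Hypothetical preset table and call shape -- illustrative only.
PRESETS = {
    "imagenet": (np.array([0.485, 0.456, 0.406], dtype=np.float32),
                 np.array([0.229, 0.224, 0.225], dtype=np.float32)),
    "yolo": (np.zeros(3, dtype=np.float32), np.ones(3, dtype=np.float32)),
    "inception": (np.full(3, 0.5, dtype=np.float32),
                  np.full(3, 0.5, dtype=np.float32)),
}

def normalize(x, mean=None, std=None, preset=None, max_pixel_value=255.0, eps=1e-7):
    # Presets supply mean/std in [0,1] space; callers may instead pass
    # their own scalars or shape-(C,) arrays.
    if preset is not None:
        mean, std = PRESETS[preset]
    x01 = x.astype(np.float32) / max_pixel_value
    return (x01 - mean) / (std + eps)
```

With this shape, `normalize(img, preset="yolo")` reduces to `img / 255`, and user-defined constants slot in via the `mean`/`std` arguments.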


Requested from NumKong

We request these normalize ops from NumKong. AlbumentationsX would call NumKong for normalize (image, video, volume shapes).


Summary table

| Operation           | Per-channel | Global | External mean/std  | cv2 analogue              |
|---------------------|-------------|--------|--------------------|---------------------------|
| Standard (mean/std) | Yes         | Yes    | Yes                | No single call            |
| Min-max             | Yes         | Yes    | Yes (or from data) | cv2.normalize NORM_MINMAX |

Input shapes: (H,W,C), (N,H,W,C), (D,H,W,C), (N,D,H,W,C). Output: float32. Presets: YOLO, ImageNet, Inception, or user-defined arrays.

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

No response

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
