Description
Describe what you are looking for
Feature request: Specialized Normalize — image, video, volume
Why this matters
This may look like a very narrow use case, but normalize is the single most widely used transform in all of computer vision augmentations. Almost every training pipeline applies it (ImageNet mean/std, YOLO /255, min-max, etc.). So performance here has outsized impact.
Summary
We are requesting a small set of specialized normalize operations from NumKong. Normalization (subtract mean / divide by std, or min-max scale) is the most common preprocessing step in image and video pipelines.
- Work for image `(H, W, C)`, video `(N, H, W, C)` or `(T, H, W, C)`, and volume `(D, H, W, C)` or `(N, D, H, W, C)`.
- Support external mean/std or min/max (YOLO, ImageNet, Inception, etc.) or compute them from the data (per-image or per-channel).
- Performance: We’d love the implementation to be competitive with or faster than (1) direct NumPy, (2) OpenCV’s specialized functions where they exist, and (3) the “build LUT + apply LUT” approach. The section below gives reference code to compare against.
(uint8 → float32 LUT is a separate request; see ashvardanian/StringZilla#302.)
How we do it now (Albucore)
AlbumentationsX calls Albucore for normalize. Current behavior: https://github.com/albumentations-team/albucore/blob/main/albucore/functions.py#L336-L348
- Standard (external mean/std): We pass precomputed `mean_np` and `denominator` (i.e. `mean` and `1 / (std * max_pixel_value)` in pixel space). Formula: `out = (img - mean_np) * denominator` → float32. For uint8 input, Albucore builds a float32 LUT of length 256 and applies it (e.g. via `cv2.LUT`) to get float32 output in one pass, instead of converting to float and then doing the math.
- Per-image / min-max: `normalize_per_image(img, mode)` with modes like `"min_max"`, `"image"`, `"image_per_channel"`, `"min_max_per_channel"`: compute stats from the image, then apply.
So today we have: (a) a direct float32 path (NumPy-like or fused), and (b) a uint8 path via LUT. Beating or matching these would make NumKong a natural choice for us.
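For reference, the four per-image modes can be sketched in plain NumPy. This is an illustrative reimplementation of the behavior described above, not Albucore's actual code, and the `eps` value is arbitrary:

```python
import numpy as np

def normalize_per_image_sketch(img: np.ndarray, mode: str, eps: float = 1e-4) -> np.ndarray:
    # Illustrative sketch of the four modes; stats are computed from the image itself.
    x = img.astype(np.float32)
    if mode == "image":                      # global mean/std
        return (x - x.mean()) / (x.std() + eps)
    if mode == "image_per_channel":          # per-channel mean/std over spatial dims
        mean = x.mean(axis=(0, 1), keepdims=True)
        std = x.std(axis=(0, 1), keepdims=True)
        return (x - mean) / (std + eps)
    if mode == "min_max":                    # global min-max to [0, 1]
        mn, mx = x.min(), x.max()
        return (x - mn) / ((mx - mn) + eps)
    if mode == "min_max_per_channel":        # per-channel min-max to [0, 1]
        mn = x.min(axis=(0, 1), keepdims=True)
        mx = x.max(axis=(0, 1), keepdims=True)
        return (x - mn) / ((mx - mn) + eps)
    raise ValueError(f"unknown mode: {mode}")
```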
Reference implementations to compare against
If you’re not deep in CV: the snippets below are reference implementations we use today (NumPy, OpenCV, LUT). Same input (e.g. uint8 image (H, W, 3) or float32), same output (float32). Timing these gives a baseline; we’d adopt NumKong if it’s competitive or faster.
1. Direct NumPy — standard (external mean/std)
```python
import numpy as np

# Input: float32 (H, W, C); mean, std shape (C,) in [0,1]; max_pixel_value = 255
def normalize_numpy_standard(img: np.ndarray, mean: np.ndarray, std: np.ndarray, max_val: float = 255.0) -> np.ndarray:
    x = img.astype(np.float32) / max_val
    return (x - mean) / (std + 1e-7)
```
2. Direct NumPy — min-max (global, to [0,1])
```python
def normalize_numpy_minmax_global(img: np.ndarray) -> np.ndarray:
    img_f = img.astype(np.float32)
    mn, mx = img_f.min(), img_f.max()
    return (img_f - mn) / ((mx - mn) + 1e-7)
```
3. OpenCV — min-max (global)
OpenCV has a dedicated function for this; a natural comparison point.
```python
import cv2

def normalize_cv2_minmax(img: np.ndarray) -> np.ndarray:
    # img uint8 or float32; output float32 [0, 1]
    return cv2.normalize(img, None, 0, 1, cv2.NORM_MINMAX, dtype=cv2.CV_32F)
```
4. OpenCV — per-channel mean/std (compute then apply)
No single cv2 call for “subtract mean, divide by std”; we use cv2.meanStdDev then NumPy.
```python
def normalize_cv2_meanstd_per_channel(img: np.ndarray) -> np.ndarray:
    # img float32 (H, W, C)
    mean, std = cv2.meanStdDev(img)  # (C, 1) each
    mean = mean.flatten()
    std = std.flatten()
    return (img - mean) / (std + 1e-7)
```
5. LUT path (uint8 → float32) — standard
How Albucore does it for uint8: build a float32 LUT, then one LUT apply per channel (or fused). Another useful baseline.
```python
import cv2
import numpy as np

def build_lut_standard(mean: float, inv_std: float) -> np.ndarray:
    # For one channel: out[i] = (i/255 - mean) * inv_std, i.e. (i/255 - mean) / std
    lut = np.arange(256, dtype=np.float32) / 255.0
    lut = (lut - mean) * inv_std
    return lut

def normalize_uint8_via_lut(img: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # img uint8 (H, W, C); mean, std (C,)
    out = np.empty(img.shape, dtype=np.float32)
    for c in range(img.shape[-1]):
        lut = build_lut_standard(mean[c], 1.0 / (std[c] + 1e-7))
        out[..., c] = cv2.LUT(img[..., c], lut)
    return out
```
Operations we need
1. Standard normalization (mean + std)
- Per-channel: Compute mean and std per channel, then `out = (x - mean) / std` (mean/std shape `(C,)`).
- Global: Single scalar mean and std for all channels: `out = (x - mean) / std`.
External values: Caller may pass precomputed mean and std (e.g. ImageNet, YOLO, Inception — see External presets below).
cv2 analogue: None as a single call; we do (x - mean) / std manually. cv2.meanStdDev() can compute per-channel mean/std.
NumPy equivalents:
```python
import numpy as np

# ---- Input: x shape (H, W, C), float32 ----

# Per-channel: compute mean/std from data (over spatial dims), then normalize
def standard_per_channel(x: np.ndarray, axis=(0, 1)) -> np.ndarray:
    mean = np.mean(x, axis=axis, keepdims=True)  # (1, 1, C)
    std = np.std(x, axis=axis, keepdims=True)    # (1, 1, C)
    return (x - mean) / (std + 1e-7)

# Per-channel: external mean/std (e.g. ImageNet in [0,1]); mean, std shape (C,) or (1,1,C)
def standard_external_per_channel(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # mean, std e.g. (3,) for RGB; broadcast to (H, W, C)
    return (x - mean) / (std + 1e-7)

# Global: single scalar mean and std
def standard_global(x: np.ndarray, mean: float, std: float) -> np.ndarray:
    return (x - mean) / (std + 1e-7)

# Example: ImageNet-style (x in [0,255], mean/std in [0,1])
max_val = 255.0
mean_01 = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std_01 = np.array([0.229, 0.224, 0.225], dtype=np.float32)
# out = (x / max_val - mean_01) / std_01
```
For video (N, H, W, C): same ops; per-channel mean/std would use axis=(0, 1, 2) to compute over N, H, W. For volume (D, H, W, C): axis=(0, 1, 2); for a batch of volumes (N, D, H, W, C): axis=(0, 1, 2, 3).
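Since the channel axis is always last, the per-channel variant generalizes to video and volume by reducing over every leading axis. A sketch (the helper name is ours, not an existing API):

```python
import numpy as np

def standard_per_channel_nd(x: np.ndarray) -> np.ndarray:
    # Per-channel normalize for any channels-last layout:
    # reduce over every axis except the last one.
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True)
    return (x - mean) / (std + 1e-7)

video = np.random.rand(8, 32, 32, 3).astype(np.float32)    # (N, H, W, C)
volume = np.random.rand(16, 32, 32, 1).astype(np.float32)  # (D, H, W, C)
out_v = standard_per_channel_nd(video)   # per-channel stats over N, H, W
out_d = standard_per_channel_nd(volume)  # per-channel stats over D, H, W
```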
2. Min-max normalization
- Per-channel: Compute min and max per channel, then scale to a target range (e.g. [0, 1]).
- Global: Single min/max over all channels; same formula.
External values: Caller may pass precomputed min/max or only the output range [out_min, out_max] (we compute data min/max).
cv2 analogue: cv2.normalize(src, dst, alpha, beta, cv2.NORM_MINMAX, dtype=cv2.CV_32F) — rescales so min→alpha, max→beta.
NumPy equivalents:
```python
import numpy as np

# ---- Input: x shape (H, W, C), float32 ----

# Per-channel: compute min/max from data, scale to [0, 1]
def minmax_per_channel(x: np.ndarray, axis=(0, 1), out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x_min = np.min(x, axis=axis, keepdims=True)  # (1, 1, C)
    x_max = np.max(x, axis=axis, keepdims=True)  # (1, 1, C)
    scale = (x_max - x_min) + eps
    normalized = (x - x_min) / scale
    return normalized * (out_hi - out_lo) + out_lo

# Global: single min/max over entire array
def minmax_global(x: np.ndarray, out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x_min, x_max = np.min(x), np.max(x)
    scale = (x_max - x_min) + eps
    normalized = (x - x_min) / scale
    return normalized * (out_hi - out_lo) + out_lo

# External min/max: user provides data range (e.g. 0 and 255 for uint8)
def minmax_external(x: np.ndarray, in_lo: float, in_hi: float, out_lo=0.0, out_hi=1.0) -> np.ndarray:
    scale = (in_hi - in_lo) + 1e-7
    return (x - in_lo) / scale * (out_hi - out_lo) + out_lo

# e.g. uint8 -> [0,1]: minmax_external(x, 0, 255, 0, 1)
```
For video (N, H, W, C): per-channel min/max over axis=(0, 1, 2). For volume (D, H, W, C): axis=(0, 1, 2); for a batch of volumes (N, D, H, W, C): axis=(0, 1, 2, 3).
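The min-max variant likewise generalizes across these layouts by reducing over every axis except the channel axis. A shape-agnostic sketch (channels-last assumed; the helper name is hypothetical):

```python
import numpy as np

def minmax_per_channel_nd(x: np.ndarray, out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    # Channels-last, any rank: reduce min/max over every axis except the last.
    axes = tuple(range(x.ndim - 1))
    mn = x.min(axis=axes, keepdims=True)
    mx = x.max(axis=axes, keepdims=True)
    normalized = (x - mn) / ((mx - mn) + eps)
    return normalized * (out_hi - out_lo) + out_lo

volume = np.random.rand(4, 8, 8, 3).astype(np.float32)  # (D, H, W, C)
out = minmax_per_channel_nd(volume)  # per-channel min/max over D, H, W
```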
Input shapes
All operations should accept:
- Image: `(H, W, C)`
- Video / batch of images: `(N, H, W, C)` or `(T, H, W, C)`
- Volume: `(D, H, W, C)`
- Batch of volumes: `(N, D, H, W, C)`
So the “channel” dimension is always the last; spatial and batch/sequence dimensions can vary.
External presets (examples)
We need to support external mean/std or scale factors so users can plug in standard schemes without recomputing:
- YOLO: scale pixel range to [0, 1]: effectively `mean=(0, 0, 0)`, `std=(1, 1, 1)` after dividing by 255 (or `mean=0`, `std=255` in pixel space, then output in [0,1]).
- ImageNet (standard): `mean=(0.485, 0.456, 0.406)`, `std=(0.229, 0.224, 0.225)` (in [0,1]); in pixel space with `max_pixel_value=255`, mean and std are arrays of shape `(3,)`.
- Inception: often `mean=(0.5, 0.5, 0.5)`, scale to [-1, 1] or [0, 1] with a fixed scale.
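These presets could be captured as a small table of constants. The values below are the widely used conventions; the dict layout and function name are just an illustration of how callers might plug them in:

```python
import numpy as np

# mean/std in [0,1] space, applied after dividing by max_pixel_value (255 for uint8)
PRESETS = {
    "yolo":      {"mean": (0.0, 0.0, 0.0),       "std": (1.0, 1.0, 1.0)},
    "imagenet":  {"mean": (0.485, 0.456, 0.406), "std": (0.229, 0.224, 0.225)},
    "inception": {"mean": (0.5, 0.5, 0.5),       "std": (0.5, 0.5, 0.5)},  # maps [0,1] -> [-1,1]
}

def apply_preset(img: np.ndarray, name: str, max_pixel_value: float = 255.0) -> np.ndarray:
    # Divide to [0,1], then subtract mean and divide by std from the preset.
    p = PRESETS[name]
    mean = np.asarray(p["mean"], dtype=np.float32)
    std = np.asarray(p["std"], dtype=np.float32)
    return (img.astype(np.float32) / max_pixel_value - mean) / std
```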
So the API should accept:
- mean: scalar or array of shape `(C,)` (or broadcast-compatible).
- std: scalar or array of shape `(C,)` (or broadcast-compatible).
- min / max: optional; for the min-max path, scalar or per-channel.
- max_pixel_value: optional (e.g. 255 for uint8); used to convert from [0,255] to [0,1] before applying mean/std in normalized space, or to interpret mean/std given in [0,1] space.
Same idea for divide (and subtract): allow caller to pass the exact values or arrays they want to subtract and divide by (e.g. from a config or from another framework’s normalization constants).
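Putting the parameters above together, the requested entry point might look roughly like this. The name `normalize` and all defaults are hypothetical placeholders for whatever NumKong exposes; the body is a NumPy sketch of the intended semantics, not a performance target:

```python
import numpy as np

def normalize(x, mean=None, std=None, max_pixel_value=None, eps=1e-7):
    # Hypothetical API sketch: channels-last input of any rank
    # ((H,W,C), (N,H,W,C), (D,H,W,C), (N,D,H,W,C)); output float32.
    out = x.astype(np.float32)
    if max_pixel_value is not None:
        out = out / max_pixel_value          # e.g. 255.0: rescale to [0,1] first
    axes = tuple(range(out.ndim - 1))        # reduce over everything but channels
    # mean/std: scalar or (C,) from the caller, or computed per-channel from data
    mean = out.mean(axis=axes, keepdims=True) if mean is None else np.asarray(mean, dtype=np.float32)
    std = out.std(axis=axes, keepdims=True) if std is None else np.asarray(std, dtype=np.float32)
    return (out - mean) / (std + eps)
```

For example, `normalize(img, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)` would be the ImageNet path, and calling it with no mean/std would be the per-channel "compute from data" path.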
Requested from NumKong
We request these normalize ops from NumKong. AlbumentationsX would call NumKong for normalize (image, video, volume shapes).
Summary table
| Operation | Per-channel | Global | External mean/std | cv2 analogue |
|---|---|---|---|---|
| Standard (mean/std) | Yes | Yes | Yes | No single call |
| Min-max | Yes | Yes | Yes (or from data) | cv2.normalize NORM_MINMAX |
Input shapes: (H,W,C), (N,H,W,C), (D,H,W,C), (N,D,H,W,C). Output: float32. Presets: YOLO, ImageNet, Inception, or user-defined arrays.
Can you contribute to the implementation?
- I can contribute
Is your feature request specific to a certain interface?
It applies to everything
Contact Details
No response
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct