
Layers : Resize

yaitaissa edited this page Aug 30, 2024 · 3 revisions

Resize layers change the size of the input tensor by interpolating the input's elements to compute the output's elements.

How the layer works

Inputs

  • T tensor of size $(N,C_T,H_T,W_T)$
  • List indicating the output size. Either:
    • scales (list[float]): list of real factors by which to multiply T's dimensions
    • sizes (list[int]): list of integers giving the output tensor's dimensions
  • (optional) roi (list[float]): 1D tensor given as [start1, ..., startN, end1, ..., endN] (used in one case specified later). Those values are normalized in tensor T's coordinates.

Attributes

  • antialias (int) (default is 0): if set to 1, an antialiasing filter is used when downsampling in linear or cubic mode (not implemented yet)
  • axes (list[int]): subset of axes on which to apply the layer. By default, all axes are used.
  • coordinate_transformation_mode (string): coordinate transformation function to use (see the Coordinate transformation modes section)
  • cubic_coeff_a (float) (default is -0.75): constant used in cubic mode (see the Cubic sub-section)
  • exclude_outside (int) (default is 0): if set to 1, sampling locations outside the tensor get a weight of 0, and the remaining weights are renormalized so that they sum to 1 (not implemented yet)
  • extrapolation_value (float) (default is 0.0): if the tf_crop_and_resize coordinate transformation mode is used and $x_{original}$ falls outside the input tensor (negative, or greater than the largest valid coordinate), this constant is used instead.
  • keep_aspect_ratio_policy (string) (default is stretch): used when the input is sizes; this attribute describes whether the aspect ratio of the input tensor should be kept (not implemented yet)
  • mode (string) (default is nearest): interpolation mode to use (see Interpolation functions section)
  • nearest_mode (string) (default is round_prefer_floor): rounding mode to use in mode nearest (see Nearest sub-section)

Outputs

  • Y tensor of size $(N,C_Y,H_Y,W_Y)$

If scales is an input, $(N,C_Y,H_Y,W_Y) = (N . scales[0],C_T . scales[1],H_T . scales[2],W_T.scales[3])$

Otherwise, sizes must be in the inputs, then $(N,C_Y,H_Y,W_Y) = (sizes[0],sizes[1],sizes[2],sizes[3])$
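
The two cases above can be sketched in a few lines of Python. This is only an illustration: the function name `output_shape` is hypothetical, and rounding a non-integer product down to an integer is an assumption (it matches the example further down this page, where $4 \times 3.1$ gives an output size of 12).

```python
import math


def output_shape(input_shape, scales=None, sizes=None):
    """Return the output shape of a resize from either `scales` or `sizes`."""
    if (scales is None) == (sizes is None):
        raise ValueError("provide exactly one of scales or sizes")
    if sizes is not None:
        # sizes directly gives the output dimensions
        return tuple(sizes)
    # Assumption: a non-integer product is rounded down (floor)
    return tuple(math.floor(d * s) for d, s in zip(input_shape, scales))


print(output_shape((1, 1, 4, 4), scales=(1, 1, 3.1, 3.1)))  # (1, 1, 12, 12)
print(output_shape((1, 1, 4, 4), sizes=(1, 1, 8, 8)))       # (1, 1, 8, 8)
```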

Output's values computing

The output is computed element-wise. For each pair of indices $(x_{resized},y_{resized})$ of the output tensor:

  • Compute the corresponding coordinates in the input tensor: $(x_{original},y_{original}) = (t(x_{resized}),t(y_{resized}))$
  • Compute the value with the interpolation function g: $Y(x_{resized},y_{resized}) = {\bf g} (x_{original},y_{original})$

where $t()$ is a coordinate transformation function.

The layer operates on the last two dimensions of the input tensor. For the rest of this documentation, the first two dimensions are ignored and the 4D tensors are treated as 2D tensors.
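
The element-wise procedure can be sketched as a generic loop over the output indices. The sketch below is a minimal illustration on a 2D tensor stored as a list of rows, not the layer's actual implementation. The names `resize_2d`, `asymmetric` and `nearest_floor` are hypothetical; they plug in the asymmetric transformation and the floor rounding that are described later on this page.

```python
import math


def resize_2d(T, out_h, out_w, t, g):
    """Resize the 2D tensor T (list of rows) to out_h x out_w.

    t(i, length_resized, length_original) maps an output index to an input coordinate.
    g(T, x, y) evaluates T at the (possibly fractional) input coordinates.
    """
    in_h, in_w = len(T), len(T[0])
    Y = []
    for x_resized in range(out_h):
        x_original = t(x_resized, out_h, in_h)
        row = []
        for y_resized in range(out_w):
            y_original = t(y_resized, out_w, in_w)
            row.append(g(T, x_original, y_original))
        Y.append(row)
    return Y


def asymmetric(i, length_resized, length_original):
    # x_original = x_resized / scale
    return i / (length_resized / length_original)


def nearest_floor(T, x, y):
    # nearest interpolation with the floor rounding function
    return T[math.floor(x)][math.floor(y)]


T = [[1, 2],
     [3, 4]]
Y = resize_2d(T, 4, 4, asymmetric, nearest_floor)
for row in Y:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```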

Coordinate transformation modes

The $x$ axis (the dimension just after the channels) is used for the descriptions, with the following notation:

  • $x_{resized}$: the coordinate along $x$ in the output tensor
  • $x_{original}$: the coordinate along $x$ in the input tensor
  • $lengthOriginal$: the size of the input tensor along $x$, i.e. $H_T$
  • $lengthResized$: the size of the output tensor along $x$, i.e. $H_Y$
  • $scale = lengthResized / lengthOriginal$
  • $outputWidth$: the target size along $x$ (can be non-integer when computed from scales), i.e. $H_T.scales[2]$
  • $outputWidthInt$: the actual integer size along $x$

The coordinate transformation functions are the following:

  • half_pixel:

$$x_{original} = (x_{resized} + 0.5)/scale - 0.5$$

  • half_pixel_symmetric:

Let:

$adjustment = outputWidthInt/outputWidth$

$center = lengthOriginal/2$

$offset = center.(1-adjustment)$

Then:

$$x_{original} = offset + (x_{resized} + 0.5)/scale - 0.5$$

  • pytorch_half_pixel:

$$x_{original} = lengthResized > 1\ ?\ (x_{resized}+0.5)/scale-0.5\ :\ 0$$

  • align_corners:

$$x_{original} = x_{resized}.(lengthOriginal - 1)/(lengthResized - 1)$$

  • asymmetric:

$$x_{original}=x_{resized}/scale$$

  • tf_crop_and_resize (input roi is used in this case, with $startX$ and $endX$ the roi start and end values for the $x$ axis):

$$x_{original} = lengthResized > 1\ ?\ startX.(lengthOriginal-1) + x_{resized}.(endX-startX).(lengthOriginal-1)/(lengthResized-1)\ :\ 0.5.(startX+endX).(lengthOriginal-1)$$
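
The transformation functions above can be gathered in one Python function. This is a sketch under stated assumptions rather than the layer's code: the name `to_original` and its parameter names are hypothetical, `output_width` defaults to the integer output size when it is not given, and align_corners returns 0 when the output has a single element because the formula would otherwise divide by zero.

```python
def to_original(mode, x_resized, length_original, length_resized,
                output_width=None, start_x=0.0, end_x=1.0):
    """Map an output coordinate to an input coordinate along one axis.

    output_width is the non-integer target size (half_pixel_symmetric only);
    start_x and end_x are the roi bounds for this axis (tf_crop_and_resize only).
    """
    scale = length_resized / length_original
    if mode == "half_pixel":
        return (x_resized + 0.5) / scale - 0.5
    if mode == "half_pixel_symmetric":
        if output_width is None:
            output_width = length_resized  # assumption: nothing was truncated
        adjustment = length_resized / output_width
        center = length_original / 2
        offset = center * (1 - adjustment)
        return offset + (x_resized + 0.5) / scale - 0.5
    if mode == "pytorch_half_pixel":
        return (x_resized + 0.5) / scale - 0.5 if length_resized > 1 else 0.0
    if mode == "align_corners":
        if length_resized == 1:
            return 0.0  # assumption: avoids the division by zero
        return x_resized * (length_original - 1) / (length_resized - 1)
    if mode == "asymmetric":
        return x_resized / scale
    if mode == "tf_crop_and_resize":
        if length_resized > 1:
            return (start_x * (length_original - 1)
                    + x_resized * (end_x - start_x) * (length_original - 1)
                    / (length_resized - 1))
        return 0.5 * (start_x + end_x) * (length_original - 1)
    raise ValueError(f"unknown coordinate transformation mode: {mode}")


print(round(to_original("half_pixel", 6, 4, 12), 2))     # 1.67
print(round(to_original("align_corners", 6, 4, 12), 2))  # 1.64
print(to_original("asymmetric", 6, 4, 12))               # 2.0
```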

Example: Let T be of size $(1,1,4,4)$ and Y of size $(1,1,12,12)$, with scales $= (1,1,3.1,3.1)$ and roi $=(0,0,0.25,0.25,0,0,0.75,0.75)$. Along the $x$ axis, $scale = 12/4 = 3$. Consider the coordinate $x_{resized}=6$:

  • half_pixel:

$$x_{original} = (6 + 0.5)/3 - 0.5 = 1.67$$

  • half_pixel_symmetric:

$adjustment = 12/12.4 = 0.968$

$center = 4/2 = 2$

$offset = 2.(1-0.968) = 0.0645$

$$x_{original} = 0.0645 + (6 + 0.5)/3 - 0.5 = 1.73$$

  • pytorch_half_pixel:

$$x_{original} = 12 > 1\ ?\ (6+0.5)/3-0.5\ :\ 0 = 1.67$$

  • align_corners:

$$x_{original} =6.(4 - 1)/(12 - 1) = 1.64$$

  • asymmetric:

$$x_{original}= 6/3 = 2$$

  • tf_crop_and_resize (input roi used in this case):

$$x_{original} = 12 > 1\ ?\ 0.25.(4 - 1) + 6.(0.75 - 0.25).(4 - 1)/(12 - 1)\ :\ 0.5.(0.25 + 0.75).(4 - 1) = 1.57$$

Interpolation functions

Nearest

This interpolation mode rounds $x_{original}$ and $y_{original}$ to integers, then reads the tensor T at those rounded coordinates.

Function g:

Let $n()$ be a rounding function.

$$ {\bf g} (x_{original},y_{original}) = T(n(x_{original}),n(y_{original}))$$

Rounding functions:

  • floor:

$$n(x_{original}) = floor(x_{original})$$

  • ceil:

$$n(x_{original}) = ceil(x_{original})$$

  • round_prefer_floor:

$$n(x_{original}) = floor(ceil(2.x_{original})/2)$$

  • round_prefer_ceil:

$$n(x_{original}) = ceil(floor(2.x_{original})/2)$$

Example: Let $x_{original}=3.50$.

  • floor:

$$n(x_{original}) = floor(3.50) = 3$$

  • ceil:

$$n(x_{original}) = ceil(3.50) = 4$$

  • round_prefer_floor:

$$n(x_{original}) = floor(ceil(2*3.50)/2) = floor(7/2) = 3$$

  • round_prefer_ceil:

$$n(x_{original}) = ceil(floor(2*3.50)/2) = ceil(7/2) = 4$$
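
The four rounding functions translate directly to Python's `math.floor` and `math.ceil`. In this sketch the function name `nearest_index` is hypothetical; the mode strings are the nearest_mode values listed above.

```python
import math


def nearest_index(x_original, nearest_mode="round_prefer_floor"):
    """Round a fractional input coordinate to an integer index."""
    if nearest_mode == "floor":
        return math.floor(x_original)
    if nearest_mode == "ceil":
        return math.ceil(x_original)
    if nearest_mode == "round_prefer_floor":
        # ties (x.5) go to the lower integer
        return math.floor(math.ceil(2 * x_original) / 2)
    if nearest_mode == "round_prefer_ceil":
        # ties (x.5) go to the upper integer
        return math.ceil(math.floor(2 * x_original) / 2)
    raise ValueError(f"unknown nearest_mode: {nearest_mode}")


for mode in ("floor", "ceil", "round_prefer_floor", "round_prefer_ceil"):
    print(mode, nearest_index(3.5, mode))
# floor 3
# ceil 4
# round_prefer_floor 3
# round_prefer_ceil 4
```

Away from ties the two round_prefer modes agree: 3.2 rounds to 3 and 3.7 rounds to 4 in both.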

Linear

This interpolation mode linearly interpolates the 2 (1D) or 4 (2D) points closest to $(x_{original},y_{original})$ to find the output value.

Function g:

The function described here is the bilinear interpolation function. If the input is 1D, the linear interpolation function is the one presented below with $f_{10} = f_{11} = 0$.

To simplify the notation, we write $(x,y)=(x_{original},y_{original})$ in this section.

Let:

$(x0,x1,y0,y1) = (floor(x),floor(x)+1,floor(y),floor(y)+1)$

$f_{00} = T(x0,y0)$

$f_{10} = T(x0,y1)$

$f_{11} = T(x1,y1)$

$f_{01} = T(x1,y0)$

Then:

$${\bf g} (x,y) = \frac{f_{00}.(x1 - x).(y1 - y) + f_{10}.(x1 - x).(y - y0) + f_{11}.(x - x0).(y - y0) + f_{01}.(x - x0).(y1 - y)}{(x1 - x0).(y1 - y0)}$$
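
A minimal Python sketch of bilinear interpolation follows. Each corner value is weighted by the area of the rectangle opposite to it, so the corner closest to $(x,y)$ gets the largest weight. The function name `bilinear` is hypothetical, and clamping out-of-range indices to the tensor border is an assumption made so that the sketch also runs on the last row and column.

```python
import math


def bilinear(T, x, y):
    """Bilinear interpolation of the 2D tensor T (list of rows) at (x, y)."""
    h, w = len(T), len(T[0])
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = x0 + 1, y0 + 1

    def at(i, j):
        # Assumption: indices outside the tensor are clamped to the border
        return T[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    # The denominator (x1 - x0) * (y1 - y0) is always 1 and is omitted
    return (at(x0, y0) * (x1 - x) * (y1 - y)
            + at(x0, y1) * (x1 - x) * (y - y0)
            + at(x1, y1) * (x - x0) * (y - y0)
            + at(x1, y0) * (x - x0) * (y1 - y))


T = [[0, 10],
     [20, 30]]
print(bilinear(T, 0.5, 0.5))    # 15.0
print(bilinear(T, 0.25, 0.75))  # 12.5
```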

Cubic

This interpolation mode performs a cubic interpolation of the 4 (1D) or 16 (2D) points closest to $(x_{original},y_{original})$ to find the output value.

Function in 1D

In this section, the input tensor T is assumed to be 1D.

To simplify the notation, we write $x0=floor(x_{original})$ and $a$ = cubic_coeff_a in this section.

Let:

$f_{-1} = T(x0-1)$

$f_{0} = T(x0)$

$f_{1} = T(x0+1)$

$f_{2} = T(x0+2)$

$s = x_{original} - x0$

$u_{1} = a.s^3 - 2a.s^2 + a.s$

$u_0 = (a + 2).s^3 - (a + 3).s^2 + 1$

$u_{-1} = -(a + 2).s^3 + (2a + 3).s^2 - a.s$

$u_{-2} = -a.s^3 + a.s^2$

Then:

$$ {\bf g} (x_{original}) = f_{-1}.u_1 + f_0.u_0 + f_1.u_{-1} + f_2.u_{-2}$$
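
The 1D cubic interpolation can be sketched as follows. The names `cubic_weights` and `cubic_1d` are hypothetical. The linear term of the third weight is taken as $a.s$, which is what makes the four weights sum to 1 for every $s$.

```python
def cubic_weights(s, a=-0.75):
    """Weights applied to f_{-1}, f_0, f_1 and f_2 for a fractional offset s."""
    w_m1 = a * s**3 - 2 * a * s**2 + a * s
    w_0 = (a + 2) * s**3 - (a + 3) * s**2 + 1
    w_1 = -(a + 2) * s**3 + (2 * a + 3) * s**2 - a * s
    w_2 = -a * s**3 + a * s**2
    return w_m1, w_0, w_1, w_2


def cubic_1d(s, f_m1, f_0, f_1, f_2, a=-0.75):
    """Cubic interpolation between f_0 and f_1 at fractional offset s."""
    w_m1, w_0, w_1, w_2 = cubic_weights(s, a)
    return f_m1 * w_m1 + f_0 * w_0 + f_1 * w_1 + f_2 * w_2


print(cubic_weights(0.5))           # (-0.09375, 0.59375, 0.59375, -0.09375)
print(cubic_1d(0.5, 0, 1, 2, 3))    # 1.5
```

At $s = 0$ the weights reduce to $(0, 1, 0, 0)$, so the interpolation returns $f_0$ exactly.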

Function in 2D

In this section, the input tensor T is assumed to be 2D.

To simplify the notation, we write $(x0,y0)=(floor(x_{original}),floor(y_{original}))$ and call p the 1D cubic interpolation function, which takes as input the point at which to interpolate followed by the four values $f_{-1}$, $f_0$, $f_1$ and $f_2$.

Let:

$\forall (i,j) \in \{-1,0,1,2\}^2,\ f_{i,j} = T(x0+i,y0+j)$

$\forall j \in \{-1,0,1,2\},\ b_j = p(x_{original},f_{-1,j},f_{0,j},f_{1,j},f_{2,j})$

Then:

$ {\bf g} (x_{original}, y_{original}) = p(y_{original}, b_{-1}, b_0, b_1, b_2)$
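
The 2D case applies the 1D function five times: once along $x$ for each of the four values of $j$, then once along $y$ on the four intermediate results. The sketch below illustrates this on a list of rows. The names `cubic_1d` and `cubic_2d` are hypothetical, the 1D function takes the fractional offset instead of the full coordinate, and out-of-range indices are clamped to the valid range.

```python
import math


def cubic_1d(s, f, a=-0.75):
    """1D cubic interpolation; f holds the four samples (f_-1, f_0, f_1, f_2)."""
    weights = (
        a * s**3 - 2 * a * s**2 + a * s,
        (a + 2) * s**3 - (a + 3) * s**2 + 1,
        -(a + 2) * s**3 + (2 * a + 3) * s**2 - a * s,
        -a * s**3 + a * s**2,
    )
    return sum(w * v for w, v in zip(weights, f))


def cubic_2d(T, x, y, a=-0.75):
    """Bicubic interpolation of the 2D tensor T (list of rows) at (x, y)."""
    h, w = len(T), len(T[0])
    x0, y0 = math.floor(x), math.floor(y)
    sx, sy = x - x0, y - y0
    offsets = (-1, 0, 1, 2)

    def at(i, j):
        # indices outside the tensor are clamped to the valid range
        return T[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    # b_j: interpolate along x for each of the four values of j
    b = [cubic_1d(sx, [at(x0 + i, y0 + j) for i in offsets], a) for j in offsets]
    # then interpolate the four intermediate values along y
    return cubic_1d(sy, b, a)


# Test tensor with T[i][j] = i + 2 * j
T = [[i + 2 * j for j in range(6)] for i in range(6)]
print(cubic_2d(T, 1, 1))      # 3.0
print(cubic_2d(T, 2.5, 2.5))  # 7.5
```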

Boundary cases

For some values of $x_{original}$ (resp. $y_{original}$), $floor(x_{original}) - 1$, $floor(x_{original}) + 1$ or $floor(x_{original}) + 2$ (resp. $floor(y_{original}) - 1$, $floor(y_{original}) + 1$ or $floor(y_{original}) + 2$) can fall outside the valid index range $[0, H_T - 1]$ (resp. $[0, W_T - 1]$). When this happens, the index is clipped to fit in the range.

Example:

Let $x_{original}=0$.

Then $floor(x_{original})-1=-1$, which is outside the range.

The index is thus clipped before use: $f_{-1} = T(0)$
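
The clipping can be sketched for a 1D tensor as follows. The function name `clamped_neighbors` is hypothetical, and the last valid index is taken to be `len(T) - 1`.

```python
import math


def clamped_neighbors(T, x_original):
    """Return (f_-1, f_0, f_1, f_2) from the 1D tensor T, clipping each index."""
    x0 = math.floor(x_original)
    last = len(T) - 1
    return tuple(T[min(max(x0 + i, 0), last)] for i in (-1, 0, 1, 2))


T = [10, 20, 30, 40]
print(clamped_neighbors(T, 0))    # (10, 10, 20, 30)
print(clamped_neighbors(T, 2.5))  # (20, 30, 40, 40)
```

In the first call the index $-1$ is clipped to 0; in the second call the index 4 is clipped to 3.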
