Commit 5b65732

rhilkensdzakhar
authored and committed
[DOCS] shift_restrictions
1 parent 850e17f commit 5b65732

1 file changed

+122
-0
lines changed
  • doc/documents/platform_specific

doc/documents/platform_specific/vpx.rst

Lines changed: 122 additions & 0 deletions
@@ -100,4 +100,126 @@ d how much accumulations it allows to do without overflow.

Operands Limitations and Shifting Ranges
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section describes VPX-specific limitations on kernels.
In this section, :math:`n_\text{tensor}` denotes the number of fractional bits of a tensor
and :math:`s_\text{fx,tensor}` is its scale in the case of an asymmetric data type (see :ref:`data_fmts`).

Weighted Kernels
^^^^^^^^^^^^^^^^
The restrictions below apply to the following kernels:

* conv2d
* depthwise_conv2d
* transpose_conv2d
* group_conv2d
* fully_connected
* rnn_dense
* gru_cell
* lstm_cell

Firstly, to avoid negative shifts below lower-bound and
121+
to avoid internal large shifts above upper-bound, the the following shift restrictions must be adhered to:
122+
123+
.. math::
124+
0 \leq n_\text{in} + n_\text{weight} - n_\text{out} \leq 15 & \quad \text{if FX8}
125+
126+
0 \leq n_\text{in} + n_\text{weight} - n_\text{out} \leq 31 & \quad \text{if FX16 and FX16_FX8_FX8}
127+
128+
\text{no limitation} & \quad \text{if SA8_SA8_SA32}
129+
..
130+
Secondly, the following restrictions relate to left-shifting the bias inside the accumulator:

.. math::
    0 \leq n_\text{in} + n_\text{weight} - n_\text{bias} \leq 8 & \quad \text{if FX8}

    0 \leq n_\text{in} + n_\text{weight} - n_\text{bias} \leq 16 & \quad \text{if FX16}

    0 \leq n_\text{in} + n_\text{weight} - n_\text{bias} \leq 24 & \quad \text{if FX16_FX8_FX8}

    \text{no limitation} & \quad \text{if SA8_SA8_SA32}
..
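
The two sets of inequalities for weighted kernels can be checked before deployment. The following is a minimal sketch only: ``SHIFT_LIMITS`` and ``check_weighted_kernel`` are hypothetical names introduced for illustration and are not part of the library API, and the bounds are simply copied from the formulas in this subsection.

```python
# Hypothetical helper (not a library function): validates the output-shift
# and bias-shift restrictions for weighted kernels.
# Each entry is (max output shift, max bias shift); None means "no limitation".
SHIFT_LIMITS = {
    "FX8": (15, 8),
    "FX16": (31, 16),
    "FX16_FX8_FX8": (31, 24),
    "SA8_SA8_SA32": (None, None),
}


def check_weighted_kernel(dtype, n_in, n_weight, n_out, n_bias):
    """Return a list of violated restrictions (empty if all are satisfied)."""
    max_out_shift, max_bias_shift = SHIFT_LIMITS[dtype]
    out_shift = n_in + n_weight - n_out
    bias_shift = n_in + n_weight - n_bias
    violations = []
    if max_out_shift is not None and not 0 <= out_shift <= max_out_shift:
        violations.append(f"output shift {out_shift} outside [0, {max_out_shift}]")
    if max_bias_shift is not None and not 0 <= bias_shift <= max_bias_shift:
        violations.append(f"bias shift {bias_shift} outside [0, {max_bias_shift}]")
    return violations
```

For example, under these assumptions an FX8 kernel with 7 fractional bits for input, weights, output and bias passes both checks, whereas the same kernel with 0 fractional bits for the bias would need a bias shift of 14 and is rejected.
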


Avepool
^^^^^^^
**FX16**

To avoid negative shifts (below the lower bound) and excessively large internal shifts
(above the upper bound), the input and output fractional bits must satisfy:

.. math::
    -14 - \text{ceil}(\text{log}_2 (\text{Wk} \cdot \text{Hk})) <
    n_\text{in} - n_\text{out}
    < 16 - \text{ceil}(\text{log}_2 (\text{Wk} \cdot \text{Hk}))
..

where :math:`\text{Wk}` and :math:`\text{Hk}` are the width and height of the kernel, respectively.
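
As a rough illustration of the FX16 average-pooling bound, the sketch below evaluates the strict inequality for a given kernel size. The function names are hypothetical and chosen for this example only; the integer ``bit_length`` trick is used here instead of floating-point ``log2`` to compute the ceiling exactly.

```python
def ceil_log2(value):
    """ceil(log2(value)) for a positive integer, computed without floats."""
    return (value - 1).bit_length()


def avepool_fx16_shift_ok(n_in, n_out, kernel_w, kernel_h):
    """Check -14 - ceil(log2(Wk*Hk)) < n_in - n_out < 16 - ceil(log2(Wk*Hk))."""
    log2_area = ceil_log2(kernel_w * kernel_h)
    diff = n_in - n_out
    return -14 - log2_area < diff < 16 - log2_area
```

For a 3x3 kernel, ``ceil_log2(9)`` is 4, so the difference between input and output fractional bits must lie strictly between -18 and 12.
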

**SA8**

To avoid excessively large internal shifts (below the lower bound) and negative shifts
(above the upper bound), the input and output scale factors must satisfy:

.. math::
    127 \cdot 2^{-15} \cdot \text{Wk} \cdot \text{Hk} <
    \frac{s_\text{fx,in} \cdot 2^{-n_\text{in}}}
    {s_\text{fx,out} \cdot 2^{-n_\text{out}}}
    < 64 \cdot \text{Wk} \cdot \text{Hk}
..

where :math:`\text{Wk}` and :math:`\text{Hk}` are the width and height of the kernel, respectively.
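
As a worked illustration of the SA8 bound (the kernel size is chosen here purely as an example and does not come from the kernel documentation), a 2x2 kernel gives :math:`\text{Wk} \cdot \text{Hk} = 4`, so the limits evaluate to:

.. math::
    127 \cdot 2^{-15} \cdot 4 = \frac{508}{32768} \approx 0.0155 <
    \frac{s_\text{fx,in} \cdot 2^{-n_\text{in}}}
    {s_\text{fx,out} \cdot 2^{-n_\text{out}}}
    < 64 \cdot 4 = 256
..

Under this assumption, an input whose effective scale is twice that of the output (ratio 2) lies comfortably inside the permitted range.
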


RNN Dense
^^^^^^^^^
**FX16 and FX16_FX8_FX8**

.. math::
    0 \leq n_\text{in} + n_\text{weights} - n_\text{out}
..

**SA8_SA8_SA32**

.. math::
    \text{acc_scale} = \frac{ s_\text{fx,in} \cdot s_\text{fx,weights}}{s_\text{fx,out}} \cdot 2^{n_\text{in} + n_\text{weights} - n_\text{out}} \\
    0 < \text{acc_scale} \leq 2^{32 - \text{acc_size} - \text{ceil}(\text{log}_2(\text{input_count}))}
..

where :math:`\text{acc_size}` is the accumulator size including the guard bits.
This restriction avoids saturation when the accumulators of multiple inputs are summed after
scaling, since the accumulators are scaled and added in 32-bit vectors.
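
The SA8_SA8_SA32 condition can be evaluated numerically as sketched below. This is an illustration under assumptions: the function name is hypothetical, and the accumulator size passed in is a placeholder that must be taken from the actual target configuration, since its value is not stated in this section.

```python
def rnn_dense_sa8_acc_scale_ok(s_in, s_weights, s_out,
                               n_in, n_weights, n_out,
                               acc_size, input_count):
    """Check 0 < acc_scale <= 2^(32 - acc_size - ceil(log2(input_count)))."""
    # acc_scale as defined in the formula above.
    acc_scale = (s_in * s_weights / s_out) * 2.0 ** (n_in + n_weights - n_out)
    # ceil(log2(input_count)) for a positive integer, computed without floats.
    log2_inputs = (input_count - 1).bit_length()
    limit = 2.0 ** (32 - acc_size - log2_inputs)
    return 0 < acc_scale <= limit
```

With unit scales, 7 fractional bits for input and weights and 10 for the output, ``acc_scale`` is 16; whether that is acceptable then depends entirely on the assumed ``acc_size`` and the number of inputs.
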


Leaky and Parametric ReLU
^^^^^^^^^^^^^^^^^^^^^^^^^
To avoid an extra shift-left instruction in the inner loop,
the ``slope_coeff``/``alpha`` tensor must not have a negative number of fractional bits:

.. math::
    0 \leq n_\text{slope_coeff} & \quad \text{if FX16 and FX8 Leaky ReLU}

    0 \leq n_\text{alpha} & \quad \text{if FX16 and FX8 Parametric ReLU}
..

Element-wise Add and Element-wise Sub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**FX16**

The following restriction relates to shifting both inputs so that their fractional bits align:

.. math::
    \text{abs}(n_\text{in1} - n_\text{in2}) \leq 15
..


.. math::
    \text{max}(n_\text{in1}, n_\text{in2}) - 31 \leq n_\text{out} \leq \text{max}(n_\text{in1}, n_\text{in2}) + 31
..
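
Both FX16 element-wise conditions can be combined into a single predicate. The helper below is a sketch with a hypothetical name, not an existing library call; it simply mirrors the two inequalities above.

```python
def eltwise_fx16_ok(n_in1, n_in2, n_out):
    """Check the FX16 add/sub restrictions on input and output fractional bits."""
    n_max = max(n_in1, n_in2)
    inputs_alignable = abs(n_in1 - n_in2) <= 15
    output_reachable = n_max - 31 <= n_out <= n_max + 31
    return inputs_alignable and output_reachable
```

For instance, inputs with 15 and 0 fractional bits can be aligned, while 15 and -1 cannot because the difference is 16.
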

**SA8**

No VPX-specific limitations (see :ref:`chap_element_wise` for general limitations/requirements).
