@@ -100,4 +100,126 @@ d how much accumulations it allows to do without overflow.
100100
101101Operands Limitations and Shifting Ranges
102102~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
103+ This section describes VPX-specific limitations of the kernels.
104+ In this section, :math:`n_\text{tensor}` denotes the number of fractional bits of a tensor
105+ and :math:`s_\text{fx,tensor}` is its scale in case of an asymmetric data type (see :ref:`data_fmts`).
106+
107+ Weighted Kernels
108+ ^^^^^^^^^^^^^^^^
109+ The restrictions in this subsection apply to the following kernels:
110+
111+ * conv2d
112+ * depthwise_conv2d
113+ * transpose_conv2d
114+ * group_conv2d
115+ * fully_connected
116+ * rnn_dense
117+ * gru_cell
118+ * lstm_cell
119+
120+ First, to avoid negative shifts (below the lower bound) and
121+ large internal shifts (above the upper bound), the following shift restrictions must hold:
122+
123+ .. math::
124+    0 \leq n_\text{in} + n_\text{weight} - n_\text{out} \leq 15 & \quad \text{if FX8}
125+
126+    0 \leq n_\text{in} + n_\text{weight} - n_\text{out} \leq 31 & \quad \text{if FX16 and FX16_FX8_FX8}
127+
128+    \text{no limitation} & \quad \text{if SA8_SA8_SA32}
129+ ..
130+
131+ Second, the following restrictions apply to the left shift of the bias inside the accumulator:
132+
133+ .. math::
134+    0 \leq n_\text{in} + n_\text{weight} - n_\text{bias} \leq 8 & \quad \text{if FX8}
135+
136+    0 \leq n_\text{in} + n_\text{weight} - n_\text{bias} \leq 16 & \quad \text{if FX16}
137+
138+    0 \leq n_\text{in} + n_\text{weight} - n_\text{bias} \leq 24 & \quad \text{if FX16_FX8_FX8}
139+
140+    \text{no limitation} & \quad \text{if SA8_SA8_SA32}
141+ ..
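
The two sets of restrictions above can be checked before a kernel is called. The following is a minimal sketch in Python, not part of the library: the function name, the variant strings used as dictionary keys, and the table layout are hypothetical and chosen only for illustration. The numeric bounds are taken directly from the restrictions above.

```python
# Hypothetical pre-flight check; not a library API.
# Upper bounds copied from the restrictions above; None means "no limitation".
OUT_SHIFT_MAX = {"FX8": 15, "FX16": 31, "FX16_FX8_FX8": 31, "SA8_SA8_SA32": None}
BIAS_SHIFT_MAX = {"FX8": 8, "FX16": 16, "FX16_FX8_FX8": 24, "SA8_SA8_SA32": None}


def _in_range(shift, upper):
    """True if 0 <= shift <= upper, or if the variant has no limitation."""
    return upper is None or 0 <= shift <= upper


def weighted_kernel_shifts_ok(variant, n_in, n_weight, n_out, n_bias):
    """Check both the output shift and the bias shift for one kernel variant."""
    acc_frac = n_in + n_weight  # fractional bits of the accumulator
    out_ok = _in_range(acc_frac - n_out, OUT_SHIFT_MAX[variant])
    bias_ok = _in_range(acc_frac - n_bias, BIAS_SHIFT_MAX[variant])
    return out_ok and bias_ok
```

For example, an FX8 kernel with 7 fractional bits in the input, weights, output and bias gives an output shift and a bias shift of 7, which is within both ranges.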
142+
143+
144+ Avepool
145+ ^^^^^^^
146+ **FX16**
147+
148+ To avoid negative shifts (below the lower bound) and large internal shifts
149+ (above the upper bound), the input and output fractional bits must satisfy:
150+
151+ .. math::
152+    -14 - \text{ceil}(\text{log}_2(\text{Wk} \cdot \text{Hk})) <
153+    n_\text{in} - n_\text{out}
154+    < 16 - \text{ceil}(\text{log}_2(\text{Wk} \cdot \text{Hk}))
155+ ..
156+
157+ where :math:`\text{Wk}` and :math:`\text{Hk}` are the width and height of the kernel, respectively.
158+
159+ **SA8**
160+
161+ To avoid large internal shifts (below the lower bound) and negative shifts
162+ (above the upper bound), the input and output scale factors must satisfy:
163+
164+ .. math:: 127 \cdot 2^{-15} \cdot \text{Wk} \cdot \text{Hk} <
165+    \frac{s_\text{fx,in} \cdot 2^{-n_\text{in}}}
166+    {s_\text{fx,out} \cdot 2^{-n_\text{out}}}
167+    < 64 \cdot \text{Wk} \cdot \text{Hk}
168+ ..
169+
170+ where :math:`\text{Wk}` and :math:`\text{Hk}` are the width and height of the kernel, respectively.
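
As an illustration of the two Avepool ranges, the sketch below evaluates them in Python for a given kernel size. The function names are hypothetical and not part of the library; the constants are those of the inequalities in this subsection, and the scale factors are assumed to be plain floating-point values.

```python
import math


def avepool_fx16_ok(n_in, n_out, wk, hk):
    """FX16: check the strict bounds on n_in - n_out for a wk x hk kernel."""
    k = math.ceil(math.log2(wk * hk))
    return -14 - k < n_in - n_out < 16 - k


def avepool_sa8_ok(s_in, n_in, s_out, n_out, wk, hk):
    """SA8: check the strict bounds on the ratio of effective scales."""
    area = wk * hk
    ratio = (s_in * 2.0 ** -n_in) / (s_out * 2.0 ** -n_out)
    return 127 * 2.0 ** -15 * area < ratio < 64 * area
```

For a 3x3 kernel, `ceil(log2(9))` is 4, so the FX16 difference `n_in - n_out` must lie strictly between -18 and 12.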
171+
172+
173+ RNN Dense
174+ ^^^^^^^^^
175+ **FX16 and FX16_FX8_FX8**
176+
177+ .. math::
178+    0 \leq n_\text{in} + n_\text{weights} - n_\text{out}
179+ ..
180+
181+ **SA8_SA8_SA32**
182+
183+ .. math::
184+    \text{acc_scale} = \frac{s_\text{fx,in} \cdot s_\text{fx,weights}}{s_\text{fx,out}} \cdot 2^{n_\text{in} + n_\text{weights} - n_\text{out}} \\
185+    0 < \text{acc_scale} \leq 2^{32 - \text{acc_size} - \text{ceil}(\text{log}_2(\text{input_count}))}
186+ ..
187+
188+ where :math:`\text{acc_size}` is the accumulator size including the guard bits.
189+ This restriction avoids saturation when the accumulators of multiple inputs are combined
190+ after scaling, since the accumulators are scaled and added in 32-bit vectors.
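
The SA8_SA8_SA32 restriction can be evaluated numerically as in the sketch below. This is an illustration only: the function name is hypothetical, the scale factors are assumed to be floating-point values, and the `acc_size` used in the example is an arbitrary value for demonstration, not a figure taken from this document.

```python
import math


def rnn_dense_sa8_ok(s_in, s_weights, s_out, n_in, n_weights, n_out,
                     acc_size, input_count):
    """Check 0 < acc_scale <= 2^(32 - acc_size - ceil(log2(input_count)))."""
    acc_scale = (s_in * s_weights / s_out) * 2.0 ** (n_in + n_weights - n_out)
    limit = 2.0 ** (32 - acc_size - math.ceil(math.log2(input_count)))
    return 0 < acc_scale <= limit


# Example with an assumed 24-bit accumulator and 4 inputs: limit is 2^(32-24-2) = 64.
print(rnn_dense_sa8_ok(1.0, 1.0, 1.0, 0, 0, 0, acc_size=24, input_count=4))
```

Each additional doubling of `input_count` halves the permitted `acc_scale`, which reflects the headroom needed to add the scaled accumulators.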
191+
192+
193+ Leaky and Parametric ReLU
194+ ^^^^^^^^^^^^^^^^^^^^^^^^^
195+ To avoid an extra shift-left instruction in the inner loop,
196+ the number of fractional bits of the 'slope_coeff'/'alpha' tensor must not be negative:
197+
198+ .. math::
199+    0 \leq n_\text{slope_coeff} & \quad \text{if FX16 and FX8 Leaky ReLU}
200+
201+    0 \leq n_\text{alpha} & \quad \text{if FX16 and FX8 Parametric ReLU}
202+ ..
203+
204+ Element-wise Add and Element-wise Sub
205+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
206+
207+ **FX16**
208+
209+ The restriction below relates to shifting both inputs so that their fractional bits align.
210+
211+ .. math::
212+    \text{abs}(n_\text{in1} - n_\text{in2}) \leq 15
213+ ..
214+
215+
216+ .. math::
217+    \text{max}(n_\text{in1}, n_\text{in2}) - 31 \leq n_\text{out} \leq \text{max}(n_\text{in1}, n_\text{in2}) + 31
218+ ..
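
The two FX16 conditions for element-wise add and sub can be combined into a single check. The sketch below is in Python with a hypothetical function name; it only restates the inequalities above and is not a library API.

```python
def eltwise_fx16_ok(n_in1, n_in2, n_out):
    """Check the input alignment shift and the output shift range for FX16."""
    hi = max(n_in1, n_in2)
    inputs_ok = abs(n_in1 - n_in2) <= 15
    output_ok = hi - 31 <= n_out <= hi + 31
    return inputs_ok and output_ok
```

For example, inputs with 15 and 0 fractional bits are still alignable, whereas 16 and 0 are not.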
219+
220+ **SA8**
221+
222+ No VPX-specific limitations (see :ref:`chap_element_wise` for general limitations/requirements).
223+
224+
103225