How to perform inference on hardware with alpha='auto' #134

@lcit

I'm trying to implement inference on hardware, using the Xilinx ap_fixed type, for a model quantized with alpha='auto'. With alpha=1 this is straightforward: the weights (after applying the quantizer) can be exported directly to the hardware. With alpha='auto' it is more challenging. I have not found an explanation of how to compute the weights and the scale, so I have analyzed the code.
This is an extract of quantized_bits for alpha='auto':

m = K.pow(2.0, K.cast_to_floatx(unsigned_bits))
m_i = K.pow(2.0, K.cast_to_floatx(self.integer))
x = x / m_i
levels = (2**(self.bits-1)-1) * 2 if self.symmetric else (2**self.bits)-1
scale = (K.max(abs(x), axis=axis, keepdims=True) * 2) / levels
v = tf.floor(tf.abs(x) / scale + 0.5)
mask = v < levels / 2
z = tf.sign(x) * tf.where(mask, v, tf.ones_like(v) * levels / 2)
xq = m_i * z / m
xq2 = scale * xq

My understanding is that z contains the integer representation of the weights, which uses the entire range of the type, i.e. the scale is optimal. xq is the floating-point representation of z, and xq2 holds the quantized weights in floating point that are actually used in the convolution during training. These can exceed the range of the type.
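
To make this reading easier to check, here is a minimal NumPy sketch that mirrors only the lines of the extract that produce z and scale. It is not part of the library: the function name `integer_weights_and_scale` and the `axis` parameter are hypothetical, and the final rescaling by m and m_i from the extract is deliberately left out.

```python
import numpy as np

def integer_weights_and_scale(x, bits=8, integer=0, symmetric=True, axis=None):
    """Return (z, scale) following the alpha='auto' lines of the extract.

    `axis` lists the axes the maximum is reduced over (hypothetical
    parameter name); axis=None yields one scale for the whole tensor.
    Assumes max(|x|) is non-zero along every reduced slice.
    """
    x = np.asarray(x, dtype=np.float64)
    m_i = 2.0 ** integer
    x = x / m_i
    levels = (2 ** (bits - 1) - 1) * 2 if symmetric else 2 ** bits - 1
    # Scale chosen so that the largest |x| maps to the largest level.
    scale = np.max(np.abs(x), axis=axis, keepdims=True) * 2 / levels
    # Round to the nearest level, then clip the magnitude to levels / 2.
    v = np.floor(np.abs(x) / scale + 0.5)
    z = np.sign(x) * np.where(v < levels / 2, v, levels / 2)
    return z, scale

w = np.array([[0.5, -0.25], [0.1, 1.0]])
z, scale = integer_weights_and_scale(w, bits=4)
# z     -> [[4, -2], [1, 7]]
# scale -> 1/7 (one value for the whole tensor because axis=None)
```

In this sketch scale * z approximates w / 2**integer to within one quantization step, which is the sense in which z uses the full range. Passing, for example, axis=0 for a 2-D kernel gives one scale per output column instead of a single scale.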

To implement this in hardware, I have to save z as the weights and compute scale, which is a constant that has to be applied after the convolution. For alpha='po2' it would be the same, but the scale could be applied as a bit shift.
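
The reason the scale can be moved after the convolution is linearity: if the scale is constant per output channel, multiplying the accumulated sum gives the same result as multiplying every weight. Below is a small NumPy sketch of that idea, using a plain matrix product as a stand-in for the convolution (an assumption made for brevity). The helper names `matmul_then_scale` and `apply_po2_scale` are hypothetical, and the numbers are made up for illustration.

```python
import numpy as np

def matmul_then_scale(a_int, z, scale):
    """Integer multiply-accumulate first, one scale per output column after."""
    acc = a_int.astype(np.int64) @ z.astype(np.int64)
    return acc * scale

def apply_po2_scale(acc, exponent):
    """Apply scale = 2**exponent by adjusting the binary exponent only.

    np.ldexp(x, e) computes x * 2**e, which is the floating-point
    counterpart of a bit shift on a fixed-point accumulator.
    """
    return np.ldexp(acc.astype(np.float64), exponent)

a_int = np.array([[1, 2, 3]])            # integer activations (made up)
z = np.array([[4, -2], [1, 7], [0, 3]])  # integer weights (made up)
scale = np.array([[0.5, 0.25]])          # one scale per output column

y = matmul_then_scale(a_int, z, scale)
# Same result as scaling the weights before the product.
y_ref = a_int @ (z * scale)
```

When the scale is a power of two, `apply_po2_scale(a_int @ z, -1)` is equivalent to multiplying by 0.5; with ap_fixed this would correspond to moving the binary point rather than using a multiplier.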

If this is correct, it would be nice to have a function that returns z and scale, since quantized_bits does not.
Thanks
