Not improvement in CPU inference speed of static quantized model #17814

albertofernandezvillan · 2023-10-06T07:28:49Z

albertofernandezvillan
Oct 6, 2023

Tested static quantization as my model is a CNN model. I tried with different parameters for configuring the static quantization, but none of them seem to improve speed for CPU inference. Exploring the quantized model seems OK. Just to note, for inference, I am not using any Execution providers, just using OpenVino for inference, because in production machines we have configured OpenVino for inference other models.

Code is more or less as follows:

def preprocess_image(image_path, val_transforms):
    img = Image.open(image_path).convert('L')
    input_data = val_transforms(img)
    return np.expand_dims(input_data.numpy(), axis=0)

class DataReader(CalibrationDataReader):
    def __init__(self, calibration_image_folder, model_path, n_images):
        self.enum_data = None
		
        session = onnxruntime.InferenceSession(model_path, None)
        (_, _, height, width) = session.get_inputs()[0].shape

        val_transforms = Compose([ToTensor(), CenterCrop((height, width))])

        image_names = os.listdir(calibration_image_folder)
        batch_filenames = image_names[:n_images] if n_images > 0 else image_names
        self.preprocessed_data_list = [preprocess_image(os.path.join(calibration_image_folder, name), val_transforms)
                                       for name in batch_filenames]
        
        self.input_name = session.get_inputs()[0].name
        self.datasize = len(self.preprocessed_data_list)

    def get_next(self):
        if self.enum_data is None:
            self.enum_data = iter(
                [{self.input_name: data} for data in self.preprocessed_data_list]
            )
        return next(self.enum_data, None)

    def rewind(self):
        self.enum_data = None

model_path = "path/model/onnx"
model_pre_path = "path/model/pre_onnx"
model_qs_path = "path/model/quantized_onnx"
img_dir = "path/calibration/images"
img_calib_size = 2000

data_reader = DataReader(calibration_image_folder=img_dir, model_path=model_path, n_images=img_calib_size)


# Pre-processing
quant_pre_process(model_path, model_pre_path)

# If model is targeted to CPU, asymmetric activation and symmetric weight 
# are recommended for balance of performance and accuracy
# Test with default values
q_static_opts = {"ActivationSymmetric":False, # Symmetrize calibration data for activations (default is False)
                 "WeightSymmetric":True} # Symmetrize calibration data for weights (default is True)
calibrate_method=CalibrationMethod.MinMax
quant_format=QuantFormat.QDQ
activation_type=QuantType.QInt8
weight_type=QuantType.QInt8

# Static quantization
quantize_static(
    model_input = model_pre_path, 
    model_output = model_qs_path, 
    calibration_data_reader = data_reader, 
    quant_format = quant_format, 
    activation_type = activation_type, 
    weight_type = weight_type, 
    calibrate_method = calibrate_method)

My questions are:

Is more or less the code for static quantization correct?
Is it OK to not have improvement in inference speed?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Not improvement in CPU inference speed of static quantized model #17814

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Not improvement in CPU inference speed of static quantized model #17814

Uh oh!

Uh oh!

albertofernandezvillan Oct 6, 2023

Replies: 0 comments

albertofernandezvillan
Oct 6, 2023