
PTQ calibration shows bad results. #375

@taestaes

Description

I followed https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/quantization/post_error_anaylsis.py and calibrated with 10 iterations on the test dataloader. A rough sketch of the flow I used is shown below.
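For context, this is roughly what my calibration and analysis code looks like, loosely following post_error_anaylsis.py. `model`, `test_loader` and `dummy_input` are placeholders for my own network and data; the tinynn calls are copied from the example and may differ slightly between versions.

```python
import torch
from tinynn.graph.quantization.quantizer import PostQuantizer
from tinynn.util.quantization_analysis_util import graph_error_analysis, layer_error_analysis


def calibrate_and_analyze(model, test_loader, dummy_input, num_iters=10):
    # Trace the model and insert fake-quant / observer nodes for PTQ.
    quantizer = PostQuantizer(model, dummy_input, work_dir='out')
    ptq_model = quantizer.quantize()
    ptq_model.eval()

    # Calibrate the observers with a few batches from the test dataloader (10 here).
    with torch.no_grad():
        for i, (inputs, _) in enumerate(test_loader):
            if i >= num_iters:
                break
            ptq_model(inputs)

    # Freeze the observers after calibration (fake-quant stays enabled).
    ptq_model.apply(torch.quantization.disable_observer)

    # Print the cosine-similarity error reports shown below.
    graph_error_analysis(ptq_model, dummy_input, metric='cosine')
    layer_error_analysis(ptq_model, dummy_input, metric='cosine')
    return ptq_model
```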

However, the PTQ version of my network produces very bad results, as shown below.

Why do the following layers have such low cosine similarity?

float_functional_simple_8                          cosine: 0.4407, scale: 0.0016, zero_point: 133
output                                             cosine: 0.6237, scale: 0.0004, zero_point: 154

I also found that the output image only contains values that are multiples of 0.004 (maybe this is because of the 8-bit quantization? A toy check of what I mean follows).
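To make the 0.004 spacing concrete: if I understand the affine quint8 fake-quant correctly, the dequantized value is scale * (q - zero_point) with q clamped to [0, 255], so the output can only take values spaced by the scale. The toy check below uses made-up inputs and the scale/zero_point of float_functional_simple_9 from the dump further down, which I guess is the last quantized op before the dequant stub.

```python
import torch

# Toy illustration only (not from the example script): an affine quint8 fake-quant
# maps x to q = clamp(round(x / scale) + zero_point, 0, 255) and dequantizes back
# to scale * (q - zero_point), so every value lands on a multiple of `scale`.
scale, zero_point = 0.0040, 2   # values of float_functional_simple_9 in the dump below
x = torch.tensor([0.0031, 0.0119, 0.0501])
q = torch.clamp(torch.round(x / scale) + zero_point, 0, 255)
x_hat = scale * (q - zero_point)
print(x_hat)                    # tensor([0.0040, 0.0120, 0.0520]) -> all multiples of 0.004
```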

WARNING (tinynn.util.quantization_analysis_util) Quantization error report:

Activations (cosine sorted ):
fake_quant_0                                       cosine: 1.0000, scale: 0.0043, zero_point: 5
float_functional_simple_0                          cosine: 1.0000, scale: 0.0043, zero_point: 5
patch_embed_proj                                   cosine: 0.9980, scale: 0.0133, zero_point: 125
body_0_ffn_project_in                              cosine: 0.9858, scale: 0.0177, zero_point: 126
body_0_ffn_dwconv                                  cosine: 0.9823, scale: 0.0127, zero_point: 124
body_0_ffn_act.quant                               cosine: 1.0000, scale: 0.0005, zero_point: 255
body_0_ffn_act.f_mul_alpha                         cosine: 0.9720, scale: 0.0008, zero_point: 0
body_0_ffn_act.f_add                               cosine: 0.9801, scale: 0.0051, zero_point: 0
body_0_ffn_project_out                             cosine: 0.9130, scale: 0.0134, zero_point: 128
float_functional_simple_2                          cosine: 0.9400, scale: 0.0142, zero_point: 139
body_1_ffn_project_in                              cosine: 0.9636, scale: 0.0285, zero_point: 123
body_1_ffn_dwconv                                  cosine: 0.9774, scale: 0.0499, zero_point: 188
body_1_ffn_act.quant                               cosine: 1.0000, scale: 0.0000, zero_point: 255
body_1_ffn_act.f_mul_alpha                         cosine: 0.9806, scale: 0.0001, zero_point: 0
body_1_ffn_act.f_add                               cosine: 0.8992, scale: 0.0133, zero_point: 0
body_1_ffn_project_out                             cosine: 0.9002, scale: 0.0131, zero_point: 127
float_functional_simple_4                          cosine: 0.8736, scale: 0.0064, zero_point: 135
body_2_ffn_project_in                              cosine: 0.9458, scale: 0.0125, zero_point: 130
body_2_ffn_dwconv                                  cosine: 0.9597, scale: 0.0170, zero_point: 177
body_2_ffn_act.quant                               cosine: 1.0000, scale: 0.0001, zero_point: 0
body_2_ffn_act.f_mul_alpha                         cosine: 0.9837, scale: 0.0003, zero_point: 238
body_2_ffn_act.f_add                               cosine: 0.9069, scale: 0.0067, zero_point: 49
body_2_ffn_project_out                             cosine: 0.9250, scale: 0.0070, zero_point: 122
float_functional_simple_6                          cosine: 0.9139, scale: 0.0061, zero_point: 122
body_3_ffn_project_in                              cosine: 0.9626, scale: 0.0085, zero_point: 126
body_3_ffn_dwconv                                  cosine: 0.9880, scale: 0.0102, zero_point: 147
body_3_ffn_act.quant                               cosine: 1.0000, scale: 0.0001, zero_point: 255
body_3_ffn_act.f_mul_alpha                         cosine: 0.9933, scale: 0.0001, zero_point: 0
body_3_ffn_act.f_add                               cosine: 0.9638, scale: 0.0048, zero_point: 0
body_3_ffn_project_out                             cosine: 0.9703, scale: 0.0051, zero_point: 148
float_functional_simple_8                          cosine: 0.4407, scale: 0.0016, zero_point: 133
output                                             cosine: 0.6237, scale: 0.0004, zero_point: 154
float_functional_simple_9                          cosine: 0.9894, scale: 0.0040, zero_point: 2

WARNING (tinynn.util.quantization_analysis_util) Quantization error report:

Weights (cosine sorted 20):
output                                   cosine: 0.9996, scale: 0.0029, zero_point: 0
patch_embed_proj                         cosine: 0.9998, scale: 0.0056, zero_point: 0
body_3_ffn_dwconv                        cosine: 0.9999, scale: 0.0113, zero_point: 0
body_1_ffn_dwconv                        cosine: 0.9999, scale: 0.0114, zero_point: 0
body_0_ffn_dwconv                        cosine: 0.9999, scale: 0.0132, zero_point: 0
body_2_ffn_dwconv                        cosine: 0.9999, scale: 0.0110, zero_point: 0
body_0_ffn_project_in                    cosine: 0.9999, scale: 0.0118, zero_point: 0
body_1_ffn_project_in                    cosine: 0.9999, scale: 0.0104, zero_point: 0
body_2_ffn_project_in                    cosine: 0.9999, scale: 0.0094, zero_point: 0
body_3_ffn_project_out                   cosine: 0.9999, scale: 0.0052, zero_point: 0
body_0_ffn_project_out                   cosine: 0.9999, scale: 0.0051, zero_point: 0
body_1_ffn_project_out                   cosine: 0.9999, scale: 0.0058, zero_point: 0
body_3_ffn_project_in                    cosine: 0.9999, scale: 0.0085, zero_point: 0
body_2_ffn_project_out                   cosine: 0.9999, scale: 0.0048, zero_point: 0

Activations (cosine sorted 20):
body_1_ffn_act.f_add                               cosine: 0.9432, scale: 0.0133, zero_point: 0
body_0_ffn_project_in                              cosine: 0.9546, scale: 0.0177, zero_point: 126
body_0_ffn_act.f_mul_alpha                         cosine: 0.9757, scale: 0.0008, zero_point: 0
body_0_ffn_dwconv                                  cosine: 0.9781, scale: 0.0127, zero_point: 124
body_1_ffn_project_in                              cosine: 0.9788, scale: 0.0285, zero_point: 123
body_0_ffn_act.f_add                               cosine: 0.9803, scale: 0.0051, zero_point: 0
body_1_ffn_dwconv                                  cosine: 0.9900, scale: 0.0499, zero_point: 188
patch_embed_proj                                   cosine: 0.9907, scale: 0.0133, zero_point: 125
body_1_ffn_project_out                             cosine: 0.9928, scale: 0.0131, zero_point: 127
float_functional_simple_2                          cosine: 0.9939, scale: 0.0142, zero_point: 139
body_0_ffn_project_out                             cosine: 0.9941, scale: 0.0134, zero_point: 128
body_1_ffn_act.f_mul_alpha                         cosine: 0.9949, scale: 0.0001, zero_point: 0
body_2_ffn_project_out                             cosine: 0.9991, scale: 0.0070, zero_point: 122
float_functional_simple_8                          cosine: 0.9993, scale: 0.0016, zero_point: 133
float_functional_simple_4                          cosine: 0.9993, scale: 0.0064, zero_point: 135
body_2_ffn_act.f_add                               cosine: 0.9994, scale: 0.0067, zero_point: 49
body_3_ffn_project_out                             cosine: 0.9995, scale: 0.0051, zero_point: 148
body_3_ffn_act.f_add                               cosine: 0.9995, scale: 0.0048, zero_point: 0
body_2_ffn_project_in                              cosine: 0.9995, scale: 0.0125, zero_point: 130
float_functional_simple_6                          cosine: 0.9996, scale: 0.0061, zero_point: 122

QMLBNR_Video_Images_CleanedImgOut(
  (fake_quant_0): QuantStub(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0043], device='cuda:0'), zero_point=tensor([5], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.060858067125082016, max_val=1.065263271331787)
    )
  )
  (patch_embed_proj): Conv2d(
    34, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0056], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.7156392335891724, max_val=0.6727595925331116)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0133], device='cuda:0'), zero_point=tensor([125], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.7740585803985596, max_val=2.808042287826538)
    )
  )
  (body_0_ffn_project_in): Conv2d(
    32, 64, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0118], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.50102961063385, max_val=1.4929399490356445)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0177], device='cuda:0'), zero_point=tensor([126], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-8.30683708190918, max_val=9.019135475158691)
    )
  )
  (body_0_ffn_dwconv): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0132], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.6797765493392944, max_val=1.2519268989562988)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0127], device='cuda:0'), zero_point=tensor([124], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-8.81155776977539, max_val=8.465405464172363)
    )
  )
  (body_0_ffn_act): QPReLU(
    (relu1): ReLU()
    (relu2): ReLU()
    (f_mul_neg_one1): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_neg_one2): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_alpha): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0008], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=0.0, max_val=1.1990134716033936)
      )
    )
    (f_add): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=1.9476141185914564e-11, max_val=8.480807304382324)
      )
    )
    (quant): QuantStub(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0005], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.1360728144645691, max_val=-0.1360728144645691)
      )
    )
  )
  (body_0_ffn_project_out): Conv2d(
    64, 32, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.656369149684906, max_val=0.5680031180381775)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0134], device='cuda:0'), zero_point=tensor([128], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-4.129014492034912, max_val=2.9563019275665283)
    )
  )
  (body_1_ffn_project_in): Conv2d(
    32, 64, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0104], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.1867985725402832, max_val=1.3303239345550537)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0285], device='cuda:0'), zero_point=tensor([123], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-10.604576110839844, max_val=11.774846076965332)
    )
  )
  (body_1_ffn_dwconv): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0114], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.218837857246399, max_val=1.4485722780227661)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0499], device='cuda:0'), zero_point=tensor([188], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-26.407068252563477, max_val=9.284073829650879)
    )
  )
  (body_1_ffn_act): QPReLU(
    (relu1): ReLU()
    (relu2): ReLU()
    (f_mul_neg_one1): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_neg_one2): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_alpha): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([6.1041e-05], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=0.0, max_val=0.04620003327727318)
      )
    )
    (f_add): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0133], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=8.125244184073455e-13, max_val=9.228955268859863)
      )
    )
    (quant): QuantStub(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([6.8477e-06], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.0017461760435253382, max_val=-0.0017461760435253382)
      )
    )
  )
  (body_1_ffn_project_out): Conv2d(
    64, 32, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0058], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.6410632133483887, max_val=0.7345181703567505)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0131], device='cuda:0'), zero_point=tensor([127], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.9626619815826416, max_val=3.1261534690856934)
    )
  )
  (body_2_ffn_project_in): Conv2d(
    32, 64, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0094], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.9506298303604126, max_val=1.1930553913116455)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0125], device='cuda:0'), zero_point=tensor([130], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-4.432260036468506, max_val=4.595865249633789)
    )
  )
  (body_2_ffn_dwconv): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0110], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.1248000860214233, max_val=1.408692479133606)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0170], device='cuda:0'), zero_point=tensor([177], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-12.326212882995605, max_val=4.741044044494629)
    )
  )
  (body_2_ffn_act): QPReLU(
    (relu1): ReLU()
    (relu2): ReLU()
    (f_mul_neg_one1): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_neg_one2): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_alpha): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0003], device='cuda:0'), zero_point=tensor([238], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.32712748646736145, max_val=0.005777135491371155)
      )
    )
    (f_add): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0067], device='cuda:0'), zero_point=tensor([49], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.32712748646736145, max_val=4.730205535888672)
      )
    )
    (quant): QuantStub(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=0.02653917297720909, max_val=0.02653917297720909)
      )
    )
  )
  (body_2_ffn_project_out): Conv2d(
    64, 32, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0048], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.6111413240432739, max_val=0.6124895811080933)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0070], device='cuda:0'), zero_point=tensor([122], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.243239164352417, max_val=2.098046064376831)
    )
  )
  (body_3_ffn_project_in): Conv2d(
    32, 64, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0085], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.9145596027374268, max_val=1.082226276397705)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0085], device='cuda:0'), zero_point=tensor([126], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-4.47300910949707, max_val=3.173271656036377)
    )
  )
  (body_3_ffn_dwconv): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0113], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.4376935958862305, max_val=1.413638710975647)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0102], device='cuda:0'), zero_point=tensor([147], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-4.438266754150391, max_val=4.107792377471924)
    )
  )
  (body_3_ffn_act): QPReLU(
    (relu1): ReLU()
    (relu2): ReLU()
    (f_mul_neg_one1): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_neg_one2): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_alpha): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=0.0, max_val=0.12268836051225662)
      )
    )
    (f_add): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0048], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=2.958098876959525e-09, max_val=4.043459415435791)
      )
    )
    (quant): QuantStub(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.02744886465370655, max_val=-0.02744886465370655)
      )
    )
  )
  (body_3_ffn_project_out): Conv2d(
    64, 32, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0052], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.553494393825531, max_val=0.6569843292236328)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([148], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.0227982997894287, max_val=2.459625244140625)
    )
  )
  (output): Conv2d(
    32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0029], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.23968760669231415, max_val=0.3721259534358978)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0004], device='cuda:0'), zero_point=tensor([154], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.38361242413520813, max_val=0.10394330322742462)
    )
  )
  (fake_dequant_0): DeQuantStub()
  (float_functional_simple_0): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0043], device='cuda:0'), zero_point=tensor([5], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.060858067125082016, max_val=1.065263271331787)
    )
  )
  (float_functional_simple_1): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
    )
  )
  (float_functional_simple_2): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0142], device='cuda:0'), zero_point=tensor([139], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-5.174468040466309, max_val=2.951368808746338)
    )
  )
  (float_functional_simple_3): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
    )
  )
  (float_functional_simple_4): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0064], device='cuda:0'), zero_point=tensor([135], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.5720856189727783, max_val=2.035804271697998)
    )
  )
  (float_functional_simple_5): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
    )
  )
  (float_functional_simple_6): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0061], device='cuda:0'), zero_point=tensor([122], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.3314638137817383, max_val=1.8853096961975098)
    )
  )
  (float_functional_simple_7): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
    )
  )
  (float_functional_simple_8): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0016], device='cuda:0'), zero_point=tensor([133], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.6245776414871216, max_val=0.5727238655090332)
    )
  )
  (float_functional_simple_9): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0040], device='cuda:0'), zero_point=tensor([2], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.06914715468883514, max_val=1.1036951541900635)
    )
  )
)
