I followed https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/quantization/post_error_anaylsis.py and calibrated with 10 iterations on the test dataloader, but my network's PTQ output is really bad.
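For reference, the flow I ran is roughly the following (a minimal sketch based on the example; `model`, `test_loader`, `device`, and the dummy input shape are placeholders from my setup, and the analysis helper names are as I recall them from the example, so they may not match exactly):

```python
import torch
from tinynn.graph.quantization.quantizer import PostQuantizer
from tinynn.util.quantization_analysis_util import graph_error_analysis, layer_error_analysis

# Prepare the fake-quantized model (as in post_error_anaylsis.py).
dummy_input = torch.randn(1, 34, 128, 128)  # 34 input channels; spatial size is just illustrative
quantizer = PostQuantizer(model, dummy_input, work_dir='ptq_out')
ptq_model = quantizer.quantize()
ptq_model.to(device)

# Calibration: 10 iterations over the test dataloader.
ptq_model.eval()
with torch.no_grad():
    for i, (inputs, _) in enumerate(test_loader):
        if i >= 10:
            break
        ptq_model(inputs.to(device))

# Freeze the observers and dump the error reports shown below.
ptq_model.apply(torch.quantization.disable_observer)
graph_error_analysis(ptq_model, dummy_input, metric='cosine')
layer_error_analysis(ptq_model, dummy_input, metric='cosine')
```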
Why do the following layers have such low cosine similarity?
float_functional_simple_8 cosine: 0.4407, scale: 0.0016, zero_point: 133
output cosine: 0.6237, scale: 0.0004, zero_point: 154
I also found that the result image only contains values that are multiples of 0.004 (maybe because of the 8-bit quantization?).
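If I understand affine dequantization correctly, every dequantized value lies on the grid `scale * (q - zero_point)`, so with float_functional_simple_9's scale of 0.0040 (which I assume is the node producing the output image) the result can only take values spaced about 0.004 apart, e.g.:

```python
import torch

# Values copied from the report below; this just illustrates the dequantization grid.
scale, zero_point = 0.0040, 2
q = torch.arange(0, 256, dtype=torch.float32)   # all possible quint8 codes
dequant = scale * (q - zero_point)              # every representable output value
print(dequant[:5].tolist())                     # ≈ [-0.008, -0.004, 0.0, 0.004, 0.008]
```

The full error report and the prepared model are below.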
WARNING (tinynn.util.quantization_analysis_util) Quantization error report:
Activations (cosine sorted ):
fake_quant_0 cosine: 1.0000, scale: 0.0043, zero_point: 5
float_functional_simple_0 cosine: 1.0000, scale: 0.0043, zero_point: 5
patch_embed_proj cosine: 0.9980, scale: 0.0133, zero_point: 125
body_0_ffn_project_in cosine: 0.9858, scale: 0.0177, zero_point: 126
body_0_ffn_dwconv cosine: 0.9823, scale: 0.0127, zero_point: 124
body_0_ffn_act.quant cosine: 1.0000, scale: 0.0005, zero_point: 255
body_0_ffn_act.f_mul_alpha cosine: 0.9720, scale: 0.0008, zero_point: 0
body_0_ffn_act.f_add cosine: 0.9801, scale: 0.0051, zero_point: 0
body_0_ffn_project_out cosine: 0.9130, scale: 0.0134, zero_point: 128
float_functional_simple_2 cosine: 0.9400, scale: 0.0142, zero_point: 139
body_1_ffn_project_in cosine: 0.9636, scale: 0.0285, zero_point: 123
body_1_ffn_dwconv cosine: 0.9774, scale: 0.0499, zero_point: 188
body_1_ffn_act.quant cosine: 1.0000, scale: 0.0000, zero_point: 255
body_1_ffn_act.f_mul_alpha cosine: 0.9806, scale: 0.0001, zero_point: 0
body_1_ffn_act.f_add cosine: 0.8992, scale: 0.0133, zero_point: 0
body_1_ffn_project_out cosine: 0.9002, scale: 0.0131, zero_point: 127
float_functional_simple_4 cosine: 0.8736, scale: 0.0064, zero_point: 135
body_2_ffn_project_in cosine: 0.9458, scale: 0.0125, zero_point: 130
body_2_ffn_dwconv cosine: 0.9597, scale: 0.0170, zero_point: 177
body_2_ffn_act.quant cosine: 1.0000, scale: 0.0001, zero_point: 0
body_2_ffn_act.f_mul_alpha cosine: 0.9837, scale: 0.0003, zero_point: 238
body_2_ffn_act.f_add cosine: 0.9069, scale: 0.0067, zero_point: 49
body_2_ffn_project_out cosine: 0.9250, scale: 0.0070, zero_point: 122
float_functional_simple_6 cosine: 0.9139, scale: 0.0061, zero_point: 122
body_3_ffn_project_in cosine: 0.9626, scale: 0.0085, zero_point: 126
body_3_ffn_dwconv cosine: 0.9880, scale: 0.0102, zero_point: 147
body_3_ffn_act.quant cosine: 1.0000, scale: 0.0001, zero_point: 255
body_3_ffn_act.f_mul_alpha cosine: 0.9933, scale: 0.0001, zero_point: 0
body_3_ffn_act.f_add cosine: 0.9638, scale: 0.0048, zero_point: 0
body_3_ffn_project_out cosine: 0.9703, scale: 0.0051, zero_point: 148
float_functional_simple_8 cosine: 0.4407, scale: 0.0016, zero_point: 133
output cosine: 0.6237, scale: 0.0004, zero_point: 154
float_functional_simple_9 cosine: 0.9894, scale: 0.0040, zero_point: 2
WARNING (tinynn.util.quantization_analysis_util) Quantization error report:
Weights (cosine sorted 20):
output cosine: 0.9996, scale: 0.0029, zero_point: 0
patch_embed_proj cosine: 0.9998, scale: 0.0056, zero_point: 0
body_3_ffn_dwconv cosine: 0.9999, scale: 0.0113, zero_point: 0
body_1_ffn_dwconv cosine: 0.9999, scale: 0.0114, zero_point: 0
body_0_ffn_dwconv cosine: 0.9999, scale: 0.0132, zero_point: 0
body_2_ffn_dwconv cosine: 0.9999, scale: 0.0110, zero_point: 0
body_0_ffn_project_in cosine: 0.9999, scale: 0.0118, zero_point: 0
body_1_ffn_project_in cosine: 0.9999, scale: 0.0104, zero_point: 0
body_2_ffn_project_in cosine: 0.9999, scale: 0.0094, zero_point: 0
body_3_ffn_project_out cosine: 0.9999, scale: 0.0052, zero_point: 0
body_0_ffn_project_out cosine: 0.9999, scale: 0.0051, zero_point: 0
body_1_ffn_project_out cosine: 0.9999, scale: 0.0058, zero_point: 0
body_3_ffn_project_in cosine: 0.9999, scale: 0.0085, zero_point: 0
body_2_ffn_project_out cosine: 0.9999, scale: 0.0048, zero_point: 0
Activations (cosine sorted 20):
body_1_ffn_act.f_add cosine: 0.9432, scale: 0.0133, zero_point: 0
body_0_ffn_project_in cosine: 0.9546, scale: 0.0177, zero_point: 126
body_0_ffn_act.f_mul_alpha cosine: 0.9757, scale: 0.0008, zero_point: 0
body_0_ffn_dwconv cosine: 0.9781, scale: 0.0127, zero_point: 124
body_1_ffn_project_in cosine: 0.9788, scale: 0.0285, zero_point: 123
body_0_ffn_act.f_add cosine: 0.9803, scale: 0.0051, zero_point: 0
body_1_ffn_dwconv cosine: 0.9900, scale: 0.0499, zero_point: 188
patch_embed_proj cosine: 0.9907, scale: 0.0133, zero_point: 125
body_1_ffn_project_out cosine: 0.9928, scale: 0.0131, zero_point: 127
float_functional_simple_2 cosine: 0.9939, scale: 0.0142, zero_point: 139
body_0_ffn_project_out cosine: 0.9941, scale: 0.0134, zero_point: 128
body_1_ffn_act.f_mul_alpha cosine: 0.9949, scale: 0.0001, zero_point: 0
body_2_ffn_project_out cosine: 0.9991, scale: 0.0070, zero_point: 122
float_functional_simple_8 cosine: 0.9993, scale: 0.0016, zero_point: 133
float_functional_simple_4 cosine: 0.9993, scale: 0.0064, zero_point: 135
body_2_ffn_act.f_add cosine: 0.9994, scale: 0.0067, zero_point: 49
body_3_ffn_project_out cosine: 0.9995, scale: 0.0051, zero_point: 148
body_3_ffn_act.f_add cosine: 0.9995, scale: 0.0048, zero_point: 0
body_2_ffn_project_in cosine: 0.9995, scale: 0.0125, zero_point: 130
float_functional_simple_6 cosine: 0.9996, scale: 0.0061, zero_point: 122
QMLBNR_Video_Images_CleanedImgOut(
(fake_quant_0): QuantStub(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0043], device='cuda:0'), zero_point=tensor([5], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.060858067125082016, max_val=1.065263271331787)
)
)
(patch_embed_proj): Conv2d(
34, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0056], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-0.7156392335891724, max_val=0.6727595925331116)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0133], device='cuda:0'), zero_point=tensor([125], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-2.7740585803985596, max_val=2.808042287826538)
)
)
(body_0_ffn_project_in): Conv2d(
32, 64, kernel_size=(1, 1), stride=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0118], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-1.50102961063385, max_val=1.4929399490356445)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0177], device='cuda:0'), zero_point=tensor([126], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-8.30683708190918, max_val=9.019135475158691)
)
)
(body_0_ffn_dwconv): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0132], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-1.6797765493392944, max_val=1.2519268989562988)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0127], device='cuda:0'), zero_point=tensor([124], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-8.81155776977539, max_val=8.465405464172363)
)
)
(body_0_ffn_act): QPReLU(
(relu1): ReLU()
(relu2): ReLU()
(f_mul_neg_one1): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(f_mul_neg_one2): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(f_mul_alpha): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0008], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=0.0, max_val=1.1990134716033936)
)
)
(f_add): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=1.9476141185914564e-11, max_val=8.480807304382324)
)
)
(quant): QuantStub(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0005], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.1360728144645691, max_val=-0.1360728144645691)
)
)
)
(body_0_ffn_project_out): Conv2d(
64, 32, kernel_size=(1, 1), stride=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-0.656369149684906, max_val=0.5680031180381775)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0134], device='cuda:0'), zero_point=tensor([128], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-4.129014492034912, max_val=2.9563019275665283)
)
)
(body_1_ffn_project_in): Conv2d(
32, 64, kernel_size=(1, 1), stride=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0104], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-1.1867985725402832, max_val=1.3303239345550537)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0285], device='cuda:0'), zero_point=tensor([123], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-10.604576110839844, max_val=11.774846076965332)
)
)
(body_1_ffn_dwconv): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0114], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-1.218837857246399, max_val=1.4485722780227661)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0499], device='cuda:0'), zero_point=tensor([188], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-26.407068252563477, max_val=9.284073829650879)
)
)
(body_1_ffn_act): QPReLU(
(relu1): ReLU()
(relu2): ReLU()
(f_mul_neg_one1): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(f_mul_neg_one2): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(f_mul_alpha): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([6.1041e-05], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=0.0, max_val=0.04620003327727318)
)
)
(f_add): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0133], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=8.125244184073455e-13, max_val=9.228955268859863)
)
)
(quant): QuantStub(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([6.8477e-06], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.0017461760435253382, max_val=-0.0017461760435253382)
)
)
)
(body_1_ffn_project_out): Conv2d(
64, 32, kernel_size=(1, 1), stride=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0058], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-0.6410632133483887, max_val=0.7345181703567505)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0131], device='cuda:0'), zero_point=tensor([127], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-2.9626619815826416, max_val=3.1261534690856934)
)
)
(body_2_ffn_project_in): Conv2d(
32, 64, kernel_size=(1, 1), stride=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0094], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-0.9506298303604126, max_val=1.1930553913116455)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0125], device='cuda:0'), zero_point=tensor([130], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-4.432260036468506, max_val=4.595865249633789)
)
)
(body_2_ffn_dwconv): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0110], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-1.1248000860214233, max_val=1.408692479133606)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0170], device='cuda:0'), zero_point=tensor([177], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-12.326212882995605, max_val=4.741044044494629)
)
)
(body_2_ffn_act): QPReLU(
(relu1): ReLU()
(relu2): ReLU()
(f_mul_neg_one1): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(f_mul_neg_one2): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(f_mul_alpha): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0003], device='cuda:0'), zero_point=tensor([238], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.32712748646736145, max_val=0.005777135491371155)
)
)
(f_add): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0067], device='cuda:0'), zero_point=tensor([49], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.32712748646736145, max_val=4.730205535888672)
)
)
(quant): QuantStub(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=0.02653917297720909, max_val=0.02653917297720909)
)
)
)
(body_2_ffn_project_out): Conv2d(
64, 32, kernel_size=(1, 1), stride=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0048], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-0.6111413240432739, max_val=0.6124895811080933)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0070], device='cuda:0'), zero_point=tensor([122], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-2.243239164352417, max_val=2.098046064376831)
)
)
(body_3_ffn_project_in): Conv2d(
32, 64, kernel_size=(1, 1), stride=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0085], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-0.9145596027374268, max_val=1.082226276397705)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0085], device='cuda:0'), zero_point=tensor([126], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-4.47300910949707, max_val=3.173271656036377)
)
)
(body_3_ffn_dwconv): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0113], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-1.4376935958862305, max_val=1.413638710975647)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0102], device='cuda:0'), zero_point=tensor([147], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-4.438266754150391, max_val=4.107792377471924)
)
)
(body_3_ffn_act): QPReLU(
(relu1): ReLU()
(relu2): ReLU()
(f_mul_neg_one1): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(f_mul_neg_one2): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(f_mul_alpha): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=0.0, max_val=0.12268836051225662)
)
)
(f_add): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0048], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=2.958098876959525e-09, max_val=4.043459415435791)
)
)
(quant): QuantStub(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.02744886465370655, max_val=-0.02744886465370655)
)
)
)
(body_3_ffn_project_out): Conv2d(
64, 32, kernel_size=(1, 1), stride=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0052], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-0.553494393825531, max_val=0.6569843292236328)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([148], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-2.0227982997894287, max_val=2.459625244140625)
)
)
(output): Conv2d(
32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
(weight_fake_quant): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0029], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): MinMaxObserver(min_val=-0.23968760669231415, max_val=0.3721259534358978)
)
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0004], device='cuda:0'), zero_point=tensor([154], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.38361242413520813, max_val=0.10394330322742462)
)
)
(fake_dequant_0): DeQuantStub()
(float_functional_simple_0): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0043], device='cuda:0'), zero_point=tensor([5], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.060858067125082016, max_val=1.065263271331787)
)
)
(float_functional_simple_1): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(float_functional_simple_2): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0142], device='cuda:0'), zero_point=tensor([139], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-5.174468040466309, max_val=2.951368808746338)
)
)
(float_functional_simple_3): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(float_functional_simple_4): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0064], device='cuda:0'), zero_point=tensor([135], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-2.5720856189727783, max_val=2.035804271697998)
)
)
(float_functional_simple_5): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(float_functional_simple_6): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0061], device='cuda:0'), zero_point=tensor([122], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-2.3314638137817383, max_val=1.8853096961975098)
)
)
(float_functional_simple_7): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
)
)
(float_functional_simple_8): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0016], device='cuda:0'), zero_point=tensor([133], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.6245776414871216, max_val=0.5727238655090332)
)
)
(float_functional_simple_9): FloatFunctional(
(activation_post_process): PTQFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0040], device='cuda:0'), zero_point=tensor([2], device='cuda:0', dtype=torch.int32)
(activation_post_process): HistogramObserver(min_val=-0.06914715468883514, max_val=1.1036951541900635)
)
)
)