Hello, I read the Q8BERT paper and have been trying to reproduce its experimental results.
However, on some GLUE tasks (e.g. CoLA, MRPC), the gap between the FP32 results and the quantized ones is much larger than what the paper reports.
I tried sweeping the initial learning rate, but the results were still far from the reported numbers.
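For reference, here is a rough sketch of the sweep I ran (the script name and flags below are placeholders for the actual nlp-architect quantized fine-tuning entry point; everything other than the learning rate was left at its defaults):

```python
import subprocess

# Placeholder sweep: "run_glue.py" and its flags stand in for the real
# nlp-architect entry point; only the learning rate varies per run.
for lr in [1e-5, 2e-5, 3e-5, 5e-5]:
    for task in ["cola", "mrpc"]:
        subprocess.run(
            [
                "python", "run_glue.py",               # hypothetical script name
                "--task_name", task,
                "--learning_rate", str(lr),
                "--output_dir", f"out/{task}_lr{lr}",  # one dir per config
            ],
            check=True,
        )
```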

So, I want to ask whether the Q8BERT experiments were run with the default parameters set in the nlp-architect code, as shown below.

If not, could you share the experiment settings?