I used the command
nlp-train transformer_glue \
--task_name mrpc \
--model_name_or_path bert-base-uncased \
--model_type quant_bert \
--learning_rate 2e-5 \
--output_dir /tmp/mrpc-8bit \
--evaluate_during_training \
--data_dir /path/to/MRPC \
--do_lower_case
to train the model, and then
nlp-inference transformer_glue \
--model_path /tmp/mrpc-8bit \
--task_name mrpc \
--model_type quant_bert \
--output_dir /tmp/mrpc-8bit \
--data_dir /path/to/MRPC \
--do_lower_case \
--overwrite_output_dir \
--load_quantized_model
to run inference, but I got the same performance as when running without the --load_quantized_model flag. How can I improve the inference performance?