Quantization model recall drop around 5%. #18665

Rfank2021 · 2023-12-01T13:42:46Z

I have a PyTorch RoBERTa-base model for a classification task, and I used ONNX Runtime to quantize it. Precision is almost the same after quantization, but recall drops by around 5%.
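
A minimal sketch of how this pattern can arise, under the assumption that quantization shifts a few borderline positive scores below the decision threshold: false negatives go up while false positives stay the same, so recall falls and precision does not. The labels, scores, and the 0.5 threshold below are made-up illustration data, not measurements from the model in question.

```python
def precision_recall(labels, scores, threshold=0.5):
    """Return (precision, recall) for binary labels and positive-class scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: four positives, four negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
float_scores = [0.90, 0.80, 0.55, 0.52, 0.40, 0.30, 0.20, 0.10]
# Same scores, except one borderline positive drifts from 0.52 to 0.48.
quant_scores = [0.90, 0.80, 0.55, 0.48, 0.40, 0.30, 0.20, 0.10]

float_pr = precision_recall(labels, float_scores)
quant_pr = precision_recall(labels, quant_scores)
print("float:", float_pr)
print("quant:", quant_pr)
```

If the real model behaves like this, comparing the raw scores of the float and quantized models on the examples whose predictions flipped would show whether the drop comes from small shifts near the threshold or from larger errors.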

My code is similar to https://github.com/huggingface/notebooks/blob/main/examples/onnx-export.ipynb. I also saw https://medium.com/microsoftazure/faster-and-smaller-quantized-nlp-with-hugging-face-and-onnx-runtime-ec5525473bb7, which says that the F1 score should be similar to that of the original model.

I tried onnxruntime.quantization.qdq_loss_debug.create_weight_matching, but it returns an empty dict. I believe I have already tried most of the configuration options for quantization. Is there a way or a tool to determine which nodes to exclude during quantization?
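
One possible approach to the question above is a brute-force sensitivity sweep: quantize once per candidate node with that node left in float, and rank the candidates by how much recall recovers. The sketch below assumes dynamic quantization through `quantize_dynamic` and its `nodes_to_exclude` argument. `rank_nodes_by_recall_gain`, `make_dynamic_quantizer`, and the `evaluate_recall` callback are hypothetical names introduced here for illustration, and the callback has to be supplied by the user (run the model at the given path on a validation set and return recall).

```python
import itertools
import os
from typing import Callable, Iterable, List, Tuple


def rank_nodes_by_recall_gain(
    candidate_nodes: Iterable[str],
    quantize_excluding: Callable[[List[str]], str],
    evaluate_recall: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Rank nodes by recall gained when each one alone is kept in float.

    quantize_excluding(nodes) must return the path of a model quantized
    with `nodes` excluded; evaluate_recall(path) must return its recall.
    """
    # Baseline: everything quantized, nothing excluded.
    baseline = evaluate_recall(quantize_excluding([]))
    gains = []
    for node in candidate_nodes:
        model_path = quantize_excluding([node])
        gains.append((node, evaluate_recall(model_path) - baseline))
    # Largest recall improvement first.
    return sorted(gains, key=lambda item: item[1], reverse=True)


def make_dynamic_quantizer(float_model_path: str, out_dir: str):
    """Build a quantize_excluding callback around quantize_dynamic."""
    # Imported lazily so the ranking helper works without onnxruntime.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    counter = itertools.count()

    def quantize_excluding(nodes: List[str]) -> str:
        out_path = os.path.join(out_dir, f"quantized_{next(counter)}.onnx")
        quantize_dynamic(
            float_model_path,
            out_path,
            weight_type=QuantType.QInt8,
            nodes_to_exclude=nodes,
        )
        return out_path

    return quantize_excluding
```

Candidate names could be collected from the float model, for example the `name` of every node in `onnx.load(path).graph.node` whose `op_type` is `MatMul`. Testing one node at a time is a simplification: it costs one quantize-and-evaluate pass per node and ignores interactions between nodes, so grouping nodes by transformer layer may be a more practical first pass.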

Replies: 0 comments
