Hi, as far as I understand, the .nb file is a runtime package generated by the Paddle-Lite opt tool. It bundles the model structure and weights into a single file and also includes device-specific optimizations. If you only have the .nb file, there is no way to convert it back to a .pdmodel/.pdiparams pair or directly to ONNX. So before running the Paddle-Lite optimizer, it is important to keep the original Paddle inference model (either dynamic or static).

As for exporting an INT8 QAT model to a standard Paddle inference model: after finishing quant-aware training (QAT) with PaddleSlim, you can export the quantized model from your training script. This generates two files, quant_infer_model.pdmodel and quant_infer_model.pdiparams. Make sure to keep both of them.

If you then want to convert the saved INT8 inference model to ONNX, install the official paddle2onnx tool (pip install paddle2onnx) and run it on the .pdmodel/.pdiparams pair.
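For context, here is a minimal sketch of what the export step could look like with PaddleSlim's dygraph QAT API. The input shape, file prefix, and function name are my assumptions for illustration, not something from this thread; adapt them to your own training script.

```python
# Sketch of exporting a PaddleSlim QAT model to a standard Paddle
# inference model (.pdmodel + .pdiparams). Assumption-heavy example:
# the file prefix and input shape below are placeholders.

MODEL_PREFIX = "quant_infer_model"  # yields quant_infer_model.pdmodel / .pdiparams


def export_quant_model(model, quanter, prefix=MODEL_PREFIX):
    """Save a quant-aware-trained dygraph model as an inference model.

    `model` is the trained network; `quanter` is the paddleslim.QAT
    instance that wrapped the model during quant-aware training.
    """
    # Imported lazily so the sketch can be read without Paddle installed.
    import paddle

    quanter.save_quantized_model(
        model,
        prefix,
        input_spec=[
            # Hypothetical OCR-style input shape; adjust to your model.
            paddle.static.InputSpec(shape=[None, 3, 48, 320], dtype="float32")
        ],
    )
```

Once the two files exist, the ONNX conversion could look roughly like `paddle2onnx --model_dir . --model_filename quant_infer_model.pdmodel --params_filename quant_infer_model.pdiparams --save_file quant_infer_model.onnx` (double-check the flags against the paddle2onnx README for your installed version).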
Hello everyone,
Due to runtime constraints, I need to use a quantised variant of a PPOCRv5 model. To minimise the accuracy loss caused by quantisation, I would like to perform quant-aware training.
If I understand correctly, this will result in an .nb model. Is there a way to convert my INT8 model back to the normal PaddleOCR format, or to the ONNX format?
Or are there other ways to significantly speed up the runtime of my PPOCR models?
I would be very grateful for a reply. Thanks