Questions about conversion of Hugging Face transformer to ONNX #7051
Unanswered
Matthieu-Tinycoaching asked this question in Other Q&A
Hi community,
I have tried the `convert_graph_to_onnx.py` script (https://huggingface.co/transformers/serialization.html) to convert a transformer model from PyTorch to ONNX format.
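For context, I ran something like the following, using the script's Python entry point; the model name is just a placeholder for my own model, and the arguments are my understanding of the serialization docs:

```python
# Minimal sketch of the conversion step -- "bert-base-cased" stands in
# for the actual Hugging Face model I am converting.
from pathlib import Path
from transformers.convert_graph_to_onnx import convert

convert(
    framework="pt",                            # export from PyTorch
    model="bert-base-cased",                   # placeholder model name
    output=Path("onnx/bert-base-cased.onnx"),  # where to write the ONNX graph
    opset=11,
)
```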
I have a few questions:

1. I have installed `onnxruntime-gpu`. Will the model generated by the script work only with the GPU runtime, or will it also work with the CPU ONNX Runtime? In other words, do I have to generate one ONNX model per device? (See the session sketch after this list for what I mean.)
2. Is the ONNX model dependent on the hardware it was generated on, or do I have to generate the ONNX model on the target hardware where inference will run?
3. Are the outputs of the ONNX model identical regardless of the hardware the inference is run on? That is, can I reuse the embeddings generated from the ONNX model across different hardware platforms?
4. How can I apply quantization to an ONNX model for both CPU and GPU devices? (A sketch of what I have tried so far is after this list.)
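To make question 1 concrete, this is how I would expect to load the same exported file on either device; the provider names are taken from the ONNX Runtime documentation, so please correct me if this is not the right way to select the device:

```python
# Minimal sketch: loading one exported model with either execution provider.
# Assumes onnxruntime-gpu is installed; "bert-base-cased.onnx" is the file
# produced by the conversion step above (placeholder name).
import onnxruntime as ort

# CPU-only session
cpu_session = ort.InferenceSession(
    "onnx/bert-base-cased.onnx",
    providers=["CPUExecutionProvider"],
)

# GPU session, falling back to CPU if CUDA is unavailable
gpu_session = ort.InferenceSession(
    "onnx/bert-base-cased.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```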
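For question 4, the only approach I have found so far is dynamic quantization via `onnxruntime.quantization` (sketch below, assuming that API is the right one); it is unclear to me whether the resulting model is meant for CPU only or can also benefit GPU inference:

```python
# Minimal sketch of dynamic (post-training) quantization with ONNX Runtime.
# File names are placeholders; QuantType.QInt8 quantizes the weights to int8.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="onnx/bert-base-cased.onnx",
    model_output="onnx/bert-base-cased-quantized.onnx",
    weight_type=QuantType.QInt8,
)
```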
Thanks!