Questions about conversion of Hugging Face transformer to ONNX #7051
Unanswered
Matthieu-Tinycoaching asked this question in Other Q&A
Hi community,
I have tried the `convert_graph_to_onnx.py` script (https://huggingface.co/transformers/serialization.html) to convert a transformer model from PyTorch to ONNX format.
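For context, I ran something like the following, using the script's Python entry point; the model name is just a placeholder for my own model, and the arguments are my understanding of the serialization docs:

```python
# Minimal sketch of the conversion step -- "bert-base-cased" stands in
# for the actual Hugging Face model I am converting.
from pathlib import Path
from transformers.convert_graph_to_onnx import convert

convert(
    framework="pt",                            # export from PyTorch
    model="bert-base-cased",                   # placeholder model name
    output=Path("onnx/bert-base-cased.onnx"),  # where to write the ONNX graph
    opset=11,
)
```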
I have a few questions:

1. I have installed `onnxruntime-gpu`. Will the model generated by the script work only with the GPU runtime, or will it also work with the CPU ONNX Runtime? In other words, do I have to generate one ONNX model per device? (See the session sketch after this list for what I mean.)
2. Is the ONNX model dependent on the hardware it was generated on, or do I have to generate the ONNX model on the target hardware where inference will run?
3. Are the outputs of the ONNX model identical regardless of the hardware the inference is run on? That is, can I reuse the embeddings generated from the ONNX model across different hardware platforms?
4. How can I apply quantization to an ONNX model for both CPU and GPU devices? (A sketch of what I have tried so far is after this list.)
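To make question 1 concrete, this is how I would expect to load the same exported file on either device; the provider names are taken from the ONNX Runtime documentation, so please correct me if this is not the right way to select the device:

```python
# Minimal sketch: loading one exported model with either execution provider.
# Assumes onnxruntime-gpu is installed; "bert-base-cased.onnx" is the file
# produced by the conversion step above (placeholder name).
import onnxruntime as ort

# CPU-only session
cpu_session = ort.InferenceSession(
    "onnx/bert-base-cased.onnx",
    providers=["CPUExecutionProvider"],
)

# GPU session, falling back to CPU if CUDA is unavailable
gpu_session = ort.InferenceSession(
    "onnx/bert-base-cased.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```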
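For question 4, the only approach I have found so far is dynamic quantization via `onnxruntime.quantization` (sketch below, assuming that API is the right one); it is unclear to me whether the resulting model is meant for CPU only or can also benefit GPU inference:

```python
# Minimal sketch of dynamic (post-training) quantization with ONNX Runtime.
# File names are placeholders; QuantType.QInt8 quantizes the weights to int8.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="onnx/bert-base-cased.onnx",
    model_output="onnx/bert-base-cased-quantized.onnx",
    weight_type=QuantType.QInt8,
)
```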
Thanks!