System Info
LAMP stack, Debian 10
Python 3.10
pip-installed ORTModel, onnx, transformers, etc., all upgraded to the latest versions (as of 19 March 2025):
pip install --upgrade optimum transformers
pip install --upgrade huggingface huggingface-hub
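For reference, a quick way to pin the exact versions behind "latest" (a minimal sketch; assumes the standard __version__ attributes on these packages):

# Sketch: record the exact installed versions, since "latest" drifts over time.
import optimum.version, transformers, onnxruntime, huggingface_hub
print("optimum", optimum.version.__version__)
print("transformers", transformers.__version__)
print("onnxruntime", onnxruntime.__version__)
print("huggingface_hub", huggingface_hub.__version__)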
Who can help?
Hi Josh
Actual behaviour:
The quantized encoder is cached correctly via "encoder_file_name",
BUT "file_name" caches the original full-sized decoder_model_merged.onnx instead of the decoder_model_quantized.onnx I requested.
This mimics fallback behaviour: if I remove the "file_name" parameter entirely, I get exactly the same result (a quick cache check is sketched below).
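A minimal sketch of how I confirm which ONNX files actually landed in the local Hugging Face cache (uses huggingface_hub's scan_cache_dir; the size comparison is my own heuristic):

# Sketch: list the cached .onnx files for the repo; a quantized decoder should be
# much smaller on disk than the full decoder_model_merged.onnx.
from huggingface_hub import scan_cache_dir

for repo in scan_cache_dir().repos:
    if repo.repo_id == "Xenova/opus-mt-en-de":
        for revision in repo.revisions:
            for f in revision.files:
                if f.file_name.endswith(".onnx"):
                    print(f.file_name, f.size_on_disk, "bytes")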
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Issue:
When running the model directly from the repo, I use the following settings and expect the named quantized versions of the model to be cached and used, e.g.:
from transformers import AutoConfig
from optimum.onnxruntime import ORTModelForSeq2SeqLM

config = AutoConfig.from_pretrained(path)  # path points at the model repo/directory
model = ORTModelForSeq2SeqLM.from_pretrained(
    "Xenova/opus-mt-en-de",
    config=config,
    subfolder="onnx",
    encoder_file_name="encoder_model_quantized.onnx",
    file_name="decoder_model_quantized.onnx",
    accelerator="ort",
)
I have also tested with decoder_file_name="decoder_model_quantized.onnx" in place of file_name.
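As a debugging aid, here's a minimal sketch of how I check which file each session was actually built from; model.encoder.session / model.decoder.session are optimum internals and _model_path is a private onnxruntime attribute, so treat both as assumptions that may change between versions:

# Sketch: peek at the file each ONNX session was loaded from.
# _model_path is private onnxruntime API and may be unset if the model was
# loaded from bytes; this is a debugging heuristic only.
print(getattr(model.encoder.session, "_model_path", None))
print(getattr(model.decoder.session, "_model_path", None))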
Expected behavior
Expected behaviour, one of two alternatives:
- When the model file names are stated explicitly in the file_name and encoder_file_name parameters, those are the files cached and used from the repo; or
- When only the encoder name is stated explicitly, the FALLBACK should be the corresponding decoder model. E.g. encoder_file_name="encoder_model_quantized.onnx" should fall back to decoder_model_quantized.onnx, and encoder_file_name="encoder_model_fp16.onnx" should fall back to decoder_model_merged_fp16.onnx.
Apologies if I've overlooked something in the docs or other issues, but I've searched both GitHub and Google (site:github.com) and come up with bupkis.