-
Hi @yufenglee, could you please help take a look?
-
Facing the same issue.
Example 1: Input = ['best hotel in bay area']
Example 2: Input = ['best hotel in bay area', 'best hotel in bay']
The output from the quantized model changes when batch size > 1.
-
Describe the bug
I generated an int8 quantized GPT-2 model by following the instructions in this notebook: https://github.com/microsoft/onnxruntime/blob/1ce2982f65e5516067fdcaef19409279173b0d75/onnxruntime/python/tools/transformers/notebooks/Inference_GPT2_with_OnnxRuntime_on_CPU.ipynb
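For reference, the quantization step boils down to something like this (a minimal sketch, assuming the notebook's dynamic int8 quantization path; the file names are placeholders):

```python
# Minimal sketch of the quantization step (assumption: dynamic int8
# weight quantization, as in the notebook; file names are placeholders).
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="gpt2_fp32.onnx",   # the exported fp32 GPT-2 model
    model_output="gpt2_int8.onnx",  # the int8 quantized model
    weight_type=QuantType.QInt8,
)
```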
When I tested the quantized model with batched input, I found some unexpected results.
These are the two inputs I tested:
['best hotel in bay area', 'best hotel in bay area'] ([[13466, 7541, 287, 15489, 1989], [13466, 7541, 287, 15489, 1989]] after tokenization)
['best hotel in bayale', 'best hotel in bay area'] ([[13466, 7541, 287, 15489, 1000], [13466, 7541, 287, 15489, 1989]] after tokenization)
Sample 2 is the same in both batches; I only changed the last token of sample 1.
After feeding these batches into the quantized GPT-2 model, the outputs for sample 2 are different.
I was expecting the same output for sample 2 in these two input tests.
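The token IDs quoted above come from the GPT-2 tokenizer; a quick sketch to reproduce them (assuming the HuggingFace transformers tokenizer used in the notebook):

```python
# Reproduce the token IDs quoted above (assumption: the HuggingFace
# GPT-2 tokenizer, as used in the notebook).
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
print(tokenizer(["best hotel in bay area", "best hotel in bay area"])["input_ids"])
# [[13466, 7541, 287, 15489, 1989], [13466, 7541, 287, 15489, 1989]]
print(tokenizer(["best hotel in bayale", "best hotel in bay area"])["input_ids"])
# [[13466, 7541, 287, 15489, 1000], [13466, 7541, 287, 15489, 1989]]
```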
System information
To Reproduce
I ran the quantized model with the two batched inputs above and debugged the result by checking the 'next_token_logits' output (a sketch of the check follows below).
After the first step, the 'next_token_logits' of the second sample differ between test 1 and test 2.
I also checked with only one sample as input: the result is the same as in test 1, but different from test 2.
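A minimal sketch of the check (assumptions: the quantized model is saved as gpt2_int8.onnx, its only required feed is input_ids, and the logits are the first output; the notebook's export also takes position_ids, attention_mask, and past states, omitted here for brevity):

```python
# Minimal sketch of the reproduction (assumptions: quantized model saved
# as gpt2_int8.onnx; its only required feed is "input_ids"; logits are
# the first output).
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("gpt2_int8.onnx")

test1 = np.array([[13466, 7541, 287, 15489, 1989],
                  [13466, 7541, 287, 15489, 1989]], dtype=np.int64)
test2 = np.array([[13466, 7541, 287, 15489, 1000],
                  [13466, 7541, 287, 15489, 1989]], dtype=np.int64)

# next_token_logits = logits at the last position of each sequence
next1 = session.run(None, {"input_ids": test1})[0][:, -1, :]
next2 = session.run(None, {"input_ids": test2})[0][:, -1, :]

# Sample 2 (row 1) is identical in both batches, so these should match;
# with the quantized model they do not.
print(np.allclose(next1[1], next2[1]))  # expected True, observed False

# Batch-size-1 check: run sample 2 on its own.
single = np.array([[13466, 7541, 287, 15489, 1989]], dtype=np.int64)
next_single = session.run(None, {"input_ids": single})[0][:, -1, :]
print(np.allclose(next_single[0], next1[1]))  # matches test 1
print(np.allclose(next_single[0], next2[1]))  # differs from test 2
```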
Expected behavior
I was expecting the same output for the second sample in test 1 and test 2, since its input is identical.
Additional context
In addition, I also tried this script with the ONNX GPT-2 model with quantization; there the results are as expected:
the 'next_token_logits' of the second sample are the same in test 1 and test 2.