Make LCPAN also pass the activator to DepthwiseSeparable #9280
base: develop
Conversation
Without this, nn.Hardswish() will be selected in ConvBNLayer, which causes the opset version to be bumped to 14, in turn breaking ESP-DL ONNX quantization, which only supports 13. I also suspect this is more correct in other cases: if one supplies a specific activation function, one presumably wants it used everywhere. It also aligns with how the act parameter is handled elsewhere.
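The pattern of the bug can be sketched with stand-in classes (the names ConvBNLayer, DepthwiseSeparable, and LCPAN mirror PaddleDetection, but these minimal classes are illustrative assumptions, not the real implementation): the inner block drops the configured `act` and the conv layer falls back to its hard-swish default.

```python
# Illustrative sketch of the fix: forward the configured activation
# down through DepthwiseSeparable instead of silently dropping it.

class ConvBNLayer:
    def __init__(self, act="hard_swish"):
        # Falls back to hard_swish when no activation is passed down,
        # which is what pushes the exported ONNX to opset 14.
        self.act = act

class DepthwiseSeparable:
    def __init__(self, act=None):
        # The fix: pass `act` through rather than calling ConvBNLayer()
        # with no argument (the pre-fix behavior).
        self.conv = ConvBNLayer(act=act) if act is not None else ConvBNLayer()

class LCPAN:
    def __init__(self, act="hard_swish"):
        self.block = DepthwiseSeparable(act=act)

net = LCPAN(act="relu6")
print(net.block.conv.act)  # relu6, not hard_swish
```

With the pass-through in place, configuring `act=relu6` on LCPAN actually reaches the conv layers, matching how the act parameter is threaded through elsewhere.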
Thanks for your contribution!
These changes effectively allow passing the configured activation function. Without them, setting act=relu6
in LCPAN still results in a hardswish activation function in the exported ONNX.
I successfully exported an ONNX network with opset 11 for ESP-DL (which does not support hardswish activation and opset 14) using these changes.
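A pre-export sanity check along these lines could catch the mismatch earlier. The opset table below is an assumption based on the ONNX operator changelog (HardSwish was only added in opset 14; ReLU and the Clip-based ReLU6 lowering are available from early opsets); the function name is hypothetical.

```python
# Illustrative check: verify every activation used in a config fits the
# target opset before exporting, e.g. opset 13 for ESP-DL quantization.

MIN_OPSET = {
    "relu": 1,         # Relu exists since the earliest opsets
    "relu6": 1,        # lowered to Clip, also available early
    "hard_swish": 14,  # HardSwish op was introduced in opset 14
}

def check_opset(acts, target_opset=13):
    """Raise if any activation needs a newer opset than the target."""
    too_new = [a for a in acts if MIN_OPSET.get(a, 1) > target_opset]
    if too_new:
        raise ValueError(f"{too_new} require opset > {target_opset}")
    return True

print(check_opset(["relu6", "relu"]))  # True
```

Such a check would have flagged the silently injected hard_swish before the export to opset 11/13 failed downstream.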
@LokeZhou @jzhang533
@LokeZhou @jzhang533 Isn't this an obvious bug? Do I have to correct this each time I use a new version of PaddleDetection?
I agree
@zhangyubo0722 , @SigureMo , can you please review this? @simoberny Sorry, accidentally requested a review from you. |
Without this, nn.Hardswish() will be selected in ConvBNLayer, which causes the opset version to be bumped to 14 and, in my case, breaks ESP-DL ONNX quantization, which only supports 13.
I also suspect this is more correct in other cases: if one supplies a specific activation function, one presumably wants it used everywhere.
It also aligns with how the act parameter is handled elsewhere.
(I accidentally opened the first version of this PR against a release branch.)