
Using QDQ ONNX to export an engine file (fp16 and int8), speed is not faster than using ONNX without QDQ to export an fp16 engine #4554


Description

@DDDaar

Hello! I used mtq.quantize to quantize the RFDETR model and exported a QDQ ONNX file. However, after converting it to a TensorRT engine, the engines built with --int8 and --fp16, although smaller in size than the original fp16 engine file, have almost the same inference speed. Could you please suggest any solutions? Thank you.
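For reference, the flow was roughly as follows. This is a minimal sketch, assuming NVIDIA's TensorRT Model Optimizer (the modelopt package, whose mtq.quantize API the issue names); `model`, `calib_loader`, the file name, and the input shape are placeholders, not taken from the report:

```python
import torch
import modelopt.torch.quantization as mtq

def forward_loop(model):
    # Feed a few calibration batches so the quantizers can collect ranges.
    for images in calib_loader:  # placeholder calibration dataloader
        model(images)

# Insert Q/DQ nodes using the default INT8 post-training-quantization config.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export the quantized model; the Q/DQ ops are baked into the ONNX graph.
dummy = torch.randn(1, 3, 640, 640)  # illustrative input shape
torch.onnx.export(model, dummy, "rfdetr_qdq.onnx", opset_version=17)
```

The engine would then be built with the flags mentioned above, along the lines of:

```
trtexec --onnx=rfdetr_qdq.onnx --int8 --fp16 --saveEngine=rfdetr_int8.engine
```

One way to see where the time goes is trtexec's per-layer reporting, e.g. `--dumpProfile --separateProfileRun` for per-layer timings and `--dumpLayerInfo --profilingVerbosity=detailed` for the precision each layer actually runs in; if most layers fall back to fp16, the int8 engine will not be faster.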

Labels

Module:Performance (General performance issues), triaged (Issue has been triaged by maintainers)
