Description
I’m testing INT8 quantization of an ASP 2:4-sparse model.
Case A: Export the sparse FP32 ONNX without QuantizeLinear/DequantizeLinear (Q/DQ) nodes and build an INT8 engine with Polygraphy (sparsity enabled); the build command is sketched below.
-> TensorRT layer info indicates that sparsity-enabled tactics/kernels are selected for some layers.
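For reference, a minimal sketch of the Polygraphy build I'm using. As far as I know, `--sparse-weights` is the Polygraphy flag that sets TensorRT's SPARSE_WEIGHTS builder flag; the file names and calibration cache are placeholders from my setup:

```bash
# Build an INT8 engine from the sparse FP32 ONNX with sparsity tactics allowed.
# model_sparse_fp32.onnx and calib.cache are placeholders for my files.
polygraphy convert model_sparse_fp32.onnx \
    --int8 \
    --sparse-weights \
    --calibration-cache calib.cache \
    -o model_sparse_int8.engine
```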
Case B: Starting from the same sparse model, export an ONNX with Q/DQ nodes using pytorch-quantization (export sketched below) and build with: trtexec --int8 --sparsity=enable
-> Layer info still shows HasSparseWeights=1 for some layers, but sparsity-enabled tactics/kernels do not appear to be selected.
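The export roughly follows the pytorch-quantization docs; `model` is assumed to already be ASP-pruned and calibrated, and the input shape is illustrative:

```python
import torch
from pytorch_quantization import nn as quant_nn

# Make TensorQuantizer modules export as ONNX QuantizeLinear/DequantizeLinear pairs.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

model.eval()  # `model` is the ASP 2:4-pruned, calibrated model (assumption)
dummy = torch.randn(1, 3, 224, 224, device="cuda")  # illustrative input shape
torch.onnx.export(
    model,
    dummy,
    "model_qdq.onnx",
    opset_version=13,  # opset >= 13 needed for per-channel Q/DQ
    do_constant_folding=True,
)
```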
Question:
Is this difference between the no-Q/DQ ONNX and the Q/DQ ONNX expected behavior for 2:4 sparsity?
If not, are there recommended export/build settings for a Q/DQ ONNX that enable sparsity tactics?
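In case it helps reproduce, this is roughly how I'm checking which tactics get picked. `--profilingVerbosity=detailed` and `--exportLayerInfo` should be available in TensorRT 8.6's trtexec, and the grep is only a heuristic based on the kernel names I've seen:

```bash
# Build with detailed layer info and keep the verbose log to inspect tactic names.
trtexec --onnx=model_qdq.onnx \
        --int8 --sparsity=enable \
        --profilingVerbosity=detailed \
        --exportLayerInfo=layer_info.json \
        --verbose 2>&1 | tee build.log

# Heuristic: sparse kernels tend to include "sparse" in their tactic/kernel names.
grep -i sparse layer_info.json build.log | head
```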
Environment: Jetson (JetPack 6.1.2 / TensorRT 8.6.1)