
TensorRT 2:4 sparsity is not applied with Q/DQ quantization #4694


Description

@uamsam

I’m testing INT8 quantization on a 2:4-sparse model pruned with ASP.

  • Case A: Export the sparse FP32 ONNX without QuantizeLinear/DequantizeLinear (Q/DQ) nodes and build an INT8 engine using Polygraphy (with sparsity enabled); see the Polygraphy sketch after this list.
    -> TensorRT layer info indicates that sparsity-enabled tactics/kernels are selected for some layers.

  • Case B: Starting from the same sparse model, export an ONNX with Q/DQ nodes using pytorch-quantization and build with trtexec --int8 --sparsity=enable; see the equivalent Python build sketch after this list.
    -> Layer info still shows HasSparseWeights=1 for some layers, but sparsity-enabled tactics/kernels do not appear to be selected.
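
For reference, the Case A build was done roughly along these lines (a minimal sketch; the input name, shapes, random calibration loader, and file names are placeholders):

```python
import numpy as np
from polygraphy.backend.trt import (
    Calibrator,
    CreateConfig,
    EngineFromNetwork,
    NetworkFromOnnxPath,
    SaveEngine,
)

# Random batches only to keep the sketch self-contained; real calibration
# should feed representative data.
def calib_data():
    for _ in range(8):
        yield {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

config = CreateConfig(
    int8=True,                                        # PTQ path (no Q/DQ in the ONNX)
    sparse_weights=True,                              # allow 2:4 sparse tactics
    calibrator=Calibrator(data_loader=calib_data()),
)
build_engine = SaveEngine(
    EngineFromNetwork(NetworkFromOnnxPath("model_sparse.onnx"), config=config),
    path="model_sparse_int8.engine",
)
build_engine()  # Polygraphy loaders are lazy; the build happens here
```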
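The Case B trtexec build should be equivalent to the following TensorRT Python API sketch (model path is a placeholder; the ONNX carries Q/DQ nodes, so the network is explicitly quantized and no calibrator is attached):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model_qdq.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)             # needed so Q/DQ layers lower to INT8
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)   # equivalent of --sparsity=enable
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED  # record per-layer tactics

serialized = builder.build_serialized_network(network, config)
with open("model_qdq_int8.engine", "wb") as f:
    f.write(serialized)
```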

Question:
Is this difference between the no-Q/DQ ONNX and the Q/DQ ONNX expected for 2:4 sparsity?
If not, are there recommended export/build settings for Q/DQ ONNX that enable sparsity tactics?
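
For completeness, here is a sketch of one way to read per-layer tactic info back from a built engine (assuming it was built with detailed profiling verbosity; the exact JSON field names vary across TensorRT versions, so this simply scans each layer entry for "sparse"):

```python
import json
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
with open("model_qdq_int8.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

inspector = engine.create_engine_inspector()
raw = inspector.get_engine_information(trt.LayerInformationFormat.JSON)
for layer in json.loads(raw).get("Layers", []):
    # With DETAILED verbosity each entry is a dict; otherwise it is a bare string.
    blob = json.dumps(layer).lower()
    if "sparse" in blob:
        print(layer["Name"] if isinstance(layer, dict) else layer)
```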

Environment: Jetson (JetPack 6.1.2 / TensorRT 8.6.1)
