I wanted to float an idea for an experimental training-time compression feature that could sit alongside existing PTQ and QAT workflows in NNCF.
The core idea is self-compression: instead of manually configuring mixed precision, sparsity schedules, or multi-stage compression pipelines, the model learns its own optimal bit-widths and channel usage during training via gradients.
What this adds (at a high level)
Learnable bit-widths
Introduce a differentiable quantizer where bit-depth ($b$) is a trainable parameter. This gives users an automated alternative to hand-crafted mixed-precision setups.
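A minimal PyTorch sketch of what such a quantizer could look like, assuming a per-tensor scheme with a learnable bit-width and scale exponent, and a straight-through estimator for the rounding step. The class name and parameter layout are illustrative only, not an existing NNCF API:

```python
import torch
import torch.nn as nn


class DifferentiableQuantizer(nn.Module):
    """Fake-quantizer whose bit-width is a learnable float (illustrative sketch)."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))   # learnable bit-width b
        self.log2_scale = nn.Parameter(torch.tensor(0.0))   # learnable scale exponent

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        b = torch.clamp(self.bits, min=0.0)
        scale = torch.exp2(self.log2_scale)
        # Signed integer range implied by a (possibly fractional) bit-width b.
        q_min, q_max = -torch.exp2(b - 1), torch.exp2(b - 1) - 1
        q = torch.clamp(w / scale, q_min, q_max)
        # Straight-through estimator: round in the forward pass, identity gradient.
        q = q + (torch.round(q) - q).detach()
        return q * scale
```

Because the clipping range depends on `b`, the task loss and the size penalty both produce gradients for the bit-width, so precision is negotiated during normal training.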
Single unified compression objective
Add a `SelfCompressionLoss` term that penalizes total network bit-count. This naturally pushes the optimizer toward both quantization and pruning in one pass.
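One way the size term could be computed, assuming quantized modules carry the quantizer sketched above under a hypothetical `quantizer` attribute, so that a layer's storage cost is `numel(weight) * bits`:

```python
import torch
import torch.nn as nn


def self_compression_loss(model: nn.Module) -> torch.Tensor:
    """Average number of bits per weight over all quantized layers (sketch)."""
    total_bits, total_weights = None, 0
    for module in model.modules():
        quantizer = getattr(module, "quantizer", None)   # hypothetical attribute
        if quantizer is None or not hasattr(module, "weight"):
            continue
        bits = torch.clamp(quantizer.bits, min=0.0)
        layer_bits = module.weight.numel() * bits         # storage cost of this layer
        total_bits = layer_bits if total_bits is None else total_bits + layer_bits
        total_weights += module.weight.numel()
    if total_bits is None:
        return torch.tensor(0.0)
    return total_bits / total_weights
```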
A modern alternative to deprecated structural pruning
Channel-level elimination happens implicitly through gradients rather than explicit pruning schedules, which feels more aligned with current training practices.
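If bit-widths are learned per output channel rather than per tensor (a natural extension of the sketch above), a channel whose bit-width is driven to zero stores no information and can be physically removed after training. A hypothetical post-training check might look like:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def removable_channels(conv: nn.Conv2d, per_channel_bits: torch.Tensor) -> torch.Tensor:
    """Boolean mask of output channels whose learned bit-width rounds to zero."""
    assert per_channel_bits.numel() == conv.out_channels
    return torch.round(torch.clamp(per_channel_bits, min=0.0)) == 0
```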
Size-targeted optimization
Users can control aggressiveness via a single size/memory penalty ($\gamma$), letting the model discover the best weight-to-bit tradeoff on its own.
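Concretely, the single knob could just be the weight on the size term in the overall objective, building on the hypothetical `self_compression_loss` above:

```python
gamma = 0.05  # illustrative value; larger gamma = more aggressive compression

def total_loss(task_loss: torch.Tensor, model: nn.Module) -> torch.Tensor:
    # L = L_task + gamma * L_size, where L_size is average bits per weight
    return task_loss + gamma * self_compression_loss(model)
```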
Minimal disruption to existing workflows
This could be opt-in, experimental, and designed to coexist cleanly with PTQ and QAT.
Implementation-wise, this could live as a new `DifferentiableQuantizer` and a corresponding `CompressionAlgorithm`.
From a user perspective, this becomes a more “set-and-forget” option:
Define a compression penalty $\to$ train normally $\to$ let the model converge to its most efficient form without multi-stage schedules or manual tuning.
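Roughly, the training loop would stay a standard loop; the only additions are the wrapping step and the extra loss term. Names like `attach_differentiable_quantizers`, `MyModel`, `train_loader`, and `gamma` below are placeholders continuing the sketches above, not existing NNCF APIs:

```python
model = MyModel()
# Hypothetical helper that wraps weight layers with the learnable-bit-width
# quantizers sketched above (e.g. by augmenting each Conv/Linear layer).
model = attach_differentiable_quantizers(model)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for inputs, targets in train_loader:
    optimizer.zero_grad()
    task_loss = nn.functional.cross_entropy(model(inputs), targets)
    loss = task_loss + gamma * self_compression_loss(model)
    loss.backward()   # gradients reach weights, scales, and bit-widths together
    optimizer.step()
```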
Does this sound like something that could be worked on?