I wanted to float an idea for an experimental training-time compression feature that could sit alongside existing PTQ and QAT workflows in NNCF.
The core idea is self-compression: instead of manually configuring mixed precision, sparsity schedules, or multi-stage compression pipelines, the model learns its own optimal bit-widths and channel usage during training via gradients.
What this adds (at a high level)
Learnable bit-widths
Introduce a differentiable quantizer where bit-depth ($b$) is a trainable parameter. This gives users an automated alternative to hand-crafted mixed-precision setups.
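A minimal PyTorch sketch of what such a quantizer could look like, assuming a per-tensor scheme with a learnable bit-width and scale exponent, and a straight-through estimator for the rounding step. The class name and parameter layout are illustrative only, not an existing NNCF API:

```python
import torch
import torch.nn as nn


class DifferentiableQuantizer(nn.Module):
    """Fake-quantizer whose bit-width is a learnable float (illustrative sketch)."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))   # learnable bit-width b
        self.log2_scale = nn.Parameter(torch.tensor(0.0))   # learnable scale exponent

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        b = torch.clamp(self.bits, min=0.0)
        scale = torch.exp2(self.log2_scale)
        # Signed integer range implied by a (possibly fractional) bit-width b.
        q_min, q_max = -torch.exp2(b - 1), torch.exp2(b - 1) - 1
        q = torch.clamp(w / scale, q_min, q_max)
        # Straight-through estimator: round in the forward pass, identity gradient.
        q = q + (torch.round(q) - q).detach()
        return q * scale
```

Because the clipping range depends on `b`, the task loss and the size penalty both produce gradients for the bit-width, so precision is negotiated during normal training.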
Single unified compression objective
Add a `SelfCompressionLoss` term that penalizes total network bit-count. This naturally pushes the optimizer toward both quantization and pruning in one pass.
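One way the size term could be computed, assuming quantized modules carry the quantizer sketched above under a hypothetical `quantizer` attribute, so that a layer's storage cost is `numel(weight) * bits`:

```python
import torch
import torch.nn as nn


def self_compression_loss(model: nn.Module) -> torch.Tensor:
    """Average number of bits per weight over all quantized layers (sketch)."""
    total_bits, total_weights = None, 0
    for module in model.modules():
        quantizer = getattr(module, "quantizer", None)   # hypothetical attribute
        if quantizer is None or not hasattr(module, "weight"):
            continue
        bits = torch.clamp(quantizer.bits, min=0.0)
        layer_bits = module.weight.numel() * bits         # storage cost of this layer
        total_bits = layer_bits if total_bits is None else total_bits + layer_bits
        total_weights += module.weight.numel()
    if total_bits is None:
        return torch.tensor(0.0)
    return total_bits / total_weights
```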
A modern alternative to deprecated structural pruning
Channel-level elimination happens implicitly through gradients rather than explicit pruning schedules, which feels more aligned with current training practices.
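If bit-widths are learned per output channel rather than per tensor (a natural extension of the sketch above), a channel whose bit-width is driven to zero stores no information and can be physically removed after training. A hypothetical post-training check might look like:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def removable_channels(conv: nn.Conv2d, per_channel_bits: torch.Tensor) -> torch.Tensor:
    """Boolean mask of output channels whose learned bit-width rounds to zero."""
    assert per_channel_bits.numel() == conv.out_channels
    return torch.round(torch.clamp(per_channel_bits, min=0.0)) == 0
```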
Size-targeted optimization
Users can control aggressiveness via a single size/memory penalty ($\gamma$), letting the model discover the best weight-to-bit tradeoff on its own.
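Concretely, the single knob could just be the weight on the size term in the overall objective, building on the hypothetical `self_compression_loss` above:

```python
gamma = 0.05  # illustrative value; larger gamma = more aggressive compression

def total_loss(task_loss: torch.Tensor, model: nn.Module) -> torch.Tensor:
    # L = L_task + gamma * L_size, where L_size is average bits per weight
    return task_loss + gamma * self_compression_loss(model)
```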
Minimal disruption to existing workflows
This could be opt-in, experimental, and designed to coexist cleanly with PTQ and QAT.
Implementation-wise, this could live as a new `DifferentiableQuantizer` and a corresponding `CompressionAlgorithm`.
From a user perspective, this becomes a more “set-and-forget” option:
Define a compression penalty $\to$ train normally $\to$ let the model converge to its most efficient form without multi-stage schedules or manual tuning.
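Roughly, the training loop would stay a standard loop; the only additions are the wrapping step and the extra loss term. Names like `attach_differentiable_quantizers`, `MyModel`, `train_loader`, and `gamma` below are placeholders continuing the sketches above, not existing NNCF APIs:

```python
model = MyModel()
# Hypothetical helper that wraps weight layers with the learnable-bit-width
# quantizers sketched above (e.g. by augmenting each Conv/Linear layer).
model = attach_differentiable_quantizers(model)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for inputs, targets in train_loader:
    optimizer.zero_grad()
    task_loss = nn.functional.cross_entropy(model(inputs), targets)
    loss = task_loss + gamma * self_compression_loss(model)
    loss.backward()   # gradients reach weights, scales, and bit-widths together
    optimizer.step()
```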
Does this sound like something that could be worked on?