The LittleBit Project

Sub-1-Bit LLM Compression via Latent Factorization

Official implementation of LittleBit (NeurIPS 2025) and LittleBit-2 (ICML 2026).


Papers

LittleBit-2: Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment (ICML 2026)
Banseok Lee, Youngmin Kim

LittleBit: Ultra Low-Bit Quantization via Latent Factorization (NeurIPS 2025)
Banseok Lee*, Dongkyu Kim*, Youngcheon You, Youngmin Kim


Abstract

LittleBit compresses large language models into the sub-1-bit regime by factorizing each dense weight matrix into low-rank latent factors, binarizing those factors, and restoring magnitude information through lightweight learned scales. This enables extreme compression, including the 0.1 bits-per-weight setting, while preserving the original model architecture at inference time.
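To make the decomposition concrete, here is a minimal PyTorch sketch of a binarized low-rank layer in the spirit described above. The factor shapes and the exact placement of the row, column, and latent scales are assumptions for illustration; the repo's LittleBitLinear may differ in detail.

import torch
import torch.nn as nn

class BinaryLowRankLinear(nn.Module):
    """Sketch of a binarized low-rank layer: an m x n weight W is
    approximated by sign(U) diag(s) sign(V)^T, with learned row (h),
    column (g), and latent (s) scales restoring magnitude. Shapes and
    scale placement are illustrative assumptions, not the repo's exact
    LittleBitLinear layout."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank))  # left latent factor
        self.V = nn.Parameter(torch.randn(in_features, rank))   # right latent factor
        self.h = nn.Parameter(torch.ones(out_features))         # row scale
        self.g = nn.Parameter(torch.ones(in_features))          # column scale
        self.s = nn.Parameter(torch.ones(rank))                 # latent-dimension scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bu = torch.sign(self.U)       # ±1 left factor
        bv = torch.sign(self.V)       # ±1 right factor
        z = (x * self.g) @ bv         # column scale, project into rank-r latent space
        z = z * self.s                # per-latent-dimension scale
        return (z @ bu.t()) * self.h  # expand to output features, apply row scale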

LittleBit-2 improves this recipe by addressing latent geometry misalignment at initialization. It applies Internal Latent Rotation with Joint Iterative Quantization (Joint-ITQ), aligning the SVD-derived latent factors with the binary hypercube before QAT. The LittleBit-2 initialization is opt-in (--use_itq) and adds no inference overhead.
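The repo's Joint-ITQ rotates both latent factors jointly; the single-factor sketch below shows the classic ITQ alternation (Gong & Lazebnik) that this idea builds on: alternate a binarization step with an orthogonal Procrustes step so the rotated factor sits closer to the ±1 hypercube. Function and variable names are illustrative.

import torch

def itq_rotation(V: torch.Tensor, n_iter: int = 50) -> torch.Tensor:
    """Classic single-factor ITQ: find an orthogonal R minimizing
    ||B - V R||_F with B = sign(V R). Sketches the alignment idea behind
    Joint-ITQ; the repo's joint two-factor variant is more involved."""
    r = V.shape[1]
    Q, _ = torch.linalg.qr(torch.randn(r, r, device=V.device))
    R = Q                                       # random orthogonal init
    for _ in range(n_iter):
        B = torch.sign(V @ R)                   # binary codes under current rotation
        U, _, Wh = torch.linalg.svd(V.t() @ B)  # orthogonal Procrustes step
        R = U @ Wh                              # best rotation for fixed B
    return R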


Highlights

  • Sub-1-bit compression: Designed for 1.0 to 0.1 bits per weight.
  • LittleBit-2 opt-in: Enable Joint-ITQ initialization with --use_itq for improved latent geometry alignment.
  • No inference-time change: LittleBit-2 modifies initialization only; the deployed factorized layer remains the same.
  • QAT-friendly: Supports Quantization-Aware Training with SmoothSign (sketched after this list) and optional residual factorization.
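The SmoothSign name suggests a hard sign in the forward pass with a smooth surrogate gradient in the backward pass, the standard way to make binarization trainable. The tanh-based variant below is an assumption for illustration; the repo's exact surrogate may differ.

import torch

class SmoothSignFn(torch.autograd.Function):
    """Illustrative QAT binarizer: hard sign forward, tanh-derivative
    surrogate backward. The repo's SmoothSign may use a different form."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, beta: float = 5.0) -> torch.Tensor:
        ctx.save_for_backward(x)
        ctx.beta = beta
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        (x,) = ctx.saved_tensors
        # Gradient of the smooth stand-in tanh(beta * x):
        surrogate = ctx.beta * (1.0 - torch.tanh(ctx.beta * x) ** 2)
        return grad_out * surrogate, None

# Usage during QAT: w_bin = SmoothSignFn.apply(w)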

Supported Models

The codebase currently supports:

  • OPT
  • Llama and Llama 2/3
  • Phi-4
  • Qwen2.5 and QwQ
  • Gemma 2 and Gemma 3
  • Qwen3

Installation

We recommend Python 3.12.

conda create -n littlebit python=3.12
conda activate littlebit

# Install CUDA toolkit. Adjust the CUDA version if needed.
conda install -c nvidia/label/cuda-12.4.1 cuda-toolkit

# Install PyTorch.
pip install torch==2.8.0+cu124 torchvision==0.23.0+cu124 torchaudio==2.8.0+cu124 --index-url https://download.pytorch.org/whl/cu124

# Install dependencies.
pip install -r requirements.txt

Important

To reproduce the paper results, pin transformers to 4.51.x. Newer transformers releases may change model internals or evaluation behavior.

pip install "transformers==4.51.*"

Usage

Training

Train a model with Quantization-Aware Training. By default, LittleBitLinear uses the original SVD-only initialization. To enable LittleBit-2 (Joint-ITQ), pass --use_itq True.

Single GPU

CUDA_VISIBLE_DEVICES=0 python -m main \
    --model_id meta-llama/Llama-2-7b-hf \
    --dataset c4_wiki \
    --save_dir ./outputs/Llama-2-7b-LittleBit-2 \
    --num_train_epochs 5.0 \
    --per_device_train_batch_size 4 \
    --lr 4e-05 \
    --warmup_ratio 0.02 \
    --report wandb \
    --quant_func SmoothSign \
    --quant_mod LittleBitLinear \
    --residual True \
    --eff_bit 1.0 \
    --kv_factor 1.0 \
    --min_split_dim 8 \
    --l2l_loss_scale 10.0

# Opt-in to LittleBit-2 initialization
# --use_itq True
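For intuition about what a bits-per-weight budget implies, note that storing binarized factors U (m×r) and V (n×r) for an m×n weight costs roughly r(m+n) bits, so a target of b bits per weight suggests a rank of about b·mn/(m+n). This back-of-the-envelope mapping is an assumption based on the factorization sketched earlier, not the repo's documented rank schedule; learned scales and the optional residual factor add overhead it ignores.

def rank_for_eff_bit(m: int, n: int, eff_bit: float) -> int:
    """Approximate latent rank for a bits-per-weight budget, assuming the
    binary factors U (m x r) and V (n x r) dominate storage:
    bits/weight ≈ r * (m + n) / (m * n). Illustrative only."""
    return max(1, round(eff_bit * m * n / (m + n)))

# Example: a square 4096 x 4096 projection
print(rank_for_eff_bit(4096, 4096, 1.0))  # 2048
print(rank_for_eff_bit(4096, 4096, 0.1))  # 205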

Multi-GPU with DeepSpeed

deepspeed --num_gpus=4 main.py \
    --model_id meta-llama/Llama-2-7b-hf \
    --dataset c4_wiki \
    --save_dir ./outputs/Llama-2-7b-LittleBit-2 \
    --ds_config_path configs/zero3.json \
    --num_train_epochs 5.0 \
    --per_device_train_batch_size 4 \
    --lr 4e-05 \
    --report wandb \
    --quant_func SmoothSign \
    --quant_mod LittleBitLinear \
    --residual True \
    --eff_bit 1.0 \
    --kv_factor 1.0 \
    --min_split_dim 8

Evaluation

Evaluate a local checkpoint or a model hosted on the Hugging Face Hub.

# From a local directory
CUDA_VISIBLE_DEVICES=0 python eval.py \
    --model_id ./outputs/Llama-2-7b-LittleBit-2 \
    --seqlen 2048 \
    --ppl_task wikitext2,c4 \
    --zeroshot_task boolq,piqa,hellaswag,winogrande,arc_easy,arc_challenge,openbookqa

# From the Hugging Face Hub
CUDA_VISIBLE_DEVICES=0 python eval.py \
    --model_id username/littlebit-llama-7b-0.1bpw \
    --seqlen 2048 \
    --ppl_task wikitext2

Legacy Checkpoints

Older checkpoints may not include littlebit_config.json. In that case, pass the quantization arguments explicitly:

CUDA_VISIBLE_DEVICES=0 python eval.py \
    --model_id ./outputs/Legacy-Llama-2-7b \
    --quant_func SmoothSign \
    --quant_mod LittleBitLinear \
    --split_dim 1024

Parameter loading priority:

  1. Explicit CLI arguments
  2. littlebit_config.json in the model directory
  3. config.json fallback for older checkpoints
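A sketch of how this precedence can be resolved at load time: read the lowest-priority source first and let each higher-priority source overwrite it. The file names follow the list above; the helper name and argument handling are hypothetical.

import json
from pathlib import Path

def resolve_quant_config(model_dir: str, cli_args: dict) -> dict:
    """Hypothetical helper: merge quantization settings with the precedence
    CLI args > littlebit_config.json > legacy config.json."""
    merged = {}
    for name in ("config.json", "littlebit_config.json"):  # lowest priority first
        path = Path(model_dir) / name
        if path.exists():
            merged.update(json.loads(path.read_text()))
    # Explicit CLI arguments override anything read from disk.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged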

Citation

If you find this work useful, please cite:

@inproceedings{lee2026littlebit2,
  title={LittleBit-2: Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment},
  author={Lee, Banseok and Kim, Youngmin},
  booktitle={Proceedings of the 43rd International Conference on Machine Learning},
  year={2026}
}

@inproceedings{lee2025littlebit,
  title={LittleBit: Ultra Low-Bit Quantization via Latent Factorization},
  author={Lee, Banseok and Kim, Dongkyu and You, Youngcheon and Kim, Youngmin},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025}
}

License

This project is licensed under the CC BY-NC 4.0 license.
