
Quantization in Depth


This repository contains materials and code examples from the DeepLearning.AI short course "Quantization in Depth", instructed by Marc Sun and Younes Belkada from Hugging Face.

📚 Course Overview

"Quantization in Depth" teaches advanced techniques to compress neural network models, reducing their size to a fraction of the original while maintaining performance. By implementing customized quantization methods from scratch, you'll gain deep insights into the tradeoffs between model size and accuracy, enabling faster inference and broader deployment of AI models.

🎯 What You'll Learn

  • ⚙️ Implement and compare different variants of linear quantization (symmetric vs. asymmetric modes)
  • 🔍 Apply varying granularity levels: per-tensor, per-channel, and per-group quantization
  • 🛠️ Build a general-purpose quantizer in PyTorch that can compress any open-source model's dense layers by up to 4x
  • 📦 Implement weight-packing techniques to compress weights from 32 bits to as low as 2 bits

🗂️ Course Structure

This course includes 18 lessons with 13 code examples:

Fundamentals of Quantization

  1. 🧠 Introduction - Overview of quantization importance and applications
  2. 📖 Overview - Core concepts and techniques covered in the course

Building Quantization from Scratch

  1. 🔢 Quantize and De-quantize a Tensor - Fundamental operations in quantization
  2. 📏 Get the Scale and Zero Point - Understanding key quantization parameters
  3. ⚖️ Symmetric vs Asymmetric Mode - Comparing different quantization approaches
  4. 🎯 Finer Granularity for more Precision - Introduction to granularity concepts
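The quantize/dequantize round trip covered in these lessons comes down to a few lines of PyTorch. Below is a minimal sketch of asymmetric (affine) linear quantization to int8; the helper names are illustrative, not the course's exact APIs.

```python
import torch

def get_scale_and_zero_point(tensor, dtype=torch.int8):
    # Asymmetric (affine) mode: map the tensor's [min, max] range
    # onto the full integer range of the target dtype.
    q_min, q_max = torch.iinfo(dtype).min, torch.iinfo(dtype).max
    r_min, r_max = tensor.min().item(), tensor.max().item()
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = int(round(q_min - r_min / scale))
    return scale, zero_point

def quantize(tensor, scale, zero_point, dtype=torch.int8):
    q = torch.round(tensor / scale + zero_point)
    q_min, q_max = torch.iinfo(dtype).min, torch.iinfo(dtype).max
    return q.clamp(q_min, q_max).to(dtype)

def dequantize(q, scale, zero_point):
    return scale * (q.float() - zero_point)

t = torch.randn(4, 4)
scale, zp = get_scale_and_zero_point(t)
t_hat = dequantize(quantize(t, scale, zp), scale, zp)
# Round-trip error is bounded by roughly half a quantization step (scale / 2).
```

Symmetric mode drops the zero point (it is fixed at 0) and derives the scale from the maximum absolute value instead, trading some range efficiency for simpler integer arithmetic.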

Advanced Granularity Techniques

  1. 📊 Per Channel Quantization - Implementing channel-wise quantization
  2. 🧩 Per Group Quantization - Implementing group-wise quantization
  3. 🚀 Quantizing Weights & Activations for Inference - Practical application to inference
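Per-channel quantization stores one scale per output channel instead of one for the whole tensor, which keeps outlier channels from blowing up everyone else's precision. A minimal sketch of the symmetric per-channel case (helper names are illustrative):

```python
import torch

def quantize_per_channel_symmetric(weight, dim=0, dtype=torch.int8):
    # Symmetric mode: zero point is fixed at 0, so only a scale is stored.
    # One scale per slice along `dim` (e.g. per output channel of a linear layer).
    q_max = torch.iinfo(dtype).max
    reduce_dims = [d for d in range(weight.ndim) if d != dim]
    max_abs = weight.abs().amax(dim=reduce_dims, keepdim=True)
    scale = max_abs / q_max
    q = torch.round(weight / scale).clamp(-q_max, q_max).to(dtype)
    return q, scale

w = torch.randn(8, 16)                     # 8 output channels
q, scale = quantize_per_channel_symmetric(w, dim=0)
w_hat = q.float() * scale                  # dequantize by broadcasting the scales
```

Per-group quantization is the same idea at a finer grain: reshape each row into fixed-size groups (e.g. 32 weights) and compute one scale per group.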

Building a Complete Quantizer

  1. 🛠️ Custom Build an 8-Bit Quantizer - Developing a custom quantization solution
  2. 🔄 Replace PyTorch layers with Quantized Layers - Practical integration with PyTorch
  3. 🌐 Quantize any Open Source PyTorch Model - Building a general-purpose solution
  4. 🤝 Load your Quantized Weights from HuggingFace Hub - Working with the Hugging Face ecosystem
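A quantizer along these lines walks a model and swaps each `nn.Linear` for a module that stores int8 weights plus per-channel scales and dequantizes on the fly. The class name and structure below are an illustrative sketch, not the course's exact implementation:

```python
import torch
import torch.nn as nn

class Int8WeightLinear(nn.Module):
    """Linear layer with int8 weights and per-output-channel scales;
    activations stay in floating point."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()
        q_max = torch.iinfo(torch.int8).max
        scale = w.abs().amax(dim=1, keepdim=True) / q_max
        self.register_buffer("int8_weight",
                             torch.round(w / scale).clamp(-q_max, q_max).to(torch.int8))
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x):
        # Dequantize on the fly; optimized kernels fuse this into the matmul.
        w = self.int8_weight.float() * self.scale
        return nn.functional.linear(x, w, self.bias)

def replace_linear_layers(model: nn.Module):
    # Recursively swap every nn.Linear for its quantized counterpart.
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, Int8WeightLinear(child))
        else:
            replace_linear_layers(child)
    return model
```

Because `named_children` is traversed recursively, this works on arbitrarily nested models, which is what makes the quantizer general-purpose.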

Ultra-Low Bit Quantization

  1. 📦 Weights Packing - Theory behind extreme compression
  2. 🧮 Packing 2-bit Weights - Implementation of 2-bit weight compression
  3. 🔓 Unpacking 2-Bit Weights - Recovering usable weights from compressed format
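PyTorch has no native 2-bit dtype, so four 2-bit values are packed into each uint8 byte with shifts and masks, and unpacked the same way in reverse. A minimal sketch of the idea (helper names are illustrative):

```python
import torch

def pack_2bit(values):
    # Pack a flat uint8 tensor of 2-bit values (0..3), four per byte.
    assert values.numel() % 4 == 0
    v = values.to(torch.uint8).reshape(-1, 4)
    return v[:, 0] | (v[:, 1] << 2) | (v[:, 2] << 4) | (v[:, 3] << 6)

def unpack_2bit(packed):
    # Recover the four 2-bit values stored in each byte.
    shifts = torch.tensor([0, 2, 4, 6], dtype=torch.uint8)
    return ((packed.unsqueeze(-1) >> shifts) & 0b11).reshape(-1)

vals = torch.tensor([1, 0, 3, 2, 2, 2, 1, 0], dtype=torch.uint8)
packed = pack_2bit(vals)   # 8 values fit in 2 bytes: a 4x size reduction
assert torch.equal(unpack_2bit(packed), vals)
```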

Looking Forward

  1. 🚧 Beyond Linear Quantization - Introduction to advanced quantization methods
  2. 🏁 Conclusion - Summary and future directions

💻 Code Examples

This repository contains 13 code examples that correspond to the course lessons:

  1. 🔢 Tensor Quantization - Basic quantize/dequantize operations
  2. 📏 Computing Scale and Zero Point - Determining quantization parameters
  3. ⚖️ Symmetric vs. Asymmetric Modes - Implementing both quantization modes
  4. 🎯 Per-Tensor Quantization - Basic granularity implementation
  5. 📊 Per-Channel Quantization - Channel-wise implementation
  6. 🧩 Per-Group Quantization - Group-wise implementation
  7. 🚀 Weights & Activations Quantization - Full inference-ready quantization
  8. 🛠️ 8-Bit Quantizer Implementation - Complete 8-bit solution
  9. 🔄 Quantized Layer Replacement - Integration with PyTorch
  10. 🌐 General-Purpose Model Quantizer - Quantizing any PyTorch model
  11. 🤝 Hugging Face Integration - Loading and saving quantized weights
  12. 📦 2-Bit Weight Packing - Ultra-low bit compression implementation
  13. 🔓 2-Bit Weight Unpacking - Efficient decompression implementation

🚀 Getting Started

Installation

```bash
# Clone the repository
git clone https://github.com/duybaohuynhtan/Quantization-in-Depth.git
cd Quantization-in-Depth

# Install required packages
pip install -r requirements.txt
```

Running the Examples

Each code example is presented as a Jupyter notebook:

```bash
jupyter notebook
```

Navigate to the notebooks/ directory and open the desired example.

📋 Requirements

  • 🐍 Python 3.9.18
  • ⚡ Accelerate 0.26.1
  • 📊 Seaborn 0.13.1
  • 🔥 Torch 2.1.1
  • 🤗 Transformers 4.35.0

🔗 Additional Resources

👨‍🏫 Instructors

  • Marc Sun - Machine Learning Engineer at Hugging Face
  • Younes Belkada - Machine Learning Engineer at Hugging Face

🙏 Acknowledgments

Special thanks to DeepLearning.AI and Hugging Face for creating such comprehensive learning materials.
