
Commit 52e1f5e
Fix the mirror issue and update the docs.
1 parent 437cb87

14 files changed (+436 −249 lines)

docs/api_reference/quantization.rst

Lines changed: 98 additions & 7 deletions
@@ -1,16 +1,109 @@
-Easy Quantization with `QuantizerFactory`
-=========================================
+QuantLLM: Advanced Model Quantization
+=====================================
 
-The recommended way to quantize models with QuantLLM is by using the `QuantizerFactory.quantize_from_pretrained` static method. This high-level API simplifies the process of loading a model from Hugging Face, quantizing it using a specified method, and receiving the quantized model along with its tokenizer.
+💫 Introduction
+---------------
+QuantLLM is a powerful library that provides state-of-the-art quantization methods for compressing large language models while maintaining their performance. It supports multiple quantization methods (AWQ, GPTQ, GGUF) and enables efficient model deployment in production environments.
+
+🚀 Getting Started
+------------------
+QuantLLM offers multiple quantization methods, each optimized for different use cases. The high-level `QuantLLM` API provides a simple interface for quantizing models, while the low-level API gives you fine-grained control over the quantization process.
+
+Key Features:
+
+- Multiple quantization methods (AWQ, GPTQ, GGUF)
+- Memory-efficient processing
+- Hardware-specific optimizations
+- Comprehensive metrics and logging
+- Easy model export and deployment
+
+Complete Example
+----------------
 
 .. code-block:: python
 
     import torch
-    from quantllm import QuantizerFactory  # Assuming __init__.py is updated
+    from quantllm import QuantLLM
+    from transformers import AutoTokenizer
+    import time
+
+    # 1. Model and Method Selection
+    model_name = "facebook/opt-125m"  # Any Hugging Face model
+    method = "awq"  # Choose: 'awq', 'gptq', or 'gguf'
+
+    # 2. Configure Quantization
+    quant_config = {
+        "bits": 4,              # Quantization bits (2-8)
+        "group_size": 128,      # Size of quantization groups
+        "zero_point": True,     # Zero-point quantization (AWQ)
+        "version": "v2",        # AWQ algorithm version
+        "scale_dtype": "fp32",  # Scale factor data type
+        "batch_size": 4         # Processing batch size
+    }
+
+    # 3. Prepare Calibration Data
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+    # Representative text samples for calibration
+    calibration_texts = [
+        "Translate English to French: Hello, how are you?",
+        "Summarize this text: The quick brown fox jumps over the lazy dog",
+        "What is the capital of France?",
+        "Write a short story about a robot learning to paint.",
+        "Explain quantum computing in simple terms."
+    ]
+
+    # Tokenize with proper padding and attention masks
+    inputs = tokenizer(
+        calibration_texts,
+        padding=True,
+        truncation=True,
+        max_length=512,
+        return_tensors="pt"
+    )
 
-    # Example: Quantizing facebook/opt-125m using AWQ
-    model_name = "facebook/opt-125m"
-    method = "awq"  # Can be 'awq', 'gptq', or 'gguf'
+    # 4. Model Quantization with Error Handling
+    try:
+        print("Starting quantization process...")
+        start_time = time.time()
+
+        # Perform quantization
+        quantized_model, tokenizer = QuantLLM.quantize_from_pretrained(
+            model_name=model_name,
+            method=method,
+            quant_config_dict=quant_config,
+            calibration_data=inputs["input_ids"],
+            calibration_steps=50,
+            device="cuda" if torch.cuda.is_available() else "cpu"
+        )
+
+        print(f"Quantization completed in {time.time() - start_time:.2f} seconds")
+
+        # 5. Model Validation
+        test_input = "Translate this to French: The weather is beautiful today."
+        test_inputs = tokenizer(test_input, return_tensors="pt").to(quantized_model.device)
+
+        with torch.no_grad():
+            outputs = quantized_model.generate(
+                **test_inputs,
+                max_length=50,
+                num_return_sequences=1,
+                do_sample=True,
+                temperature=0.7
+            )
+
+        result = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        print(f"Test Output: {result}")
+
+        # 6. Save Quantized Model (Optional)
+        save_path = "./quantized_model"
+        quantized_model.save_pretrained(save_path)
+        tokenizer.save_pretrained(save_path)
+        print(f"Model saved to {save_path}")
+
+    except Exception as e:
+        print(f"Error during quantization: {str(e)}")
+        raise
 
     # Define quantization configuration
    quant_config = {
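The `bits` and `group_size` settings in this example are what drive the memory savings described in the docs: group-wise quantization stores packed low-bit weights plus one scale (and, with `zero_point=True`, one zero point) per group. A back-of-the-envelope sketch of that arithmetic, assuming fp32 scales and zero points per group (`quantized_size_bytes` is a hypothetical helper, not QuantLLM's actual storage layout):

    # Illustrative estimate only; real quantizer layouts differ.
    def quantized_size_bytes(n_params, bits=4, group_size=128,
                             scale_bytes=4, zero_point_bytes=4):
        packed = n_params * bits / 8        # packed low-bit weights
        groups = n_params / group_size      # one scale + zero point per group
        return packed + groups * (scale_bytes + zero_point_bytes)

    n = 125_000_000  # facebook/opt-125m, roughly
    fp16_bytes = 2 * n
    q4_bytes = quantized_size_bytes(n)
    print(f"fp16 ~{fp16_bytes / 1e6:.0f} MB, int4 ~{q4_bytes / 1e6:.0f} MB, "
          f"{100 * (1 - q4_bytes / fp16_bytes):.0f}% smaller")

A larger `group_size` reduces the per-group overhead but shares one scale across more weights, which is the usual accuracy trade-off.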

docs/api_reference/trainer.rst

Lines changed: 62 additions & 0 deletions
@@ -1,6 +1,9 @@
 Trainer API
 ==========
 
+QuantLLM provides a comprehensive training API with built-in support for quantization,
+efficient fine-tuning, and progress tracking.
+
 Fine-Tuning Trainer
 -----------------
 
@@ -28,6 +31,66 @@ Training Logger
 Example Usage
 -----------
 
+Complete Training Pipeline
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    from quantllm import (
+        Model, ModelConfig,
+        FineTuningTrainer, TrainingConfig,
+        TrainingLogger, CheckpointManager
+    )
+
+    # Initialize logger for beautiful progress display
+    logger = TrainingLogger()
+
+    # Configure model with advanced optimizations
+    model_config = ModelConfig(
+        model_name="facebook/opt-125m",
+        load_in_4bit=True,           # Memory efficient!
+        use_lora=True,               # Parameter efficient!
+        gradient_checkpointing=True  # Training efficient!
+    )
+    model = Model(model_config).get_model()
+
+    # Initialize training with rich features
+    training_config = TrainingConfig(
+        learning_rate=2e-4,
+        num_epochs=3,
+        batch_size=8,
+        gradient_accumulation_steps=4,
+        # Advanced features
+        warmup_ratio=0.1,
+        evaluation_strategy="steps",
+        eval_steps=100,
+        save_strategy="epoch",
+        logging_steps=10,
+        # Mixed precision training
+        fp16=True,
+        # Multi-GPU support
+        ddp_find_unused_parameters=False
+    )
+
+    # Set up checkpointing
+    checkpoint_manager = CheckpointManager(
+        checkpoint_dir="./checkpoints",
+        save_total_limit=3
+    )
+
+    # Initialize and train
+    trainer = FineTuningTrainer(
+        model=model,
+        training_config=training_config,
+        train_dataloader=train_loader,
+        eval_dataloader=val_loader,
+        logger=logger,
+        checkpoint_manager=checkpoint_manager
+    )
+
+    # Start training with full monitoring
+    trainer.train()
+
 Basic Training
 ~~~~~~~~~~~~
 
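The pipeline above assumes `train_loader` and `val_loader` already exist. A minimal sketch of how they might be prepared with standard `datasets`, `transformers`, and PyTorch APIs (the dataset, column name, and batch size are illustrative, not mandated by the trainer):

    from datasets import load_dataset
    from torch.utils.data import DataLoader
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

    def tokenize(batch):
        # Fixed-length padding so the default collator can stack tensors
        return tokenizer(batch["text"], padding="max_length",
                         truncation=True, max_length=512)

    data = load_dataset("imdb").map(tokenize, batched=True)
    data.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

    train_loader = DataLoader(data["train"], batch_size=8, shuffle=True)
    val_loader = DataLoader(data["test"], batch_size=8)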

docs/getting_started.rst

Lines changed: 51 additions & 14 deletions
@@ -4,48 +4,90 @@ Getting Started
 Quick Start
 ----------
 
-This guide will help you get started with QuantLLM quickly. Here's a simple example of fine-tuning a language model:
+QuantLLM is designed to make working with large language models more accessible and efficient. Here's a complete example showcasing its key features:
 
 .. code-block:: python
 
     from quantllm import (
         Model, ModelConfig,
         LoadDataset, DatasetConfig,
-        FineTuningTrainer, TrainingConfig
+        FineTuningTrainer, TrainingConfig,
+        TrainingLogger
     )
 
-    # 1. Load and configure model
+    # Initialize logger for rich progress tracking
+    logger = TrainingLogger()  # This will display the ASCII art logo!
+
+    # 1. Load and configure model with best practices
     model_config = ModelConfig(
         model_name="facebook/opt-125m",
-        load_in_4bit=True  # Enable 4-bit quantization
+        load_in_4bit=True,           # Enable memory-efficient 4-bit quantization
+        use_lora=True,               # Enable parameter-efficient fine-tuning
+        gradient_checkpointing=True  # Reduce memory usage during training
     )
     model = Model(model_config).get_model()
 
-    # 2. Load and prepare dataset
+    # 2. Load and prepare dataset with automatic preprocessing
     dataset = LoadDataset().load_hf_dataset("imdb")
+    dataset_config = DatasetConfig(
+        text_column="text",
+        label_column="label",
+        max_length=512
+    )
 
-    # 3. Configure training
+    # 3. Configure training with optimized defaults
    training_config = TrainingConfig(
         learning_rate=2e-4,
         num_epochs=3,
-        batch_size=8
+        batch_size=8,
+        gradient_accumulation_steps=4,  # For larger effective batch sizes
+        warmup_ratio=0.1,               # Gradual learning rate warmup
+        evaluation_strategy="steps",    # Regular evaluation during training
+        eval_steps=100
     )
 
-    # 4. Train model
+    # 4. Initialize trainer with progress tracking
     trainer = FineTuningTrainer(
         model=model,
-        training_config=training_config
+        training_config=training_config,
+        logger=logger  # Enable rich progress tracking
     )
+
+    # 5. Start training with automatic hardware optimization
     trainer.train()
 
 Core Features
 ------------
 
-* **Efficient Quantization**: 4-bit and 8-bit quantization support
-* **Hardware Optimization**: Automatic hardware detection and optimization
-* **LoRA Integration**: Parameter-efficient fine-tuning
-* **Progress Tracking**: Rich logging and visualization
-* **Easy Deployment**: Simple export and deployment options
+* **Advanced Quantization**
+
+  * 4-bit and 8-bit quantization for up to 75% memory reduction
+  * Automatic format selection based on your hardware
+  * Zero-shot quantization with minimal accuracy loss
+
+* **Efficient Fine-tuning**
+
+  * LoRA support for parameter-efficient training
+  * Gradient checkpointing for reduced memory usage
+  * Automatic mixed precision training
+
+* **Hardware Optimization**
+
+  * Automatic hardware detection (CUDA, MPS, CPU)
+  * Optimal settings for your specific GPU
+  * CPU offloading for large models
+
+* **Rich Progress Tracking**
+
+  * Beautiful terminal-based progress display
+  * Detailed training metrics and logs
+  * Integration with WandB and TensorBoard
+
+* **Production Ready**
+
+  * Simple export to ONNX and TorchScript
+  * Quantized model deployment
+  * GPU and CPU inference optimization
 
 Key Concepts
 -----------
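The "Automatic hardware detection (CUDA, MPS, CPU)" bullet corresponds to a standard PyTorch pattern; a sketch of the kind of logic such a feature wraps (not QuantLLM's actual implementation):

    import torch

    def resolve_device() -> torch.device:
        """Prefer CUDA, then Apple's MPS backend, then CPU."""
        if torch.cuda.is_available():
            return torch.device("cuda")
        if torch.backends.mps.is_available():
            return torch.device("mps")
        return torch.device("cpu")

    print(resolve_device())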

quantllm/__init__.py

Lines changed: 31 additions & 7 deletions
@@ -11,8 +11,23 @@
     TrainingLogger
 )
 from .hub import HubManager, CheckpointManager
-from .utils.optimizations import get_optimal_training_settings
-from .utils.log_config import configure_logging, enable_logging
+from .utils import (
+    get_optimal_training_settings,
+    configure_logging,
+    enable_logging,
+    QuantizationBenchmark
+)
+from .api import QuantLLM
+
+from .quant import (
+    QuantizationConfig,
+    QuantizationEngine,
+    QuantizedLinear,
+    GGUFQuantizer,
+    GPTQQuantizer,
+    AWQQuantizer
+)
+
 
 from .config import (
     ModelConfig,
@@ -53,17 +68,26 @@
     "ModelConfig",
     "DatasetConfig",
     "TrainingConfig",
+    "QuantizationBenchmark",
 
     # Utilities
     "get_optimal_training_settings",
     "configure_logging",
     "enable_logging",
+
+    # Quantization
+    "QuantizationConfig",
+    "QuantizationEngine",
+    "QuantizedLinear",
+    "GGUFQuantizer",
+    "GPTQQuantizer",
+    "AWQQuantizer",
+
+    # API
+    "QuantLLM"
 ]
 
+
 # Initialize package-level logger with fancy welcome message
 logger = TrainingLogger()
-logger.log_success(f"""
-✨ QuantLLM v{__version__} initialized successfully ✨
-🚀 Efficient Quantized Language Model Fine-Tuning
-📚 Documentation: https://github.com/codewithdark-git/QuantLLM
-""")
+logger.log_welcome_message()
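Since this hunk widens the package's public surface, here is a quick sanity sketch for verifying the new re-exports (assumes an installed build containing this commit):

    import quantllm

    # Names this commit adds to the top-level namespace and __all__
    expected = {"QuantLLM", "QuantizationConfig", "QuantizationEngine",
                "QuantizedLinear", "GGUFQuantizer", "GPTQQuantizer",
                "AWQQuantizer", "QuantizationBenchmark"}
    missing = expected - set(quantllm.__all__)
    print("missing from __all__:", missing or "none")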
