Releases: PaddlePaddle/PaddleFormers
PaddleFormers v0.3
PaddleFormers 0.3 is officially released! This release introduces several key features and improvements:
✨ New Features
1. Hugging Face safetensors weight loading & saving
PaddleFormers now supports loading and saving model weights in the Hugging Face safetensors format.
```python
from paddleformers.transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    convert_from_hf=True
)
model.save_pretrained("Qwen/Qwen3-0.6B-Base-new", save_to_hf=True)
```
2. New model support
Added support for the following models:
- qwen2
- qwen3
- qwen2moe
- qwen3moe
- ernie4_5
- ernie4_5_moe
- gpt_oss
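As a quick illustration, any of these architectures can be loaded through the same Auto classes shown above. The ERNIE checkpoint id below is assumed for illustration only; substitute the checkpoint you actually want to use.

```python
from paddleformers.transformers import AutoModelForCausalLM

# Illustrative sketch: the checkpoint id is an assumption, not an officially
# documented example -- point it at any ernie4_5 / qwen3 / gpt_oss checkpoint
# you have access to.
model = AutoModelForCausalLM.from_pretrained(
    "baidu/ERNIE-4.5-0.3B-PT",
    convert_from_hf=True,
)
```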
3. Generalized large model modules (paddleformers/nn)
Introduced a generalized module library for large models to reduce the cost of integrating distributed training.
Includes:
- Attention
- Embedding
- Pipeline parallel model
- Normalization
- MLP
- LM Head
- Linear
You can check out the implementation details here:
https://github.com/PaddlePaddle/PaddleFormers/tree/develop/paddleformers/nn
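For a rough sense of what these building blocks correspond to, the sketch below assembles the same categories (Embedding, Attention, Normalization, MLP, Linear, LM Head) from plain paddle.nn layers. It only mirrors the structure; it is not the paddleformers/nn implementation, whose real classes and signatures live at the link above.

```python
import paddle
import paddle.nn as nn

class TinyDecoderBlock(nn.Layer):
    """Structural sketch only -- not the paddleformers/nn implementation."""

    def __init__(self, vocab_size=1000, hidden_size=256, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)          # Embedding
        self.attn_norm = nn.LayerNorm(hidden_size)                  # Normalization
        self.attn = nn.MultiHeadAttention(hidden_size, num_heads)   # Attention
        self.mlp_norm = nn.LayerNorm(hidden_size)
        self.mlp = nn.Sequential(                                   # MLP
            nn.Linear(hidden_size, 4 * hidden_size),                # Linear
            nn.Silu(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        self.lm_head = nn.Linear(hidden_size, vocab_size)           # LM Head

    def forward(self, token_ids):
        # Causal masking and pipeline-parallel wiring are omitted for brevity.
        x = self.embed(token_ids)
        x = x + self.attn(self.attn_norm(x))
        x = x + self.mlp(self.mlp_norm(x))
        return self.lm_head(x)

# Tiny smoke test: a batch of 2 sequences of 8 token ids.
logits = TinyDecoderBlock()(paddle.randint(0, 1000, shape=[2, 8]))
```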
PaddleFormers v0.2
PaddleFormers 0.2 is officially released! This release introduces several key features and improvements:
✨ New Features
1. Multi-source Model Download
- Added support for downloading models from HuggingFace Hub, ModelScope, and AI Studio, making model access more flexible and convenient.
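A minimal sketch of what a download looks like in practice. How the alternative sources (ModelScope, AI Studio) are selected is not shown here and is left as an assumption; consult the PaddleFormers documentation for the exact switch.

```python
from paddleformers.transformers import AutoTokenizer, AutoModelForCausalLM

# Downloads the checkpoint from whichever model hub the installation is
# configured to use. Selecting ModelScope or AI Studio instead is assumed to
# be a documented argument or environment variable, not shown in this sketch.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", convert_from_hf=True)
```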
2. HuggingFace Tokenizer Compatibility
- PaddleFormers now wraps HuggingFace tokenizers, allowing users to directly leverage the HuggingFace tokenizer ecosystem while keeping the PaddleFormers experience consistent.
3. Lazy Import Optimization
- Introduced a lazy import mechanism, enabling the Tokenizer module to be used independently without requiring a Paddle installation.
- This makes it easier to use the tokenizer in lightweight scenarios, such as preprocessing or pure inference, while improving modularity and usability.
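A minimal sketch of Paddle-free tokenizer use, assuming only the tokenizer dependencies are installed; the encode/decode calls follow the standard HuggingFace tokenizer interface that PaddleFormers wraps.

```python
# No paddle import needed here: thanks to lazy imports, the tokenizer module
# can be used on its own, e.g. in a preprocessing job.
from paddleformers.transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
encoded = tokenizer("PaddleFormers makes tokenization easy.")
print(encoded["input_ids"])
print(tokenizer.decode(encoded["input_ids"]))
```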
PaddleFormers v0.1
PaddleFormers 0.1 is officially released! This initial version supports the SFT/DPO training paradigms and configurable distributed training via a unified Trainer API, and integrates PEFT, MergeKit, and Quantization APIs for diverse LLM applications.
Highlights
⚙️ Simplified Distributed Training
Implements 4D parallel strategies through a unified Trainer API, lowering the barrier to distributed LLM training.
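A hedged sketch of how the four parallel axes (data, tensor, pipeline, sharding) can be expressed as plain training arguments. The import path and field names below are assumptions for illustration; verify them against the PaddleFormers Trainer documentation.

```python
# Assumed import path and field names, shown only to illustrate configuring
# 4D parallelism through Trainer arguments rather than custom launch scripts.
from paddleformers.trainer import TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=1,
    tensor_parallel_degree=2,    # assumption: tensor (model) parallelism
    pipeline_parallel_degree=2,  # assumption: pipeline parallelism
    sharding="stage2",           # assumption: sharded data parallelism
)
```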
🛠 Efficient Post-Training
Integrates Packing dataflow and FlashMask operators for SFT/DPO training, eliminating padding waste and boosting throughput.
💾 Industrial Storage Solution
Features Unified Checkpoint storage tools for LLMs, enabling training resumption and dynamic resource scaling. Additionally implements asynchronous storage (up to 95% faster) and Optimizer State Quantization (78% storage reduction), ensuring industrial training meets both efficiency and stability requirements.
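As a rough sketch, these storage features are typically switched on through training arguments. The flag names below are assumptions for illustration only, not the documented interface.

```python
# Assumed flag names -- check the PaddleFormers documentation for the real
# switches that enable Unified Checkpoint, asynchronous saving, and
# optimizer-state quantization.
unified_checkpoint_flags = {
    "unified_checkpoint": True,                 # assumption: enable Unified Checkpoint format
    "unified_checkpoint_config": "async_save",  # assumption: asynchronous storage
    "ckpt_quant_stage": "O1",                   # assumption: optimizer state quantization level
}
```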