PaddleFormers v1.0

@lugimzzz released this 21 Jan 08:38 · 193 commits to develop since this release · commit e6632f6

PaddleFormers 1.0 marks a major milestone release, delivering a unified, high-performance training system with broader model coverage, VLM capabilities, and deep hardware ecosystem support.

✨ New Features

1. Deep Integration with PaddleFleet

PaddleFormers is deeply integrated with PaddleFleet, PaddlePaddle’s general-purpose high-performance distributed training engine.
Built on a highly abstracted, modular design, PaddleFormers provides a unified training-acceleration framework for mainstream large-model architectures, so optimized training strategies can be reused and migrated across models with minimal effort and performance gains carry over systematically.

2. VLM Capability Support

PaddleFormers 1.0 introduces enhanced vision-language model (VLM) support, including:

  • Unified VLM data preprocessing via AutoProcessor
  • CLI-based SFT training with LoRA fine-tuning support for VLMs

These capabilities significantly simplify VLM training and fine-tuning workflows.
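As a rough illustration, the sketch below shows what unified VLM preprocessing could look like if PaddleFormers follows the Hugging Face-style `AutoProcessor` interface. The import path, checkpoint name, and call signature are assumptions for illustration, not confirmed PaddleFormers API; the import is guarded so the snippet can be read and run even where the package is not installed.

```python
# Hypothetical sketch of unified VLM preprocessing via AutoProcessor.
# ASSUMPTIONS: the import path, checkpoint name, and call signature follow
# the Hugging Face-style interface; they are illustrative placeholders,
# not a confirmed PaddleFormers API.
try:
    from paddleformers.transformers import AutoProcessor  # assumed import path
    HAVE_PADDLEFORMERS = True
except ImportError:
    HAVE_PADDLEFORMERS = False


def preprocess_vlm_example(processor, image, prompt):
    """Run one image+text pair through a single unified processor.

    A VLM processor bundles the image processor and the tokenizer, so one
    call produces pixel values and token ids together.
    """
    return processor(text=prompt, images=image, return_tensors="pd")


if HAVE_PADDLEFORMERS:
    # Placeholder checkpoint name; substitute a real VLM checkpoint.
    processor = AutoProcessor.from_pretrained("PaddlePaddle/ERNIE-4.5-VL")
    # inputs = preprocess_vlm_example(processor, image, "Describe this image.")
```

The point of the unified interface is that the same preprocessing entry point then feeds directly into the CLI-based SFT/LoRA workflows described above.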

3. New Model Support

This release adds support for the following models:

4. FlexCheckpoint as the Default Checkpoint Mechanism

FlexCheckpoint is now fully adopted across all supported models in PaddleFormers.
Model loading and saving default to FlexCheckpoint, enabling flexible and robust checkpoint management under diverse distributed training strategies while maintaining compatibility with Hugging Face–format weights.
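For illustration only, here is a minimal sketch of the save/load round trip, assuming PaddleFormers keeps the familiar transformers-style `save_pretrained`/`from_pretrained` entry points and routes them through FlexCheckpoint transparently. The import path, model class, and method names are assumptions, not confirmed API, and the import is guarded so the snippet runs without the package installed.

```python
# Hypothetical sketch: with FlexCheckpoint as the default, the usual
# save/load entry points are assumed to route through it transparently.
# ASSUMPTIONS: import path, model class, and method names are placeholders
# based on the transformers-style convention, not confirmed PaddleFormers API.
try:
    from paddleformers.transformers import AutoModelForCausalLM  # assumed import path
    HAVE_PADDLEFORMERS = True
except ImportError:
    HAVE_PADDLEFORMERS = False


def checkpoint_roundtrip(model, out_dir):
    """Save a model and reload it from the same directory.

    Per the release notes, FlexCheckpoint keeps the on-disk checkpoint
    loadable across different distributed training strategies and
    compatible with Hugging Face-format weights.
    """
    model.save_pretrained(out_dir)           # assumed transformers-style API
    return type(model).from_pretrained(out_dir)


if HAVE_PADDLEFORMERS:
    # Placeholder checkpoint name; substitute a real one.
    # model = AutoModelForCausalLM.from_pretrained("PaddlePaddle/ERNIE-4.5")
    pass
```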

5. Multiple AI Chip Support

PaddleFormers expands support for Chinese domestic AI hardware platforms, including Kunlunxin P800, ILUVATAR BI150, and MetaX C550.
The ERNIE-4.5 series, PaddleOCR-VL, and DeepSeek-V3 models are fully supported on these platforms (see the table below for details).

Model           Kunlunxin P800    ILUVATAR BI150    MetaX C550
PaddleOCR-VL    ✅                ✅                ✅
ERNIE-4.5       ✅                ✅                ✅
DeepSeek-V3     ✅                ✅                ✅

PaddleFormers also enables full-parameter fine-tuning of DeepSeek-V3-671B on 128 Kunlunxin P800 cards, delivering one of the most resource-efficient full-parameter fine-tuning solutions on domestic hardware.