# v2.0.0

## New Features

- Support for peft 0.10.x, with the default value of the `tuner_backend` parameter changed to `peft`. The peft interface is dynamically patched to support parameters such as `lora_dtype`.
- Support for vLLM+LoRA inference.
- Refactored and updated the README file.
- Added English versions of the documentation. Currently, all documents have both English and Chinese versions.
- Support for training 70B models using FSDP+QLoRA on dual 24GB GPUs. Script available at: https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama2_70b_chat/qlora_fsdp/sft.sh
- Support for training agents and using the ModelScopeAgent framework. Documentation available at: https://github.com/modelscope/swift/blob/main/docs/source/LLM/Agent%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
- Support for model evaluation and benchmarking. Documentation available at: https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E8%AF%84%E6%B5%8B%E6%96%87%E6%A1%A3.md
- Support for multi-task experiment management. Documentation available at: https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%AE%9E%E9%AA%8C%E6%96%87%E6%A1%A3.md
- Support for GaLore training.
- Support for training and inference of AQLM and AWQ quantized models.
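Of the features above, GaLore has a compact algorithmic core worth illustrating: it reduces optimizer-state memory by projecting each gradient matrix onto a low-rank subspace (obtained via SVD), running the optimizer step in that subspace, and projecting the update back. Below is a minimal NumPy sketch of that idea, with plain SGD standing in for the Adam-style optimizer used in practice; the function names are hypothetical and not part of Swift's API.

```python
import numpy as np

def galore_project(grad, rank):
    # Core GaLore idea: use the top-`rank` left singular vectors of the
    # gradient as a projection basis into a low-rank subspace.
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]             # projection matrix, shape (m, rank)
    return P, P.T @ grad        # compressed gradient, shape (rank, n)

def galore_step(weight, grad, rank=2, lr=0.1):
    P, low_rank_grad = galore_project(grad, rank)
    # In real GaLore the optimizer state (e.g. Adam moments) lives in the
    # low-rank space; plain SGD is used here for brevity.
    update = P @ low_rank_grad  # project the update back to full shape
    return weight - lr * update

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # toy weight matrix
G = rng.standard_normal((8, 4))   # toy gradient
W_new = galore_step(W, G, rank=2)
```

The memory saving comes from storing optimizer state of shape `(rank, n)` instead of `(m, n)`, which matters for large weight matrices.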
## New Models
- MAMBA series models. Script available at: https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mamba-1.4b/lora/sft.sh
- DeepSeek VL series models. Documentation available at: https://github.com/modelscope/swift/blob/main/docs/source_en/Multi-Modal/deepseek-vl-best-practice.md
- LLAVA series models. Documentation available at: https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/llava%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
- TeleChat models. Script available at: https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh
- Grok-1 models. Documentation available at: https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Grok-1-best-practice.md
- Qwen 1.5 MoE series models for training and inference.
- dbrx models for training and inference. Script available at: https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh
- Mengzi3 models for training and inference. Script available at: https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mengzi3_13b_base/lora_ddp_ds/sft.sh
- Xverse MoE models for training and inference. Script available at: https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/sft.sh
- c4ai-command-r series models for training and inference.
- MiniCPM series models for training and inference. Script available at: https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/minicpm_moe_8x2b/lora_ddp/sft.sh
- Mixtral-8x22B-v0.1 models for training and inference. Script available at: https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh
## New Datasets

- Support for the Ruozhiba dataset: https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Supported-models-datasets.md
## What's Changed
- Fix RsLoRA by @tastelikefeet in #567
- Fix yi-vl merge lora by @Jintao-Huang in #568
- Add doc for tuner module by @tastelikefeet in #571
- update agent documentation by @tastelikefeet in #572
- Update agent doc to fix some conflicts by @tastelikefeet in #573
- support vllm lora by @Jintao-Huang in #565
- Support llava by @Jintao-Huang in #577
- fix app-ui max_length is None by @Jintao-Huang in #580
- support `train_dataset_mix_ds` using custom_local_path by @Jintao-Huang in #582
- Fix LRScheduler by @tastelikefeet in #586
- compat with transformers==4.39 by @Jintao-Huang in #584
- Fix weight saving by @tastelikefeet in #589
- fix mix_dataset_sample float by @Jintao-Huang in #594
- Refactor all docs by @tastelikefeet in #599
- fix tiny bugs in docs by @tastelikefeet in #600
- fix issue template and add a pr one by @tastelikefeet in #601
- Fix/security template by @tastelikefeet in #603
- update docs by @Jintao-Huang in #604
- support Mistral-7b-v0.2 by @hjh0119 in #605
- fix deploy safe_response by @Jintao-Huang in #614
- Fix Adalora with devicemap by @tastelikefeet in #619
- update ui by @tastelikefeet in #621
- support TeleChat-12b by @hjh0119 in #607
- fix save dir (additional_files) by @Jintao-Huang in #622
- fix Telechat model by @hjh0119 in #623
- Add Grok model by @tastelikefeet in #629
- add missing files by @tastelikefeet in #631
- support qwen1.5-moe model by @hjh0119 in #627
- support Telechat-7b model by @hjh0119 in #630
- support model Dbrx by @hjh0119 in #643
- fix ui by @tastelikefeet in #648
- fix typing hint by @Jintao-Huang in #649
- support Mengzi-13b-base model by @hjh0119 in #646
- support Qwen1.5-32b models by @hjh0119 in #655
- fix plot error by @tastelikefeet in #651
- Support FSDP + QLoRA by @tastelikefeet in #659
- move fsdp config path by @tastelikefeet in #662
- change the default value of ddp_backend by @tastelikefeet in #667
- fix ui log by @tastelikefeet in #669
- support Xverse-MoE model by @hjh0119 in #668
- Support longlora for transformers 4.38 by @tastelikefeet in #456
- add ruozhiba datasets by @tastelikefeet in #670
- compatible with old versions of modelscope by @tastelikefeet in #671
- Fix data_collator by @tastelikefeet in #674
- [TorchAcc][Experimental] Integrate TorchAcc. by @baoleai in #647
- update Agent best practice with Modelscope-Agent by @hjh0119 in #676
- support c4ai-command-r model by @hjh0119 in #684
- Support Eval by @tastelikefeet in #494
- fix anchor by @tastelikefeet in #687
- Fix/0412 by @tastelikefeet in #690
- support minicpm and mixtral-moe model by @hjh0119 in #692
- fix device_map 4 (qwen-vl) by @Jintao-Huang in #695
- fix multimodal model image_mode = 'CMYK' (fix issue#677) by @Jintao-Huang in #697
- feat(model): support minicpm-v-2 (#699) by @YuzaChongyi in #699
## New Contributors
- @hjh0119 made their first contribution in #605
- @YuzaChongyi made their first contribution in #699
Full Changelog: v1.7.3...v2.0.0