
Commit 43be956

Merge branch 'main' into release/2.3

2 parents: 742d16b + e29cf5a


58 files changed, +1265 -635 lines

README.md

Lines changed: 8 additions & 1 deletion
```diff
@@ -55,6 +55,11 @@ You can contact us and communicate with us by adding our group:
 <img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

 ## 🎉 News
+- 🔥2024.08.22: Support for the `reft` tuner from [ReFT](https://github.com/stanfordnlp/pyreft), which is 15x-65x more parameter-efficient than LoRA; use `--sft_type reft` to begin!
+- 2024.08.21: Support for phi3_5-mini-instruct, phi3_5-moe-instruct, and phi3_5-vision-instruct.
+- 2024.08.21: Support for idefics3-8b-llama3, llava-onevision-qwen2-0_5b-ov, llava-onevision-qwen2-7b-ov, and llava-onevision-qwen2-72b-ov.
+- 🔥2024.08.20: Support for fine-tuning multimodal large models using DeepSpeed-Zero3.
+- 2024.08.20: Supported models: longwriter-glm4-9b, longwriter-llama3_1-8b. Supported dataset: longwriter-6k.
 - 🔥2024.08.12: 🎉 The SWIFT paper has been published on arXiv. Check [this link](https://arxiv.org/abs/2408.05517) to read it.
 - 🔥2024.08.12: Support for packing with flash-attention without contaminating the attention_mask; use `--packing` to begin. See the [PR](https://github.com/huggingface/transformers/pull/31629/files).
 - 🔥2024.08.09: Support for inference and fine-tuning of the qwen2-audio model. Best practice can be found [here](https://github.com/modelscope/ms-swift/issues/1653).
@@ -68,6 +73,8 @@ You can contact us and communicate with us by adding our group:
 - 🔥2024.07.24: Support for the DPO/ORPO/SimPO/CPO alignment algorithms for vision MLLMs; training scripts can be found in the [document](docs/source_en/Multi-Modal/human-preference-alignment-training-documentation.md). Support for the RLAIF-V dataset.
 - 🔥2024.07.24: Support for using Megatron for CPT and SFT on the Qwen2 series. You can refer to the [Megatron training documentation](docs/source_en/LLM/Megatron-training.md).
 - 🔥2024.07.24: Support for the llama3.1 series models, including 8b, 70b, and 405b. Support for openbuddy-llama3_1-8b-chat.
+<details><summary>More</summary>
+
 - 2024.07.20: Support for the mistral-nemo series models. Use `--model_type mistral-nemo-base-2407` and `--model_type mistral-nemo-instruct-2407` to begin.
 - 2024.07.19: Support for [Q-Galore](https://arxiv.org/abs/2407.08296); this algorithm can reduce training memory cost by 60% (qwen-7b-chat, full, 80G -> 35G). Use `swift sft --model_type xxx --use_galore true --galore_quantization true` to begin!
 - 2024.07.17: Support for the newly released InternVL2 models; `model_type`s are internvl2-1b, internvl2-40b, internvl2-llama3-76b. For best practices, refer to [here](docs/source_en/Multi-Modal/internvl-best-practice.md).
@@ -81,7 +88,6 @@ You can contact us and communicate with us by adding our group:
 - 2024.07.04: Support for the internlm2_5-7b series: internlm2_5-7b, internlm2_5-7b-chat, internlm2_5-7b-chat-1m.
 - 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct`, and other llava-hf models. For best practices, refer to [here](docs/source_en/Multi-Modal/llava-best-practice.md).
 - 🔥2024.06.29: Support for [eval-scope](https://github.com/modelscope/eval-scope) & [open-compass](https://github.com/open-compass/opencompass) evaluation! We now support over 50 eval datasets such as `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsm8k, ARC_e`; please check our [Eval Doc](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/LLM-eval.md) to begin! In the next sprint we will support multi-modal and Agent evaluation, so remember to follow us : )
-<details><summary>More</summary>

 - 🔥2024.06.28: Support for the **Florence** series models! See the [document](docs/source_en/Multi-Modal/florence-best-pratice.md).
 - 🔥2024.06.28: Support for the Gemma2 series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
@@ -618,6 +624,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
 | DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
 | MiniCPM-V<br>MiniCPM-V-2<br>MiniCPM-V-2.5<br>MiniCPM-V-2.6 | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B-9B | chat model |
 | CogVLM<br>CogAgent<br>CogVLM2<br>CogVLM2-Video<br>GLM4V | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | Chinese<br>English | 9B-19B | chat model |
+| Llava-HF | [Llava-HF series models](https://huggingface.co/llava-hf) | English | 0.5B-110B | chat model |
 | Llava1.5<br>Llava1.6 | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
 | Llava-Next<br>Llava-Next-Video | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 7B-110B | chat model |
 | mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
```
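The ReFT news item above also implies a Python-level equivalent of the `--sft_type reft` flag. A minimal sketch, assuming `SftArguments` exposes `sft_type` under the same name as the CLI flag; the model and dataset names below are placeholders:

```python
# Hedged sketch of ReFT fine-tuning via swift's Python entry point.
# Assumption: SftArguments mirrors the CLI, so `--sft_type reft` maps to
# sft_type='reft'. model_type and dataset are placeholders.
from swift.llm import SftArguments, sft_main

sft_args = SftArguments(
    model_type='qwen-7b-chat',  # placeholder model
    dataset=['alpaca-en'],      # placeholder dataset
    sft_type='reft')
sft_main(sft_args)
```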

README_CN.md

Lines changed: 8 additions & 2 deletions
```diff
@@ -56,6 +56,11 @@ SWIFT has rich and comprehensive documentation; please check our documentation site:


 ## 🎉 News
+- 🔥2024.08.22: Support for [ReFT](https://github.com/stanfordnlp/pyreft); this tuner can match or beat LoRA with 1/15 to 1/65 of LoRA's parameter count. Use `--sft_type reft` to start training!
+- 2024.08.21: Support for phi3_5-mini-instruct, phi3_5-moe-instruct, phi3_5-vision-instruct.
+- 2024.08.21: Support for idefics3-8b-llama3, llava-onevision-qwen2-0_5b-ov, llava-onevision-qwen2-7b-ov, llava-onevision-qwen2-72b-ov.
+- 🔥2024.08.20: Support for fine-tuning multimodal large models with deepspeed-zero3.
+- 2024.08.20: Supported models: longwriter-glm4-9b, longwriter-llama3_1-8b. Supported dataset: longwriter-6k.
 - 🔥2024.08.12: 🎉 The SWIFT paper has been published on arXiv; click [this link](https://arxiv.org/abs/2408.05517) to read it.
 - 🔥2024.08.12: Support for packing with flash-attention without contaminating the attention_mask; use `--packing` to enable. See the [PR](https://github.com/huggingface/transformers/pull/31629/files) for details.
 - 🔥2024.08.09: Support for inference and fine-tuning of the qwen2-audio model. Best practice can be found [here](https://github.com/modelscope/ms-swift/issues/1653).
@@ -69,6 +74,8 @@ SWIFT has rich and comprehensive documentation; please check our documentation site:
 - 🔥2024.07.24: The human preference alignment algorithms now support vision multimodal large models, including DPO/ORPO/SimPO/CPO; for training, see the [document](docs/source/Multi-Modal/人类偏好对齐训练文档.md). Support for the RLAIF-V dataset.
 - 🔥2024.07.24: Support for CPT and SFT on the qwen2 series using megatron. See the [megatron training documentation](docs/source/LLM/Megatron训练文档.md).
 - 🔥2024.07.24: Support for the llama3.1 series models, including 8b, 70b, and 405b. Support for openbuddy-llama3_1-8b-chat.
+<details><summary>More</summary>
+
 - 2024.07.20: Support for the mistral-nemo series models. Use `--model_type mistral-nemo-base-2407` and `--model_type mistral-nemo-instruct-2407` to start training and inference.
 - 🔥2024.07.19: Support for the [Q-Galore](https://arxiv.org/abs/2407.08296) algorithm, which can reduce GPU memory usage by about 60% (qwen-7b-chat, full, 80G -> 35G); use `swift sft --model_type xxx --use_galore true --galore_quantization true` to start training!
 - 2024.07.17: Support for the new InternVL2 models; `model_type`s are internvl2-1b, internvl2-40b, internvl2-llama3-76b. Best practices can be found [here](docs/source/Multi-Modal/internvl最佳实践.md).
@@ -82,8 +89,6 @@ SWIFT has rich and comprehensive documentation; please check our documentation site:
 - 2024.07.04: Support for the internlm2_5-7b series: internlm2_5-7b, internlm2_5-7b-chat, internlm2_5-7b-chat-1m.
 - 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct`, and other llava-hf models. Best practices can be found [here](docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.06.29: Support for [eval-scope](https://github.com/modelscope/eval-scope) & [open-compass](https://github.com/open-compass/opencompass) evaluation! We support an evaluation pipeline covering 50+ standard datasets including `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsm8k, ARC_e`; see our [evaluation documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM评测文档.md) to get started. In the next iteration we will support multimodal and Agent evaluation, so stay tuned : )
-<details><summary>More</summary>
-
 - 🔥2024.06.28: Support for the **Florence** series models; see the [Florence best practice](docs/source/Multi-Modal/florence最佳实践.md).
 - 🔥2024.06.28: Support for the **Gemma2** series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
 - 🔥2024.06.18: Support for the **DeepSeek-Coder-v2** series models! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` to start training and inference.
@@ -612,6 +617,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
 | MiniCPM-V<br>MiniCPM-V-2<br>MiniCPM-V-2.5<br>MiniCPM-V-2.6 | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B-9B | chat model |
 | CogVLM<br>CogAgent<br>CogVLM2<br>CogVLM2-Video<br>GLM4V | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | Chinese<br>English | 9B-19B | chat model |
+| Llava-HF | [Llava-HF series models](https://huggingface.co/llava-hf) | English | 0.5B-110B | chat model |
 | Llava1.5<br>Llava1.6 | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
 | Llava-Next<br>Llava-Next-Video | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 7B-110B | chat model |
 | mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
```
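The Q-Galore item quotes only the CLI form. As a rough Python counterpart, a sketch assuming the `--use_galore` and `--galore_quantization` flags correspond to `SftArguments` fields of the same names:

```python
# Hedged sketch of the Q-Galore invocation from the news item.
# Assumption: the CLI flags map one-to-one onto SftArguments fields;
# model_type and dataset are placeholders (the news item writes `xxx`).
from swift.llm import SftArguments, sft_main

sft_args = SftArguments(
    model_type='qwen-7b-chat',  # placeholder model
    dataset=['alpaca-en'],      # placeholder dataset
    use_galore=True,
    galore_quantization=True)   # claimed ~60% training-memory reduction
sft_main(sft_args)
```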

docs/source/LLM/LLM微调文档.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -211,8 +211,8 @@ from swift.tuners import Swift
 ckpt_dir = 'vx-xxx/checkpoint-100'
 model_type = ModelType.qwen_7b_chat
 template_type = get_default_template_type(model_type)
-
-model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'})
+model_id_or_path = None
+model, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path, model_kwargs={'device_map': 'auto'})

 model = Swift.from_pretrained(model, ckpt_dir, inference_mode=True)
 template = get_template(template_type, tokenizer)
```
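For context, the changed snippet assembles into the following flow; a sketch assuming `swift.llm.inference` is the same helper used in the inference docs, with `ckpt_dir` left as the doc's placeholder path:

```python
# Load base weights (model_id_or_path=None keeps the default lookup for
# model_type), then attach the fine-tuned checkpoint and run one query.
from swift.llm import (ModelType, get_default_template_type,
                       get_model_tokenizer, get_template, inference)
from swift.tuners import Swift

ckpt_dir = 'vx-xxx/checkpoint-100'  # placeholder path from the doc
model_type = ModelType.qwen_7b_chat
template_type = get_default_template_type(model_type)

model_id_or_path = None  # or a local directory / hub model id
model, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path,
                                       model_kwargs={'device_map': 'auto'})

model = Swift.from_pretrained(model, ckpt_dir, inference_mode=True)
template = get_template(template_type, tokenizer)
response, history = inference(model, template, 'Hello!')
print(response)
```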

docs/source/LLM/LLM推理文档.md

Lines changed: 5 additions & 4 deletions
```diff
@@ -42,8 +42,9 @@ print(f'template_type: {template_type}')  # template_type: qwen

 kwargs = {}
 # kwargs['use_flash_attn'] = True  # to use flash_attn
-
-model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'}, **kwargs)
+model_id_or_path = None
+model, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path,
+                                       model_kwargs={'device_map': 'auto'}, **kwargs)
 # modify max_new_tokens
 model.generation_config.max_new_tokens = 128

@@ -178,8 +179,8 @@ from swift.utils import seed_everything
 model_type = ModelType.qwen_7b_chat
 template_type = get_default_template_type(model_type)
 print(f'template_type: {template_type}')  # template_type: qwen
-
-model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'})
+model_id_or_path = None
+model, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path, model_kwargs={'device_map': 'auto'})

 template = get_template(template_type, tokenizer)
 seed_everything(42)
```
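The point of the new argument is to load weights from an explicit location instead of the `model_type` default. A sketch, assuming `model_id_or_path` accepts a local checkpoint directory or a hub model id (the path below is hypothetical):

```python
# Same flow as the doc, but pointing get_model_tokenizer at explicit weights.
from swift.llm import (ModelType, get_default_template_type,
                       get_model_tokenizer, get_template, inference)
from swift.utils import seed_everything

model_type = ModelType.qwen_7b_chat
template_type = get_default_template_type(model_type)

model_id_or_path = '/path/to/qwen-7b-chat'  # hypothetical local dir; None = default
model, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 128

template = get_template(template_type, tokenizer)
seed_everything(42)
response, history = inference(model, template, 'Hello!')
print(response)
```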

docs/source/LLM/LmDeploy推理加速与部署.md

Lines changed: 4 additions & 2 deletions
```diff
@@ -37,7 +37,8 @@ from swift.llm import (
 )

 model_type = ModelType.qwen_7b_chat
-lmdeploy_engine = get_lmdeploy_engine(model_type)
+model_id_or_path = None
+lmdeploy_engine = get_lmdeploy_engine(model_type, model_id_or_path=model_id_or_path)
 template_type = get_default_template_type(model_type)
 template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
 # an interface similar to `transformers.GenerationConfig`
@@ -95,7 +96,8 @@ from swift.llm import (

 if __name__ == '__main__':
     model_type = ModelType.qwen2_7b_instruct
-    lmdeploy_engine = get_lmdeploy_engine(model_type, tp=2)
+    model_id_or_path = None
+    lmdeploy_engine = get_lmdeploy_engine(model_type, model_id_or_path=model_id_or_path, tp=2)
     template_type = get_default_template_type(model_type)
     template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
     # an interface similar to `transformers.GenerationConfig`
```
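A hedged end-to-end version of the first hunk; `inference_lmdeploy` and the request-list shape are assumptions based on swift's other LmDeploy examples:

```python
# Build the engine with the new model_id_or_path argument, then run a batch.
# inference_lmdeploy and the request/response dict shapes are assumptions.
from swift.llm import (ModelType, get_default_template_type,
                       get_lmdeploy_engine, get_template, inference_lmdeploy)

model_type = ModelType.qwen_7b_chat
model_id_or_path = None  # or a local checkpoint directory
lmdeploy_engine = get_lmdeploy_engine(model_type, model_id_or_path=model_id_or_path)
template_type = get_default_template_type(model_type)
template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
lmdeploy_engine.generation_config.max_new_tokens = 256  # GenerationConfig-like

request_list = [{'query': 'Hello!'}]
resp_list = inference_lmdeploy(lmdeploy_engine, template, request_list)
print(resp_list[0]['response'])
```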

docs/source/LLM/Qwen1.5全流程最佳实践.md

Lines changed: 6 additions & 5 deletions
```diff
@@ -59,8 +59,8 @@ print(f'template_type: {template_type}')  # template_type: qwen

 kwargs = {}
 # kwargs['use_flash_attn'] = True  # to use flash_attn
-
-model, tokenizer = get_model_tokenizer(model_type, torch.float16,
+model_id_or_path = None
+model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path,
                                        model_kwargs={'device_map': 'auto'}, **kwargs)
 # modify max_new_tokens
 model.generation_config.max_new_tokens = 128
@@ -108,7 +108,8 @@ from swift.llm import (
 import torch

 model_type = ModelType.qwen1half_7b_chat_awq
-llm_engine = get_vllm_engine(model_type, torch.float16, max_model_len=4096)
+model_id_or_path = None
+llm_engine = get_vllm_engine(model_type, torch.float16, model_id_or_path=model_id_or_path, max_model_len=4096)
 template_type = get_default_template_type(model_type)
 template = get_template(template_type, llm_engine.hf_tokenizer)
 # an interface similar to `transformers.GenerationConfig`
@@ -264,8 +265,8 @@ seed_everything(42)
 ckpt_dir = 'output/qwen1half-7b-chat/vx-xxx/checkpoint-xxx'
 model_type = ModelType.qwen1half_7b_chat
 template_type = get_default_template_type(model_type)
-
-model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'})
+model_id_or_path = None
+model, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path, model_kwargs={'device_map': 'auto'})
 model.generation_config.max_new_tokens = 128

 model = Swift.from_pretrained(model, ckpt_dir, inference_mode=True)
```
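The second hunk's vLLM engine can be exercised like this; a sketch assuming `inference_vllm` and the request-list convention from swift's vLLM doc:

```python
# AWQ-quantized qwen1.5 engine from the hunk above, driven over a batch.
# inference_vllm and the dict shapes are assumptions from the vLLM doc.
import torch
from swift.llm import (ModelType, get_default_template_type, get_template,
                       get_vllm_engine, inference_vllm)

model_type = ModelType.qwen1half_7b_chat_awq
model_id_or_path = None  # or a local AWQ checkpoint directory
llm_engine = get_vllm_engine(model_type, torch.float16,
                             model_id_or_path=model_id_or_path, max_model_len=4096)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
llm_engine.generation_config.max_new_tokens = 256  # GenerationConfig-like

request_list = [{'query': 'Hello!'}, {'query': 'Who are you?'}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}\nresponse: {resp['response']}")
```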

docs/source/LLM/VLLM推理加速与部署.md

Lines changed: 4 additions & 2 deletions
```diff
@@ -42,7 +42,8 @@ from swift.llm import (
 )

 model_type = ModelType.qwen_7b_chat
-llm_engine = get_vllm_engine(model_type)
+model_id_or_path = None
+llm_engine = get_vllm_engine(model_type, model_id_or_path=model_id_or_path)
 template_type = get_default_template_type(model_type)
 template = get_template(template_type, llm_engine.hf_tokenizer)
 # an interface similar to `transformers.GenerationConfig`
@@ -98,7 +99,8 @@ from swift.llm import (
 )
 if __name__ == '__main__':
     model_type = ModelType.qwen_7b_chat
-    llm_engine = get_vllm_engine(model_type, tensor_parallel_size=2)
+    model_id_or_path = None
+    llm_engine = get_vllm_engine(model_type, model_id_or_path=model_id_or_path, tensor_parallel_size=2)
     template_type = get_default_template_type(model_type)
     template = get_template(template_type, llm_engine.hf_tokenizer)
     # an interface similar to `transformers.GenerationConfig`
```
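The `if __name__ == '__main__':` guard in the second hunk is load-bearing: with `tensor_parallel_size=2`, vLLM spawns worker processes that re-import the module, so engine construction must not run at import time. A sketch under the same `inference_vllm` assumption as above:

```python
# Tensor-parallel engine; the main-guard keeps worker re-imports from
# rebuilding the engine. inference_vllm usage is an assumption from this doc.
from swift.llm import (ModelType, get_default_template_type, get_template,
                       get_vllm_engine, inference_vllm)

if __name__ == '__main__':
    model_type = ModelType.qwen_7b_chat
    model_id_or_path = None  # or a local checkpoint directory
    llm_engine = get_vllm_engine(model_type, model_id_or_path=model_id_or_path,
                                 tensor_parallel_size=2)
    template_type = get_default_template_type(model_type)
    template = get_template(template_type, llm_engine.hf_tokenizer)

    resp_list = inference_vllm(llm_engine, template, [{'query': 'Hello!'}])
    print(resp_list[0]['response'])
```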

docs/source/LLM/index.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -1,5 +1,7 @@
 ## LLM Documentation

+[English Documentation](https://swift.readthedocs.io/en/latest/)
+
 ### 📚 Tutorials

 1. [LLM Inference Documentation](LLM推理文档.md)
```
