README.md: 4 additions and 2 deletions
@@ -39,6 +39,8 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

## 🎉 News
+- 2024.04.13: Support fine-tuning and inference of the Mixtral-8x22B-v0.1 model. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh) to start training!
+- 2024.04.13: Support the newly launched **MiniCPM** series: MiniCPM-V-2.0, MiniCPM-2B-128k, MiniCPM-MoE-8x2B and MiniCPM-1B. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/minicpm_moe_8x2b/lora_ddp/sft.sh) to start training!
- 🔥2024.04.11: Support model evaluation on the MMLU/ARC/CEval datasets (as well as user-defined custom evaluation datasets) with a single command! See [this documentation](docs/source_en/LLM/LLM-eval.md) for details. We also support a convenient way to run multiple ablation experiments; see [this documentation](docs/source_en/LLM/LLM-exp.md) for usage.
- 🔥2024.04.11: Support the **c4ai-command-r** series: c4ai-command-r-plus and c4ai-command-r-v01. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/c4ai-command-r-plus/lora_mp/sft.sh) to train.
- 2024.04.10: Use SWIFT to fine-tune the qwen-7b-chat model to enhance its function-calling capabilities, and combine it with [Modelscope-Agent](https://github.com/modelscope/modelscope-agent) for best practices, which can be found [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Agent-best-practice.md#Usage-with-Modelscope_Agent).
@@ -383,13 +385,13 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| Yuan2 |[Langchao Yuan series models](https://github.com/IEIT-Yuan)| Chinese<br>English | 2B-102B | instruct model |
| XVerse |[XVerse series models](https://github.com/xverse-ai)| Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
| LLaMA2 |[LLaMA2 series models](https://github.com/facebookresearch/llama)| English | 7B-70B<br>including quantized versions | base model<br>chat model |
-| Mistral<br>Mixtral |[Mistral series models](https://github.com/mistralai/mistral-src)| English | 7B | base model<br>instruct model<br>MoE model |
+| Mistral<br>Mixtral |[Mistral series models](https://github.com/mistralai/mistral-src)| English | 7B-22B | base model<br>instruct model<br>MoE model |
| YI |[01AI's YI series models](https://github.com/01-ai)| Chinese<br>English | 6B-34B | base model<br>chat model<br>long text model |
| InternLM<br>InternLM2<br>InternLM2-Math |[Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM)| Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math |[DeepSeek series models](https://github.com/deepseek-ai)| Chinese<br>English | 1.3B-67B | base model<br>chat model<br>MoE model<br>code model<br>math model |
| MAMBA |[MAMBA selective state space model](https://github.com/state-spaces/mamba)| English | 130M-2.8B | base model |
| Gemma |[Google Gemma series models](https://github.com/google/gemma_pytorch)| English | 2B-7B | base model<br>instruct model |
-| MiniCPM |[OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM)| Chinese<br>English | 2B-3B | chat model |