README.md: 3 additions & 1 deletion
```diff
@@ -47,6 +47,7 @@ SWIFT has rich documentations for users, please check [here](https://github.com/
 SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

 ## 🎉 News
+- 2024.07.06: Support codegeex4-9b-chat.
 - 2024.07.04: Support internlm2_5-7b series: internlm2_5-7b, internlm2_5-7b-chat, internlm2_5-7b-chat-1m.
 - 2024.07.02: Support for using vLLM for accelerating inference and deployment of multimodal large models such as the llava series and phi3-vision models. You can refer to the [Multimodal & vLLM Inference Acceleration Documentation](docs/source_en/Multi-Modal/vllm-inference-acceleration.md) for more information.
 - 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct` and other llava-hf models. For best practices, refer to [here](docs/source_en/Multi-Modal/llava-best-practice.md).
```
````diff
@@ -387,6 +388,7 @@ swift sft \

 #### Multi-node Multi-GPU
 ```shell
+# If multiple machines share a disk, please additionally specify `--save_on_each_node false`.
 # node0
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
 NNODES=2 \
````
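The hunk above truncates the launch command after `NNODES=2`. For context, a fuller two-node launch might look like the sketch below; `NODE_RANK`, `MASTER_ADDR`, and `NPROC_PER_NODE` follow swift's usual launch conventions, and the model/dataset flags are placeholders, not part of this diff.

```shell
# Hypothetical two-node launch sketch; flags beyond CUDA_VISIBLE_DEVICES and
# NNODES are assumptions filled in around the truncated diff.
# node0
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=2 \
NODE_RANK=0 \
MASTER_ADDR=<node0-ip> \
NPROC_PER_NODE=8 \
swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-en \
    --save_on_each_node false

# node1: run the same command with NODE_RANK=1.
```

`--save_on_each_node false` matters only when the nodes write checkpoints to a shared disk, as the added comment in the diff notes.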
```diff
@@ -507,7 +509,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
 | Model Type | Model Introduction | Language | Model Size | Model Type |
 | Qwen<br>Qwen1.5<br>Qwen2 |[Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM)| Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
-| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4 |[Zhipu ChatGLM series models](https://github.com/THUDM)| Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
+| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4<br>Codegeex4 |[Zhipu ChatGLM series models](https://github.com/THUDM)| Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
 | Baichuan<br>Baichuan2 |[Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc)| Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
 | Yuan2 |[Langchao Yuan series models](https://github.com/IEIT-Yuan)| Chinese<br>English | 2B-102B | instruct model |
 | XVerse |[XVerse series models](https://github.com/xverse-ai)| Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
```
docs/source_en/LLM/Command-line-parameters.md: 1 addition & 1 deletion
```diff
@@ -29,7 +29,7 @@
 - `--add_output_dir_suffix`: Default is `True`, indicating that a suffix of `model_type` and the fine-tuning version number will be appended to the `output_dir` directory. Set to `False` to avoid this behavior.
 - `--ddp_backend`: Backend support for distributed training, default is `None`. Options include: 'nccl', 'gloo', 'mpi', 'ccl'.
 - `--seed`: Global seed, default is `42`. Used to reproduce training results.
-- `--resume_from_checkpoint`: Used for resuming training from a checkpoint, default is `None`. You can set it to the path of the checkpoint, for example: `'output/qwen-7b-chat/vx-xxx/checkpoint-xxx'`, to resume training from that point. Supports adjusting `--resume_only_model` to only read the model file during checkpoint continuation.
+- `--resume_from_checkpoint`: Used for resuming training from a checkpoint, default is `None`. You can set it to the path of the checkpoint, for example: `--resume_from_checkpoint output/qwen-7b-chat/vx-xxx/checkpoint-xxx`, to resume training from that point. Supports adjusting `--resume_only_model` to only read the model file during checkpoint continuation.
 - `--resume_only_model`: Default is `False`, which means strict checkpoint continuation: this reads the weights of the model, the optimizer, the lr_scheduler, and the random seeds stored on each device, and continues training from the last paused step. If set to `True`, it only reads the weights of the model.
 - `--dtype`: torch_dtype when loading the base model, default is `'AUTO'`, i.e. intelligently select the dtype: if the machine does not support bf16, use fp16; if `MODEL_MAPPING` specifies a torch_dtype for the corresponding model, use its dtype; otherwise use bf16. Options include: 'bf16', 'fp16', 'fp32'.
 - `--dataset`: Used to select the training dataset, default is `[]`. You can see the list of available datasets [here](Supported-models-datasets.md#Datasets). If you need to train with multiple datasets, you can use ',' or ' ' to separate them, for example: `--dataset alpaca-en,alpaca-zh` or `--dataset alpaca-en alpaca-zh`. It supports ModelScope Hub/HuggingFace Hub/local paths, subset selection, and dataset sampling. The specified format for each dataset is as follows: `[HF or MS::]{dataset_name} or {dataset_id} or {dataset_path}[:subset1/subset2/...][#dataset_sample]`. The simplest case requires specifying only dataset_name, dataset_id, or dataset_path. Customizing datasets can be found in the [Customizing and Extending Datasets document](Customization.md#custom-dataset).
```
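The `'AUTO'` rule for `--dtype` described in the hunk above can be sketched as follows. This is an illustrative helper, not swift's actual implementation; `bf16_supported` and `model_mapping_dtype` stand in for the hardware check and a `MODEL_MAPPING` entry's torch_dtype.

```python
def resolve_dtype(requested, bf16_supported, model_mapping_dtype=None):
    """Illustrative sketch of the `--dtype 'AUTO'` selection rule."""
    if requested != 'AUTO':
        # Explicit choices are passed through unchanged.
        assert requested in ('bf16', 'fp16', 'fp32')
        return requested
    if not bf16_supported:
        # Machine cannot do bf16: fall back to fp16.
        return 'fp16'
    if model_mapping_dtype is not None:
        # MODEL_MAPPING pins a dtype for this model: honor it.
        return model_mapping_dtype
    return 'bf16'
```

Note that the `MODEL_MAPPING` override only applies on bf16-capable hardware in this sketch, matching the order of the clauses in the description above.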
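The per-dataset spec format `[HF or MS::]{name}[:subset1/subset2/...][#dataset_sample]` quoted above can be illustrated with a small parser. This is a hypothetical sketch of the grammar, not swift's real parsing code, and it ignores edge cases such as colons inside local paths.

```python
def parse_dataset_spec(spec):
    """Sketch of the dataset spec grammar:
    [HF::|MS::]{name_or_id_or_path}[:subset1/subset2/...][#sample]"""
    hub = None
    if spec.startswith(('HF::', 'MS::')):
        hub, spec = spec[:2], spec[4:]   # hub prefix, e.g. 'HF'
    sample = None
    if '#' in spec:
        spec, n = spec.rsplit('#', 1)    # trailing sample count
        sample = int(n)
    subsets = []
    if ':' in spec:
        spec, subs = spec.split(':', 1)  # subset list, '/'-separated
        subsets = subs.split('/')
    return {'hub': hub, 'name': spec, 'subsets': subsets, 'sample': sample}
```

For example, `parse_dataset_spec('alpaca-en')` keeps only the name, while a spec like `'HF::some-dataset:sub1/sub2#2000'` also carries the hub, two subsets, and a sample size.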