# Export and Push

## Merge LoRA

- See [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/merge_lora.sh); a minimal command sketch follows below.

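A minimal sketch of the merge command, mirroring the linked script (the checkpoint path is a placeholder):

```shell
# Merge the LoRA adapter weights into the base model and save the merged checkpoint
swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --merge_lora true
```
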
## Quantization

SWIFT supports AWQ, GPTQ, and BNB quantized exports. AWQ and GPTQ require a calibration dataset, which yields better quantization performance but takes longer to quantize; BNB requires no calibration dataset and quantizes more quickly.

| Quantization Technique | Multimodal | Inference Acceleration | Continued Training |
| ---------------------- | ---------- | ---------------------- | ------------------ |
| GPTQ                   | ✅         | ✅                     | ✅                 |
| AWQ                    | ✅         | ✅                     | ✅                 |
| BNB                    | ❌         | ✅                     | ✅                 |

In addition to installing SWIFT, install the following dependencies:

```shell
# For AWQ quantization:
# The versions of autoawq and CUDA are correlated; please choose the version according to `https://github.com/casper-hansen/AutoAWQ`.
# If there are dependency conflicts with torch, please add the `--no-deps` option.
pip install autoawq -U

# For GPTQ quantization:
# The versions of auto_gptq and CUDA are correlated; please choose the version according to `https://github.com/PanQiWei/AutoGPTQ#quick-installation`.
pip install auto_gptq optimum -U

# For BNB quantization:
pip install bitsandbytes -U
```

We provide a series of scripts to demonstrate SWIFT's quantization export capabilities; a minimal command sketch follows this list:

- Supports [AWQ](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/awq.sh)/[GPTQ](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq.sh)/[BNB](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/bnb.sh) quantization exports.
- Multimodal quantization: supports quantizing multimodal models with GPTQ and AWQ; AWQ covers only a limited set of multimodal models. Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/mllm).
- More model series: supports quantization exports for [BERT](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/bert) and [Reward Model](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/reward_model).
- Models quantized and exported with SWIFT support inference acceleration with vLLM/LMDeploy and further SFT/RLHF with QLoRA.
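
For reference, below is a minimal sketch of a GPTQ quantization export; the model id, calibration dataset, and output directory are illustrative placeholders, and the AWQ/BNB variants in the linked scripts differ mainly in `--quant_method`:

```shell
# Quantize a model to 4-bit GPTQ using a small calibration dataset
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --model Qwen/Qwen2.5-1.5B-Instruct \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
    --quant_method gptq \
    --quant_bits 4 \
    --output_dir Qwen2.5-1.5B-Instruct-GPTQ-Int4
```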

## Push Model

SWIFT supports pushing trained or quantized models to ModelScope or Hugging Face. By default, models are pushed to ModelScope; specify `--use_hf true` to push to Hugging Face instead.

```shell
swift export \
    --model output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<model-id>' \
    --hub_token '<sdk-token>' \
    --use_hf false
```

Tips:

- You can use `--model <checkpoint-dir>` or `--adapters <checkpoint-dir>` to specify the checkpoint directory to push; the two are equivalent in the model-pushing scenario.
- When pushing to ModelScope, make sure you have registered a ModelScope account. Your SDK token can be obtained from [this page](https://www.modelscope.cn/my/myaccesstoken), and the account associated with the token must have edit permissions for the organization in the model_id. Pushing automatically creates the model repository corresponding to the model_id if it does not already exist; pass `--hub_private_repo true` to create it as a private repository.
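
As an example of the first tip, a sketch of pushing a LoRA adapter checkpoint to Hugging Face instead of ModelScope (the model id and token are placeholders):

```shell
# Push an adapter checkpoint; --use_hf true targets Hugging Face instead of ModelScope
swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<model-id>' \
    --hub_token '<hf-token>' \
    --use_hf true
```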