3 changes: 2 additions & 1 deletion docs/source/Instruction/Export-and-push.md
@@ -11,7 +11,8 @@ SWIFT supports quantization exports for AWQ, GPTQ, FP8, and BNB models. With AWQ and GPTQ

| Quantization Technique | Multimodal | Inference Acceleration | Continued Training |
| ---------------------- | ---------- | ---------------------- | ------------------ |
-| GPTQ | ✅ | ✅ | ✅ |
+| GPTQ/GPTQ-V2 | ✅ | ✅ | ✅ |
| FP8 | ✅ | ✅ | ✅ |
| AWQ | ✅ | ✅ | ✅ |
| BNB | ❌ | ✅ | ✅ |

3 changes: 2 additions & 1 deletion docs/source_en/Instruction/Export-and-push.md
@@ -10,7 +10,8 @@ SWIFT supports quantization exports for AWQ, GPTQ, FP8, and BNB models. AWQ and

| Quantization Technique | Multimodal | Inference Acceleration | Continued Training |
| ---------------------- | ---------- | ---------------------- | ------------------ |
-| GPTQ | ✅ | ✅ | ✅ |
+| GPTQ/GPTQ-V2 | ✅ | ✅ | ✅ |
| FP8 | ✅ | ✅ | ✅ |
| AWQ | ✅ | ✅ | ✅ |
| BNB | ❌ | ✅ | ✅ |

2 changes: 2 additions & 0 deletions swift/pipelines/export/quant.py
@@ -280,6 +280,8 @@ def gptq_model_quantize(self, v2: bool = False):
         logger.info('Start quantizing the model...')
         logger.warning('The process of packing the model takes a long time and there is no progress bar. '
                        'Please be patient and wait...')
+        if not hasattr(self.model, 'hf_device_map'):
+            self.model.hf_device_map = {'': torch.device('cuda:0')}
Comment on lines +283 to +284
Contributor
Severity: high

Hardcoding the device to `cuda:0` can fail or misbehave on multi-GPU systems and CPU-only environments. It is safer to initialize `hf_device_map` from the model's current device.

Suggested change
-        if not hasattr(self.model, 'hf_device_map'):
-            self.model.hf_device_map = {'': torch.device('cuda:0')}
+        if not hasattr(self.model, 'hf_device_map'):
+            self.model.hf_device_map = {'': self.model.device}

        with self._patch_gptq_block(self.model, block_name_to_quantize):
            gptq_quantizer.quantize_model(self.model, self.tokenizer)
        self.model.config.quantization_config.pop('dataset', None)
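The device-aware fallback proposed in the review can be sketched in isolation. A minimal sketch, assuming a plain `nn.Linear` as a hypothetical stand-in for `self.model`: `transformers` models expose a `.device` property directly, while a bare `nn.Module` does not, so here the device is read from the first parameter instead.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for self.model: a plain nn.Module, which, unlike a
# model loaded with accelerate's device_map, carries no hf_device_map attribute.
model = nn.Linear(4, 4)

# Device-aware fallback along the lines of the review suggestion: derive the
# device from the model itself instead of hardcoding cuda:0, so the same code
# works on CPU-only hosts and on whichever GPU the model actually lives on.
if not hasattr(model, 'hf_device_map'):
    device = next(model.parameters()).device
    model.hf_device_map = {'': device}

print(model.hf_device_map)
```

On a CPU-only host this maps the whole model (the empty-string key) to the CPU device; after `model.cuda()`, the same code would record the model's CUDA device instead.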