1 change: 1 addition & 0 deletions docs/source/Instruction/Export-and-push.md
@@ -11,6 +11,7 @@ SWIFT supports quantization exports for AWQ, GPTQ, FP8, and BNB models. AWQ and

| Quantization Technique | Multimodal | Inference Acceleration | Continued Training |
| ---------------------- | ---------- | ---------------------- | ------------------ |
+| FP8 | ✅ | ✅ | ✅ |
 | GPTQ | ✅ | ✅ | ✅ |
 | AWQ | ✅ | ✅ | ✅ |
 | BNB | ❌ | ✅ | ✅ |
3 changes: 2 additions & 1 deletion docs/source_en/Instruction/Export-and-push.md
@@ -10,7 +10,8 @@ SWIFT supports quantization exports for AWQ, GPTQ, FP8, and BNB models. AWQ and

| Quantization Technique | Multimodal | Inference Acceleration | Continued Training |
| ---------------------- | ---------- | ---------------------- | ------------------ |
-| GPTQ | ✅ | ✅ | ✅ |
+| FP8 | ✅ | ✅ | ✅ |
+| GPTQ | ✅ | ✅ | ✅ |
 | AWQ | ✅ | ✅ | ✅ |
 | BNB | ❌ | ✅ | ✅ |
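
Every technique in the table above is driven through the same `swift export` entry point. A minimal invocation sketch for a GPTQ INT4 export — the model ID, dataset name, and output directory are placeholders, and flag names should be confirmed against `swift export --help` for the installed ms-swift version:

```shell
# Hypothetical example: GPTQ-quantize a model to 4 bits via swift export.
# Model, dataset, and output paths are illustrative placeholders.
swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --quant_method gptq \
    --quant_bits 4 \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-en' \
    --output_dir ./qwen2_5-7b-gptq-int4
```

AWQ and GPTQ need the `--dataset` calibration set; FP8 and BNB exports do not calibrate and can omit it.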

2 changes: 1 addition & 1 deletion swift/pipelines/export/merge_lora.py
@@ -53,7 +53,7 @@ def merge_lora(args: ExportArguments, device_map=None, replace_if_exists=False)
model_dirs=args.adapters,
max_shard_size=args.max_shard_size,
additional_saved_files=model.model_meta.additional_saved_files)
-        logger.info(f'Successfully merged LoRA and saved in {output_dir}.')
+        logger.info(f'Successfully merged LoRA and saved in `{output_dir}`.')
args.device_map = origin_device_map

args.model = output_dir
4 changes: 3 additions & 1 deletion swift/pipelines/export/quant.py
@@ -68,7 +68,7 @@ def quantize(self):
args.output_dir,
model_dirs=[args.model_dir],
additional_saved_files=self.model.model_meta.additional_saved_files)
-        logger.info(f'Successfully quantized the model and saved in {args.output_dir}.')
+        logger.info(f'Successfully quantized the model and saved in `{args.output_dir}`.')

@torch.inference_mode()
def _prepare_gptq_dataset(self, examples: List[Dict[str, torch.LongTensor]], batch_size: int = 1, *args, **kwargs):
@@ -280,6 +280,8 @@ def gptq_model_quantize(self, v2: bool = False):
logger.info('Start quantizing the model...')
logger.warning('The process of packing the model takes a long time and there is no progress bar. '
'Please be patient and wait...')
+        if not hasattr(self.model, 'hf_device_map'):
+            self.model.hf_device_map = {'': torch.device('cuda:0')}
Comment on lines +283 to +284
Contributor


high

Hardcoding the device to `cuda:0` can lead to failures or incorrect behavior on multi-GPU systems or in CPU-only environments. It is safer to use the model's current device when initializing `hf_device_map`.

Suggested change
-        if not hasattr(self.model, 'hf_device_map'):
-            self.model.hf_device_map = {'': torch.device('cuda:0')}
+        if not hasattr(self.model, 'hf_device_map'):
+            self.model.hf_device_map = {'': self.model.device}
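
The reviewer's point can be illustrated with a small self-contained sketch. The `DummyModel` class here is hypothetical, standing in for a transformers model that accelerate has not annotated with a device map; deriving the fallback entry from `model.device` keeps it correct on CPU-only and multi-GPU hosts alike:

```python
import torch


class DummyModel:
    """Hypothetical stand-in for a model lacking an hf_device_map attribute."""

    @property
    def device(self) -> torch.device:
        # A real transformers model reports the device of its parameters;
        # this sketch pretends the model lives on CPU.
        return torch.device('cpu')


def ensure_device_map(model) -> None:
    # Only fill in a default map when accelerate did not set one already,
    # and derive the device from the model rather than hardcoding cuda:0.
    if not hasattr(model, 'hf_device_map'):
        model.hf_device_map = {'': model.device}


model = DummyModel()
ensure_device_map(model)
print(model.hf_device_map)  # {'': device(type='cpu')}
```

On a CUDA host where the model actually sits on `cuda:0`, the same code produces `{'': device(type='cuda', index=0)}`, so the original single-GPU behavior is preserved.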

with self._patch_gptq_block(self.model, block_name_to_quantize):
gptq_quantizer.quantize_model(self.model, self.tokenizer)
self.model.config.quantization_config.pop('dataset', None)
Expand Down