
Commit 85076f9

Merge branch 'main' into release/2.6
2 parents 1175563 + 4f7054f

5 files changed: +212 -7 lines changed

docs/source/Instruction/常见问题整理.md

Lines changed: 94 additions & 1 deletion
@@ -175,6 +175,43 @@ swift does not yet support vocabulary expansion.
### Q47: Multi-machine training is slow. When using the swift framework for LLM training, DeepSpeed ZeRO-3 causes a severe drop in training speed.
See this [issue](https://github.com/modelscope/ms-swift/issues/1825) for details.

### Q48: Does swift support multi-stage pre-training for qwen2-vl? The official best practices seem to train the ViT and LLM together during SFT; is it possible to fine-tune them separately?
See this [issue](https://github.com/modelscope/ms-swift/issues/2222) for details.

### Q49: Does qwen2-vl not support mixing in pure-text data?
It supports both image-text and pure-text data.

### Q50: Can loss curves be plotted per dataset during fine-tuning?
No, this is not supported; the datasets are mixed during training.

### Q51: After training, the model repeats a lot of content in its responses
See the [LLM fine-tuning documentation](https://swift.readthedocs.io/zh-cn/latest/Instruction/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.html). If repetition appears during training, train for a few more epochs, clean the data, use full-parameter training, or use RLHF to mitigate it.

### Q52: Does swift currently support prompt tuning or prefix tuning?
No. These two methods suffer from severe knowledge forgetting and are not recommended at the moment.

### Q53: Training on two A10 GPUs fails with the following error:
```text
[rank0]: torch.distributed.DistBackendError: NCCL error in:../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1970, unhandled system error (run with NCCL_DEBUG=INFO for details),NCCL version 2.20.5
[rank0]:ncclSystemError: System call (e.g. socket,malloc) or external library call failed or device error.
```
Check whether the shared memory is too small; NCCL requires shared memory.

### Q54: During DDP fine-tuning, how do I fix the error caused by frozen layers whose parameters do not participate in gradient back-propagation?
Configure the parameter `--ddp_find_unused_parameters true`.

### Q55: Does swift have a dataset quality-inspection tool?
[data-juicer](https://github.com/modelscope/data-juicer)

### Q56: Where do I enable model parallelism in the web UI? I only found the checkbox for data parallelism.
Just specify the visible GPUs.

### Q57: How do I turn off automatic shuffling?
Currently you can only modify the transformers [code](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py).

### Q58: What is the 'num_items_in_batch' parameter? I can't find it anywhere.
Upgrade to `ms-swift==2.5.2` or downgrade to `transformers<4.46`.

## Inference

### Q1: Is there documentation for swift inference?
@@ -224,6 +261,40 @@ Models trained with qlora do not support merge-lora; it is recommended to merge-lora after lora fine-tuning and then quantize.
### Q14: Has anyone encountered this problem? RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Upgrade torch; this version of torch has not implemented this operator.

### Q15: Does qwen2-audio support streaming inference?
Yes. See this [issue](https://github.com/modelscope/ms-swift/issues/1653) for details.

### Q16: Where do I set do_sample when running multimodal inference with the inference client?
Set temperature=0.

### Q17: Does ms-swift support batch processing for large models?
Yes. For Python-script inference, the request_list described in the documentation can contain multiple queries; for deployment, the server batches requests automatically. See [VLLM Inference Acceleration and Deployment](https://swift.readthedocs.io/zh-cn/latest/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.html#id3) for details.
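A minimal sketch of the Python-script route, assuming the ms-swift 2.x vLLM helpers and a hypothetical `qwen2-7b-instruct` model type; the linked document is the authoritative reference:
```python
# Sketch only: assumes the ms-swift 2.x vLLM helper API; see the linked document for the exact usage.
from swift.llm import (get_default_template_type, get_template, get_vllm_engine,
                       inference_vllm)

model_type = 'qwen2-7b-instruct'  # hypothetical choice of model_type
llm_engine = get_vllm_engine(model_type)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)

# request_list can hold many queries; they are batched in a single call.
request_list = [{'query': 'Hello!'}, {'query': 'What is ModelScope?'}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
    print(request['query'], '->', resp['response'])
```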

### Q18: When quantizing a model with ms-swift it reports insufficient memory. Can quantization use fewer resources, even if it is slower?
Try setting `--quant_device_map cpu`.

### Q19: Does swift support quantizing multimodal models?
Yes.

### Q20: I get the following error when using GPTQ. What is the reason?
```text
if llm_config['architectures'][0] == 'LlamaForCausalLM':
KeyError: 'architectures'
```
Try transformers version 4.44.*.

### Q21: How can swift infer save the evaluation results to a specified file? I never know where they are saved.
Set `--result_dir your_path`. See [InferArguments](https://github.com/modelscope/ms-swift/blob/main/swift/llm/utils/argument.py) for details.

### Q22: AWQ quantization of yi-vl-6b fails with:
```text
TypeError: swift.llm.utils.model.get_model_tokenizer_with_flash_attn() got multiple values for keyword argument 'automodel_class'.
```
Please use gptq quantization instead.

### Q23: I used swift export to run gptq int4 quantization on the qwen2.5 72B model, with the default max model length=32768 and a calibration dataset of 128 samples, but quantization fails with: "factorization could not be completed because the input is not positive-definite (the leading minor of order 18145 is not positive-definite)". What is the reason?
This is the Hessian matrix not being positive-definite; try a different calibration dataset.

## Deployment

### Q1: How do I deploy a trained model?
@@ -256,6 +327,16 @@ The base model can use client.chat.completions.create, but this is compatibility behavior.
### Q10: When deploying the qwen2vl model locally with vllm as the inference backend, how do I pass in a local video? Can base64 be used? How do I load a video with a curl call?
See the [multimodal LLM deployment documentation](https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/MLLM%E9%83%A8%E7%BD%B2%E6%96%87%E6%A1%A3.html). URL, base64, and local paths all work; local paths are limited to testing on the same machine.

### Q11: Deploying qwen2-vl fails with the error below. Is the vllm version wrong?
```text
Unrecognized keys in `rope_scaling`for 'rope_type'='default': {'mrope_section'} Unrecognized keys in `rope_scaling`for 'rope_type'='default': {'mrope_section'}
```
See this [issue](https://github.com/QwenLM/Qwen2-VL/issues/209) for details.

### Q12: Can swift inference output prediction probabilities? How do I set this up for deployment?
For Python-script inference, set `model.generation_config.output_logits = True, model.generation_config.return_dict_in_generate = True`.
For deployment, pass the parameters from the client: `logprobs=True, top_logprobs=5`.
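A minimal client-side sketch for the deployment case, assuming an OpenAI-compatible endpoint started with `swift deploy` on the default local port (host, port, and model name are assumptions):
```python
# Sketch only: assumes an OpenAI-compatible server (e.g. started by `swift deploy`) at localhost:8000.
from openai import OpenAI

client = OpenAI(api_key='EMPTY', base_url='http://127.0.0.1:8000/v1')
model = client.models.list().data[0].id  # pick the first deployed model

resp = client.chat.completions.create(
    model=model,
    messages=[{'role': 'user', 'content': 'Hello!'}],
    logprobs=True,     # return per-token log-probabilities
    top_logprobs=5,    # and the 5 most likely alternatives per position
)
print(resp.choices[0].message.content)
print(resp.choices[0].logprobs)
```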

## Evaluation

### Q1: What evaluation datasets does swift support?
@@ -305,5 +386,17 @@ swift eval --model_type 'qwen2_5-1_5b-instruct' --eval_dataset no --custom_eval_
```
This depends on the nltk package, and nltk's tokenizer needs to download a punkt_tab zip file, which is unstable or fails outright in some network environments in China. The code has been changed to provide a fallback and work around this problem; see this [issue](https://github.com/nltk/nltk/issues/3293).

### Q6: When evaluating a fine-tuned model, it always stops at a fixed percentage, while the vllm service appears to keep running normally. The larger the model, the earlier it disconnects.
Set the `TIMEOUT` environment variable to -1.

### Q7: Does evalscope support multi-model comparison?
See the [documentation](https://evalscope.readthedocs.io/zh-cn/latest/user_guides/arena.html) for details.

### Q8: Is there custom evaluation for multimodal datasets?
For multimodal custom evaluation, see the [documentation](https://evalscope.readthedocs.io/zh-cn/latest/advanced_guides/custom_dataset.html#vlm).

### Q9: Does ms-swift have a way to test QPS, latency, and tokens/s?
You can try evalscope's [model stress-testing tool](https://evalscope.readthedocs.io/zh-cn/latest/user_guides/stress_test.html#id1).

### Q10: Can the number of dataset entries be controlled during evaluation? Evaluating MMLU takes over an hour, which is too slow.
Configure the parameter `--eval_limit`. It limits the number of entries per subset; for example, MMLU has 50+ subsets, so a limit of 10 per subset gives 500+ entries in total.

docs/source_en/Instruction/Common-QA.md

Lines changed: 94 additions & 1 deletion
@@ -176,6 +176,43 @@ When fine-tuning on V100, it saves in fp32 format.
### Q47: Multi-machine training speed is slow. When using the Swift framework for LLM training, we found that using DeepSpeed ZeRO-3 for training results in a severe speed decrease.
See the details in this [issue](https://github.com/modelscope/ms-swift/issues/1825).

### Q48: Does swift currently support multi-stage pre-training for qwen2-vl? I noticed in the official best practices that SFT seems to train the ViT and LLM together. Is it possible to finetune them separately?
See [issue](https://github.com/modelscope/ms-swift/issues/2222) for details.

### Q49: Does qwen2-vl not support mixing pure text data?
It supports both image-text and pure text data.

### Q50: Can we plot loss curves for different datasets during fine-tuning?
No, it's not supported. Datasets are trained in a mixed manner.

### Q51: After model training, the responses contain a lot of repetitive content
Refer to the [LLM Fine-tuning Documentation](https://swift.readthedocs.io/en/latest/Instruction/LLM-fine-tuning.html). If repetition occurs during training, try training for more epochs, clean the data, use full parameter training, or adopt RLHF methods to mitigate the issue.

### Q52: Does swift currently support prompt tuning or prefix tuning?
Not supported. These two methods suffer from severe knowledge forgetting, so they are not recommended at present.

### Q53: Training on two A10 GPUs reports the following error:
```text
[rank0]: torch.distributed.DistBackendError: NCCL error in:../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1970, unhandled system error (run with NCCL_DEBUG=INFO for details),NCCL version 2.20.5
[rank0]:ncclSystemError: System call (e.g. socket,malloc) or external library call failed or device error.
```
Please check whether the shared memory is too small, as NCCL requires shared memory.

### Q54: How do I solve the problem of some parameters not participating in gradient backpropagation when freezing certain layers during DDP fine-tuning?
Configure the parameter `--ddp_find_unused_parameters true`.
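If you are scripting with the Hugging Face trainer rather than the swift CLI, the flag is assumed to correspond to the field of the same name on `TrainingArguments`; a minimal sketch:
```python
# Sketch only: the swift CLI flag is assumed to map to this TrainingArguments field.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='output',
    # Let DDP tolerate parameters that receive no gradient in a step,
    # e.g. because some layers are frozen.
    ddp_find_unused_parameters=True,
)
```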

### Q55: Does swift have a dataset quality inspection tool?
[data-juicer](https://github.com/modelscope/data-juicer)

### Q56: Where can I enable model parallelism in the web UI? I only found the checkbox for data parallelism.
Just specify the visible GPUs.

### Q57: How do I turn off automatic shuffling?
Currently, you can only modify the transformers [code](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py).
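A minimal sketch of the kind of change involved, assuming you can swap in a custom trainer class; `_get_train_sampler` is a private transformers method whose name and signature may change between versions:
```python
# Sketch only: disable shuffling by overriding the sampler the Trainer builds
# for its training DataLoader (a private API, so pin your transformers version).
from torch.utils.data import SequentialSampler
from transformers import Seq2SeqTrainer


class NoShuffleTrainer(Seq2SeqTrainer):

    def _get_train_sampler(self):
        # The default implementation returns a RandomSampler; returning a
        # SequentialSampler keeps the original dataset order.
        return SequentialSampler(self.train_dataset)
```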

### Q58: What is the 'num_items_in_batch' parameter? I can't find it anywhere.
Upgrade to `ms-swift==2.5.2` or downgrade to `transformers<4.46`.

## Inference

### Q1: Is there documentation for Swift inference?
@@ -224,6 +261,40 @@ Modify `generation_config.output_logits`. Set `model.generation_config.output_lo
### Q14: Has anyone encountered this problem? RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Upgrade torch; this version of torch hasn't implemented this operator.

### Q15: Does qwen2-audio support streaming inference?
Yes, it does. For details, see this [issue](https://github.com/modelscope/ms-swift/issues/1653).

### Q16: Where do I set do_sample for multimodal model inference in the inference client?
Set temperature=0.

### Q17: Does ms-swift support batch processing for large models?
Yes, it does. When running inference with Python scripts, the request_list in the documentation can contain multiple queries. During deployment, the server will automatically handle batch processing. See [VLLM Inference Acceleration and Deployment](https://swift.readthedocs.io/en/latest/LLM/VLLM-inference-acceleration-and-deployment.html) for details.

### Q18: When quantizing models in ms-swift, it shows insufficient memory. Is it possible to use fewer resources during quantization, even if it is slower?
Try setting `--quant_device_map cpu`.

### Q19: Does swift support quantization for multimodal models?
Yes, it does.

### Q20: I'm getting the following error when using GPTQ. What's the reason?
```text
if llm_config['architectures'][0] == 'LlamaForCausalLM':
KeyError: 'architectures'
```
Try using transformers version 4.44.*.

### Q21: How can I save the evaluation results to a specified file in swift infer? I never know where they are being saved.
Set `--result_dir your_path`. See [InferArguments](https://github.com/modelscope/ms-swift/blob/main/swift/llm/utils/argument.py) for details.

### Q22: I'm getting the following error when AWQ quantizing yi-vl-6b:
```text
TypeError: swift.llm.utils.model.get_model_tokenizer_with_flash_attn() got multiple values for keyword argument 'automodel_class'.
```
Please use gptq quantization instead.

### Q23: I'm trying to use swift export for gptq int4 quantization of the qwen2.5 72B model, with max model length=32768 as the default value and a calibration dataset of 128 samples. However, I'm getting an error during quantization. The error log says: "factorization could not be completed because the input is not positive-definite (the leading minor of order 18145 is not positive-definite)". What's the reason?
This is an issue with the Hessian matrix not being positive-definite. Try using a different calibration dataset.

## Deployment

### Q1: How to deploy the trained model?
@@ -256,6 +327,16 @@ Inference settings can only be set before startup. For deployment, default setti
### Q10: When deploying the qwen2vl model locally with vllm as the inference backend, how can we input local videos? Can we use base64 encoding? How do we load videos when using curl?
You can refer to the [Multimodal LLM Deployment documentation](https://swift.readthedocs.io/en/latest/Multi-Modal/mutlimodal-deployment.html). URL, base64, and local file paths are all acceptable. Local file paths are only for testing on the same machine.

### Q11: When deploying qwen2-vl, the following error occurs. Is it due to an incorrect version of vllm?
```text
Unrecognized keys in `rope_scaling`for 'rope_type'='default': {'mrope_section'} Unrecognized keys in `rope_scaling`for 'rope_type'='default': {'mrope_section'}
```
See this [issue](https://github.com/QwenLM/Qwen2-VL/issues/209) for details.

### Q12: Can swift inference output prediction probabilities? How do I set this up during deployment?
For Python script inference, use `model.generation_config.output_logits = True, model.generation_config.return_dict_in_generate = True`.
During deployment, pass parameters from the client: `logprobs=True, top_logprobs=5`.
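A minimal sketch of the Python-script side, assuming a Hugging Face-style causal LM (the model id below is only an example) and a transformers release that supports `output_logits`:
```python
# Sketch only: reading per-step logits from a Hugging Face generate() call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'Qwen/Qwen2-0.5B-Instruct'  # example model id; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map='auto')

model.generation_config.output_logits = True
model.generation_config.return_dict_in_generate = True

inputs = tokenizer('Hello!', return_tensors='pt').to(model.device)
out = model.generate(**inputs, max_new_tokens=16)

# out.sequences holds the generated token ids; out.logits is a tuple with one
# [batch_size, vocab_size] tensor per generated step.
step_probs = [step_logits.softmax(dim=-1) for step_logits in out.logits]
print(out.sequences.shape, len(step_probs))
```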

## Evaluation

### Q1: What evaluation datasets does Swift support?
@@ -305,5 +386,17 @@ swift eval --model_type 'qwen2_5-1_5b-instruct' --eval_dataset no --custom_eval_
```
This relies on the nltk package, and the nltk tokenizer needs to download a punkt_tab zip file, which can be unstable or fail directly in some environments in China. We have tried to modify the code to work around this issue; refer to this [issue](https://github.com/nltk/nltk/issues/3293).

### Q6: When evaluating a fine-tuned model, it always stops at a fixed percentage, but the vllm service seems to be running normally. The larger the model, the earlier it disconnects.
Set the `TIMEOUT` environment variable to -1.

### Q7: Does evalscope support multi-model comparison?
Please refer to the [documentation](https://evalscope.readthedocs.io/zh-cn/latest/user_guides/arena.html) for details.

### Q8: Is there custom evaluation for multimodal datasets?
For multimodal custom evaluation, please refer to the [documentation](https://evalscope.readthedocs.io/zh-cn/latest/advanced_guides/custom_dataset.html#vlm).

### Q9: Does ms-swift have methods to test QPS, latency, and tokens/s?
You can try using evalscope's [model stress testing tool](https://evalscope.readthedocs.io/zh-cn/latest/user_guides/stress_test.html#id1).

### Q10: Is it possible to control the number of dataset entries during evaluation? Evaluating one MMLU takes over an hour, which is too slow.
Configure the parameter `--eval_limit`. Here, `--eval_limit` controls the number of entries for each subset. For example, if MMLU has over 50 subsets and each is limited to 10 entries, it would be over 500 entries in total.

swift/llm/export.py

Lines changed: 19 additions & 1 deletion
@@ -81,6 +81,22 @@ def _get_dataset(*args, **kwargs):
     return res


+@contextmanager
+def _patch_move_embed(awq_model):
+    _origin_move_embed = awq_model.move_embed
+
+    def _move_embed(model, device: str):
+        # Skip the move when an accelerate hook already manages device
+        # placement and the target device is not the CPU.
+        if hasattr(model, '_hf_hook') and device != 'cpu':
+            return
+        _origin_move_embed(model, device)
+
+    awq_model.move_embed = _move_embed
+    try:
+        yield
+    finally:
+        awq_model.move_embed = _origin_move_embed
+
+
 def awq_model_quantize(awq_model, tokenizer, batch_size) -> None:

     from awq.quantize import quantizer
@@ -93,7 +109,8 @@ def awq_model_quantize(awq_model, tokenizer, batch_size) -> None:
     group_size = 128
     quant_config = {'zero_point': True, 'q_group_size': group_size, 'w_bit': _args.quant_bits, 'version': 'GEMM'}
     logger.info('Start quantizing the model...')
-    awq_model.quantize(tokenizer, quant_config=quant_config, n_parallel_calib_samples=batch_size)
+    with _patch_move_embed(awq_model):
+        awq_model.quantize(tokenizer, quant_config=quant_config, n_parallel_calib_samples=batch_size)
     quantizer.get_calib_dataset = _origin_get_calib_dataset  # recover
     awq_model.model.config.quantization_config = AwqConfig(
         bits=_args.quant_bits, group_size=group_size, zero_point=True, version='GEMM')
@@ -260,6 +277,7 @@ def llm_export(args: ExportArguments) -> None:
         from awq import AutoAWQForCausalLM
         model, template = prepare_model_template(
             args, device_map=args.quant_device_map, task='export', automodel_class=AutoAWQForCausalLM)
+        template.model = model.model
         awq_model_quantize(model, template.tokenizer, args.quant_batch_size)
         model.save_quantized(args.quant_output_dir)
     elif args.quant_method == 'gptq':

swift/llm/utils/utils.py

Lines changed: 4 additions & 3 deletions
@@ -317,7 +317,7 @@ def _single_map(d: Dict[str, Any], map_func: MapFunc) -> Optional[Dict[str, Any]

 def _map_mp_single(shard: HfDataset, map_func: MapFunc, queue: Queue, rank: int):
     batch_size = 64
-    pre_i = 0
+    pre_i = -1
     result = []
     for i, d in enumerate(shard):
         output = map_func(d)
@@ -336,7 +336,7 @@ def _map_mp_i(dataset: HfDataset, map_func: MapFunc, num_proc: int) -> Iterator[
         os.environ = pre_environ
         queue = manager.Queue()
         async_results = []
-        shard_list = [dataset.shard(num_proc, i) for i in range(num_proc)]
+        shard_list = [dataset.shard(num_proc, i, contiguous=True) for i in range(num_proc)]
         for i in range(num_proc):
             async_results.append(pool.apply_async(_map_mp_single, args=(shard_list[i], map_func, queue, i)))
         while True:
@@ -350,11 +350,12 @@ def _map_mp_i(dataset: HfDataset, map_func: MapFunc, num_proc: int) -> Iterator[
 def _map_mp(dataset: HfDataset, map_func: MapFunc, num_proc: int) -> List[Dict[str, Any]]:
     # Solving the unordered problem
     num_proc = min(num_proc, len(dataset))
-    data_list = [[]] * num_proc
+    data_list = [[] for _ in range(num_proc)]
     prog_bar = tqdm(total=len(dataset), desc=f'Map (num_proc={num_proc})', dynamic_ncols=True)
     for d in _map_mp_i(dataset, map_func, num_proc):
         data_list[d[0]] += d[1]
         prog_bar.update(d[2])
+    prog_bar.close()
     res = []
     for data in data_list:
         res += data
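
The `data_list` change above fixes a classic Python aliasing bug: `[[]] * n` repeats one list object n times, so results from every shard were written into the same list. A minimal demonstration, independent of swift:
```python
# `[[]] * n` creates n references to a single list; mutating one slot mutates all.
shared = [[]] * 3
shared[0].append('x')
print(shared)    # [['x'], ['x'], ['x']] -- every slot aliases the same list

# A comprehension builds a distinct list per slot, which is what the fix uses.
separate = [[] for _ in range(3)]
separate[0].append('x')
print(separate)  # [['x'], [], []]
```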

swift/trainers/rlhf_trainer/kto_trainer.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ def _add_kl_dataset(dataset: LLMDataset, total_batch_size: int, seed: Optional[i
             'labels': data['labels'],
             'KL_input_ids': kl_input_ids,
             'KL_labels': kl_labels,
-            'label': kl_data['label']
+            'label': data['label']
         })
         raw_dataset[i:i + total_batch_size] = new_dataset_group
         i += total_batch_size
