### Q47: Multi-machine training speed is slow. When using the Swift framework for LLM training, we found that using DeepSpeed ZeRO-3 for training results in a severe speed decrease.
See the details in this [issue](https://github.com/modelscope/ms-swift/issues/1825).
### Q48: Does swift currently support multi-stage pre-training for qwen2-vl? I noticed in the official best practices that SFT seems to train the ViT and the LLM together. Is it possible to fine-tune them separately?
See [issue](https://github.com/modelscope/ms-swift/issues/2222) for details.
### Q49: Does qwen2-vl not support mixing pure text data?
It supports both image-text and pure text data.
### Q50: Can we plot loss curves for different datasets during fine-tuning?
No, it's not supported. Datasets are trained in a mixed manner.
### Q51: After model training, the responses contain a lot of repetitive content
Refer to the [LLM Fine-tuning Documentation](https://swift.readthedocs.io/en/latest/Instruction/LLM-fine-tuning.html). If repetition occurs during training, try training for more epochs, cleaning the data, using full-parameter training, or applying RLHF to mitigate the issue.
### Q52: I want to ask if swift currently supports prompt tuning or prefix tuning?
Not supported. These two methods suffer from severe knowledge forgetting, so they are not recommended for use at present.
### Q53: Training on two A10 GPUs reports the following error:
```text
[rank0]: torch.distributed.DistBackendError: NCCL error in:../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1970, unhandled system error (run with NCCL_DEBUG=INFO for details),NCCL version 2.20.5
[rank0]:ncclSystemError: System call (e.g. socket,malloc) or external library call failed or device error.
```
Please check if the shared memory is too small, as NCCL requires shared memory.
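A quick way to check and work around this, assuming training runs inside a Docker container (the `--shm-size` value and the training arguments are placeholders):

```shell
# Check how much shared memory is available; NCCL uses /dev/shm for intra-node communication
df -h /dev/shm

# If training runs inside Docker, the default 64MB is usually too small;
# restart the container with a larger shared memory segment, e.g.
#   docker run --gpus all --shm-size 16g ...

# Re-run training with NCCL debug logging to confirm the root cause
NCCL_DEBUG=INFO swift sft --model_type <your-model> --dataset <your-dataset>
```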
### Q54: How to solve the problem of some parameters not participating in gradient backpropagation when freezing certain layers during DDP fine-tuning?
Configure the parameter `--ddp_find_unused_parameters true`.
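For example, appended to a hypothetical two-GPU DDP fine-tuning command (the `NPROC_PER_NODE` launch pattern is assumed from the usual multi-GPU examples; the other arguments are placeholders):

```shell
# Let DDP tolerate frozen parameters that never receive gradients
NPROC_PER_NODE=2 CUDA_VISIBLE_DEVICES=0,1 swift sft \
    --model_type <your-model> \
    --dataset <your-dataset> \
    --ddp_find_unused_parameters true
```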
### Q55: Does swift have a dataset quality inspection tool?
### Q55: Where can I start model parallelism on the web-ui? I only found the checkbox for data parallelism, but couldn't find where to enable model parallelism.
Just specify the visible GPUs.
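For example, assuming the web-ui is launched from a shell (a minimal sketch):

```shell
# Make two GPUs visible to the web-ui process; the model can then be split across them
CUDA_VISIBLE_DEVICES=0,1 swift web-ui
```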
### Q56: How can I turn off the automatic shuffling of the training data?

Currently, this can only be done by modifying the transformers [trainer code](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py).
### Q57: What is the 'num_items_in_batch' parameter? I can't find it anywhere.
Upgrade to `ms-swift==2.5.2` or downgrade to `transformers<4.46`.
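Either of the following should work:

```shell
# Option 1: upgrade ms-swift
pip install "ms-swift==2.5.2"

# Option 2: keep the current ms-swift and pin transformers below 4.46
pip install "transformers<4.46"
```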
## Inference
### Q1: Is there documentation for Swift inference?
### Q14: Has anyone encountered this problem? RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Upgrade torch; the installed version has not implemented this operator for BFloat16.
### Q15: Does qwen2-audio support streaming inference?
Yes, it does. For details, see this [issue](https://github.com/modelscope/ms-swift/issues/1653).
### Q16: Where do I set do_sample for multimodal model inference in the inference client?
Set `temperature=0`; this makes generation greedy, which is equivalent to disabling `do_sample`.
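For example, against an OpenAI-compatible deployment the setting is passed per request (the URL, model name, and message are illustrative):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2-vl-7b-instruct",
        "messages": [{"role": "user", "content": "Describe the image."}],
        "temperature": 0
      }'
```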
### Q17: Does ms-swift support batch processing for large models?
Yes, it does. When running inference with Python scripts, the `request_list` described in the documentation can contain multiple queries. During deployment, the server automatically handles batching. See [VLLM Inference Acceleration and Deployment](https://swift.readthedocs.io/en/latest/LLM/VLLM-inference-acceleration-and-deployment.html) for details.
### Q18: When quantizing models in ms-swift, it shows insufficient memory. Is it possible to use fewer resources during quantization?
Try setting `--quant_device_map cpu`.
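For example, added to a quantization/export command (the flags other than `--quant_device_map` are placeholders based on typical `swift export` usage, so double-check them against the export documentation):

```shell
swift export \
    --model_type <your-model> \
    --quant_method gptq \
    --quant_bits 4 \
    --quant_device_map cpu
```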
### Q19: Does swift support quantization for multimodal models?
Yes, it does.
### Q20: I'm getting the following error when using GPTQ. What's the reason?
```text
if llm_config['architectures'][0] == 'LlamaForCausalLM':
KeyError: 'architectures'
```
Try using transformers version 4.44.*.
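For example:

```shell
pip install "transformers==4.44.*"
```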
### Q21: How can I save the evaluation results to a specified file in swift infer? I never know where it's being saved.
Set `--result_dir your_path`. See [InferArguments](https://github.com/modelscope/ms-swift/blob/main/swift/llm/utils/argument.py) for details.
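For example (the model type and output path are placeholders):

```shell
swift infer \
    --model_type qwen2_5-1_5b-instruct \
    --result_dir ./infer_results
```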
### Q22: I'm getting the following error when AWQ quantizing yi-vl-6b:
```text
TypeError: swift.llm.utils.model.get_model_tokenizer_with_flash_attn() got multiple values for keyword argument 'automodel_class'.
```
Please use gptq quantization instead.
### Q23: I'm trying to use swift export for gptq int4 quantization of the qwen2.5 72B model, with max model length=32768 as the default value and a calibration dataset of 128 samples. However, I'm getting an error during quantization. The error log says: "factorization could not be completed because the input is not positive-definite (the leading minor of order 18145 is not positive-definite)". What's the reason?
This is an issue with the Hessian matrix not being positive-definite. Try using a different dataset.
## Deployment
### Q1: How to deploy the trained model?
### Q10: When deploying the qwen2vl model locally with vllm as the inference backend, how can we input local videos? Can we use base64 encoding? How to load videos when using curl?
You can refer to the [Multimodal LLM Deployment](https://swift.readthedocs.io/en/latest/Multi-Modal/mutlimodal-deployment.html) documentation. URL, base64, and local file paths are all acceptable. Local file paths are only for testing on the same machine.
### Q11: When deploying qwen2-vl, the following error occurs. Is it due to an incorrect version of vllm?
```text
Unrecognized keys in `rope_scaling`for 'rope_type'='default': {'mrope_section'} Unrecognized keys in `rope_scaling`for 'rope_type'='default': {'mrope_section'}
```
See [issue](https://github.com/QwenLM/Qwen2-VL/issues/209) for details.
### Q12: Can swift inference output prediction probabilities? How to set it up during deployment?
During deployment, pass parameters from the client: `logprobs=True, top_logprobs=5`.
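For example, with curl against the OpenAI-compatible endpoint (the URL, model name, and message are illustrative):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2_5-1_5b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "logprobs": true,
        "top_logprobs": 5
      }'
```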
## Evaluation
### Q1: What evaluation datasets does Swift support?
This relies on the nltk package, and the nltk tokenizer needs to download a punkt_tab zip file, which can be unstable or fail directly in some environments in China. We have tried to modify the code to work around this issue; refer to this [issue](https://github.com/nltk/nltk/issues/3293).
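As a workaround (an assumption, not the fix described in the linked issue), the resource can be pre-downloaded once in an environment with network access so the evaluation no longer needs to fetch it:

```shell
# Pre-fetch the punkt_tab tokenizer data into the default nltk_data directory
python -c "import nltk; nltk.download('punkt_tab')"
```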
### Q6: When evaluating a fine-tuned model, it always stops at a fixed percentage, but the vllm service seems to be running normally. The larger the model, the earlier it disconnects.
Set the `TIMEOUT` environment variable to -1.
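For example (the model type and dataset are placeholders):

```shell
# Disable the client-side timeout so long evaluations are not cut off
TIMEOUT=-1 swift eval --model_type qwen2_5-1_5b-instruct --eval_dataset <your-dataset>
```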
### Q7: Does evalscope support multi-model comparison?
Please refer to the [documentation](https://evalscope.readthedocs.io/zh-cn/latest/user_guides/arena.html) for details.
### Q8: Is there custom evaluation for multimodal datasets?
For multimodal custom evaluation, please refer to the [documentation](https://evalscope.readthedocs.io/zh-cn/latest/advanced_guides/custom_dataset.html#vlm).
### Q9: Does ms-swift have methods to test QPS, latency, and tokens/s?
You can try using evalscope's [model stress testing tool](https://evalscope.readthedocs.io/zh-cn/latest/user_guides/stress_test.html#id1).
### Q10: Is it possible to control the number of dataset entries during evaluation? Evaluating one MMLU takes over an hour, which is too slow.
Configure the parameter `--eval_limit`. Here, `--eval_limit` controls the number of entries for each subset. For example, if MMLU has over 50 subsets and each is limited to 10 entries, it would be over 500 entries in total.
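For example (the model type and dataset name are illustrative):

```shell
# Evaluate only 10 entries from each MMLU subset
swift eval --model_type qwen2_5-1_5b-instruct --eval_dataset mmlu --eval_limit 10
```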