Commit 2d07f9f: fix sh ddp_backend (#1360)

Parent: 94a0658

File tree

12 files changed: +27 additions, −29 deletions


docs/source/LLM/Agent微调最佳实践.md

Lines changed: 1 addition, 1 deletion

@@ -165,7 +165,7 @@ Final Answer: If you want a phone with excellent camera performance, I recommend
 | ms-bench | 60000 (sampled) |
 | self-recognition | 3000 (repeated sampling) |

-We also support using your own Agent dataset. The dataset format must meet the requirements in [Custom datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86). More specifically, the Agent's response/system should follow the Action/Action Input/Observation format described above.
+We also support using your own Agent dataset. The dataset format must meet the requirements in [Custom datasets](%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86). More specifically, the Agent's response/system should follow the Action/Action Input/Observation format described above.

 We added **MLP** and **Embedder** to lora_target_modules. You can specify `--lora_target_modules ALL` to add LoRA to all linear layers (including qkvo as well as mlp and embedder); this is **usually the most effective**.
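As a concrete illustration of the `--lora_target_modules ALL` option mentioned in the diff context above, a minimal launch sketch (the model choice and the other arguments are illustrative assumptions, not part of this commit):

```shell
# Attach LoRA to every linear layer (qkvo, MLP and embedder),
# which the docs note is usually the most effective choice.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-7b-chat \
    --dataset ms-bench \
    --lora_target_modules ALL
```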

docs/source/LLM/LLM微调文档.md

Lines changed: 1 addition, 1 deletion

@@ -37,7 +37,7 @@ pip install -r requirements/llm.txt -U
 ```

 ## Fine-tuning
-If you want to fine-tune and run inference through a web UI, see the [web-UI training and inference documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
+If you want to fine-tune and run inference through a web UI, see the [web-UI training and inference documentation](../GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).

 ### Using python
 ```python

docs/source/LLM/LLM量化文档.md

Lines changed: 1 addition, 1 deletion

@@ -305,7 +305,7 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 ```

 **Note**
-- hqq supports more custom parameters, such as specifying different quantization configurations for different network layers; for details, see [Command-line arguments](https://github.com/modelscope/swift/blob/main/docs/source/LLM/命令行参数.md)
+- hqq supports more custom parameters, such as specifying different quantization configurations for different network layers; for details, see [Command-line arguments](命令行参数.md)
 - eetq quantization is 8-bit, so there is no need to specify quantization_bit. bf16 is currently not supported; dtype must be set to fp16
 - eetq's qlora is currently slow; hqq is recommended instead. See this [issue](https://github.com/NetEase-FuXi/EETQ/issues/17)

docs/source/LLM/index.md

Lines changed: 1 addition, 1 deletion

@@ -5,7 +5,7 @@
 1. [LLM推理文档](LLM推理文档.md)
 2. [LLM微调文档](LLM微调文档.md)
 3. [DPO训练文档](DPO训练文档.md)
-4. [界面训练与推理](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md)
+4. [界面训练与推理](../GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md)
 5. [LLM评测文档](LLM评测文档.md)
 6. [LLM量化文档](LLM量化文档.md)
 7. [VLLM推理加速与部署](VLLM推理加速与部署.md)
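All of the documentation edits in this commit follow the same pattern: absolute GitHub blob URLs are replaced by paths relative to the document's own directory. A sketch of that rewrite as a one-off shell helper for files under docs/source/LLM (the sed expressions and the sample filenames are illustrative assumptions, not part of the commit):

```shell
# Rewrite absolute GitHub blob URLs to paths relative to docs/source/LLM/.
# Links into docs/source/LLM/ become bare filenames; links into sibling
# directories (e.g. docs/source/GetStarted/) become ../<dir>/<file>.
relativize() {
  sed -e 's|https://github.com/modelscope/swift/blob/main/docs/source/LLM/||g' \
      -e 's|https://github.com/modelscope/swift/blob/main/docs/source/|../|g'
}

echo '[docs](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/x.md)' | relativize
# → [docs](../GetStarted/x.md)
```

The order of the two expressions matters: same-directory links are stripped first, so the second expression only rewrites links that point outside docs/source/LLM.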

docs/source/LLM/自我认知微调最佳实践.md

Lines changed: 1 addition, 1 deletion

@@ -89,7 +89,7 @@ My name is Qianwen, which is also known as Tongyi Qianwen. I am a large-scale la
 If the above methods still do not improve your sleep, consult a doctor or sleep specialist to rule out underlying health problems.
 """
 ```
-For single-sample inference, see the [LLM inference documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md#qwen-7b-chat)
+For single-sample inference, see the [LLM inference documentation](LLM%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md#qwen-7b-chat)

 Using the CLI:
 ```bash

docs/source_en/LLM/LLM-quantization.md

Lines changed: 1 addition, 1 deletion

@@ -218,7 +218,7 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 ```

 **Note**
-- hqq supports more customizable parameters, such as specifying different quantization configurations for different network layers. For details, please see [Command Line Arguments](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Command-line-parameters.md).
+- hqq supports more customizable parameters, such as specifying different quantization configurations for different network layers. For details, please see [Command Line Arguments](Command-line-parameters.md).
 - eetq quantization uses 8-bit quantization, and there's no need to specify quantization_bit. Currently, bf16 is not supported; you need to specify dtype as fp16.
 - Currently, eetq's qlora speed is relatively slow; it is recommended to use hqq instead. For reference, see the [issue](https://github.com/NetEase-FuXi/EETQ/issues/17).
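The eetq constraints noted above translate directly into a launch command; a minimal sketch (the `--quant_method` flag and the model choice are assumptions based on common swift usage, not taken from this commit):

```shell
# eetq is fixed at 8 bits, so quantization_bit is omitted;
# bf16 is unsupported, so dtype is pinned to fp16.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-7b-chat \
    --quant_method eetq \
    --dtype fp16 \
    --output_dir output
```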

examples/pytorch/llm/scripts/atom_7b_chat/lora/sft.sh

Lines changed: 0 additions, 1 deletion

@@ -8,7 +8,6 @@ swift sft \
 --tuner_backend peft \
 --dtype AUTO \
 --output_dir output \
---ddp_backend nccl \
 --dataset ms-bench \
 --num_train_epochs 3 \
 --max_length 2048 \

examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/sft.sh

Lines changed: 0 additions, 1 deletion

@@ -8,7 +8,6 @@ swift sft \
 --tuner_backend peft \
 --dtype AUTO \
 --output_dir output \
---ddp_backend nccl \
 --dataset leetcode-python-en \
 --num_train_epochs 3 \
 --max_length 2048 \

examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh

Lines changed: 0 additions, 1 deletion

@@ -9,7 +9,6 @@ swift sft \
 --template_type AUTO \
 --dtype bf16 \
 --output_dir output \
---ddp_backend nccl \
 --dataset blossom-math-zh \
 --num_train_epochs 1 \
 --max_length 1024 \

examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh

Lines changed: 0 additions, 1 deletion

@@ -7,7 +7,6 @@ swift sft \
 --tuner_backend peft \
 --dtype bf16 \
 --output_dir output \
---ddp_backend nccl \
 --dataset alpaca-zh#5000 \
 --num_train_epochs 1 \
 --max_length 1024 \
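After this commit the scripts no longer pin the DDP backend, leaving swift to choose it. A trimmed fragment of the resulting sft.sh, with arguments taken from the atom_7b_chat diff above (remaining arguments of the full script elided):

```shell
# --ddp_backend nccl has been removed; the backend is now selected by swift.
swift sft \
    --tuner_backend peft \
    --dtype AUTO \
    --output_dir output \
    --dataset ms-bench \
    --num_train_epochs 3 \
    --max_length 2048
```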
