Commit 125a350

Fix resume from checkpointing (#827)

1 parent e4c792f

98 files changed: +132 / -221 lines

Note: large commits have some content hidden by default; only a subset of the 98 changed files is shown below.

docs/source/LLM/Grok训练和推理.md

Lines changed: 2 additions & 3 deletions

@@ -72,11 +72,10 @@ torchrun \
  --save_steps 100 \
  --save_total_limit 2 \
  --logging_steps 10 \
- --deepspeed_config_path scripts/grok-1/lora_ddp_ds/zero3.json \
- --save_only_model true \
+ --deepspeed zero3-offload \
  ```

- This script requires a zero3.json file. The complete training files can be found [here](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/grok-1/lora_ddp_ds).
+ The complete training files can be found [here](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/grok-1/lora_ddp_ds).

  Here are some benchmarks from the training process:

docs/source/LLM/命令行参数.md

Lines changed: 2 additions & 1 deletion

@@ -94,6 +94,7 @@
  - `--save_on_each_node`: Takes effect during multi-machine training, default is `True`.
  - `--save_strategy`: Strategy for saving checkpoint, default is `'steps'`, options include: 'steps', 'no'.
  - `--save_safetensors`: Default is `True`.
+ - `--include_num_input_tokens_seen`: Default is `False`. Tracks the number of input tokens seen throughout training.
  - `--max_new_tokens`: Default is `2048`. This parameter only takes effect when `predict_with_generate` is set to True.
  - `--do_sample`: Default is `True`. This parameter only takes effect when `predict_with_generate` is set to True.
  - `--temperature`: Default is `0.3`. This parameter only takes effect when `predict_with_generate` is set to True.

@@ -209,7 +210,7 @@ dpo parameters inherit from the sft parameters, with the following additional parameters:
  - `--bnb_4bit_comp_dtype`: Default is `'AUTO'`. See the `sft.sh command-line arguments` for details. This parameter has no effect if `quantization_bit` is set to 0.
  - `--bnb_4bit_quant_type`: Default is `'nf4'`. See the `sft.sh command-line arguments` for details. This parameter has no effect if `quantization_bit` is set to 0.
  - `--bnb_4bit_use_double_quant`: Default is `True`. See the `sft.sh command-line arguments` for details. This parameter has no effect if `quantization_bit` is set to 0.
- - `--bnb_4bit_quant_storage`: Default is `True`. See the `sft.sh command-line arguments` for details. This parameter has no effect if `quantization_bit` is set to 0.
+ - `--bnb_4bit_quant_storage`: Default is `True`. See the `sft.sh command-line arguments` for details. This parameter has no effect if `quantization_bit` is set to 0.
  - `--max_new_tokens`: Maximum number of new tokens to generate, default is `2048`.
  - `--do_sample`: Whether to sample during generation rather than use greedy decoding, default is `True`.
  - `--temperature`: Default is `0.3`. This parameter only takes effect when `do_sample` is set to True. This parameter is used as the default value in the deployment parameters.

docs/source_en/LLM/Command-line-parameters.md

Lines changed: 1 addition & 0 deletions

@@ -93,6 +93,7 @@
  - `--save_on_each_node`: Takes effect during multi-machine training, default is `True`.
  - `--save_strategy`: Strategy for saving checkpoint, default is `'steps'`, options include: 'steps', 'no'.
  - `--save_safetensors`: Default is `True`.
+ - `--include_num_input_tokens_seen`: Default is `False`. Tracks the number of input tokens seen throughout training.
  - `--max_new_tokens`: Default is `2048`. This parameter only takes effect when `predict_with_generate` is set to True.
  - `--do_sample`: Default is `True`. This parameter only takes effect when `predict_with_generate` is set to True.
  - `--temperature`: Default is `0.3`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as default value in deployment parameters.
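
The flag added above is an ordinary boolean switch on the training command line. A minimal sketch of enabling it, with model and dataset names inferred from the example scripts later in this commit (all other arguments left at their defaults, so the command is illustrative rather than prescriptive):

# sketch: track the total number of input tokens seen during training
swift sft \
    --model_type baichuan2-13b-chat \
    --dataset dureader-robust-zh \
    --include_num_input_tokens_seen true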

docs/source_en/LLM/Grok-1-best-practice.md

Lines changed: 2 additions & 3 deletions

@@ -70,11 +70,10 @@ torchrun \
  --save_steps 100 \
  --save_total_limit 2 \
  --logging_steps 10 \
- --deepspeed_config_path scripts/grok-1/lora_ddp_ds/zero3.json \
- --save_only_model true \
+ --deepspeed zero3-offload \
  ```

- This script requires a zero3.json file. The complete training files can be found [here](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/grok-1/lora_ddp_ds).
+ The complete training files can be found [here](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/grok-1/lora_ddp_ds).

  Here are some benchmarks from the training process:
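
Both the Chinese and English Grok docs make the same swap: rather than pointing `--deepspeed_config_path` at a hand-maintained zero3.json, the command now passes one of swift's bundled DeepSpeed presets to `--deepspeed`. A short before/after sketch, with the preset spellings taken from this commit (the annotations are my reading of the preset names, not text from the docs):

# before (removed): explicit JSON config shipped alongside the example
#   --deepspeed_config_path scripts/grok-1/lora_ddp_ds/zero3.json \
# after (added): bundled preset, presumably ZeRO stage 3 with offload
    --deepspeed zero3-offload \
# the baichuan2 scripts below pass the stage-2 preset the same way
#   --deepspeed default-zero2 \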

examples/pytorch/llm/scripts/baichuan2_13b_chat/lora_ddp_ds/sft.sh

Lines changed: 1 addition & 2 deletions

@@ -12,7 +12,7 @@ torchrun \
  --model_revision master \
  --sft_type lora \
  --tuner_backend peft \
- --template_type baichuan \
+ --template_type AUTO \
  --dtype AUTO \
  --output_dir output \
  --ddp_backend nccl \

@@ -37,4 +37,3 @@ torchrun \
  --save_total_limit 2 \
  --logging_steps 10 \
  --deepspeed default-zero2 \
- --save_only_model true \
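
The `--template_type baichuan` to `--template_type AUTO` edit above repeats across the remaining baichuan2 scripts; with `AUTO`, swift is left to resolve the chat template from the model type instead of the script naming it explicitly. A minimal sketch of the resulting pattern (model type inferred from the script's directory name, dataset taken from the lora_mp script below, everything else omitted):

# sketch: let the chat template be resolved from the model type
swift sft \
    --model_type baichuan2-13b-chat \
    --template_type AUTO \
    --dataset dureader-robust-zh \
    --sft_type lora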

examples/pytorch/llm/scripts/baichuan2_13b_chat/lora_mp/sft.sh

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ swift sft \
  --model_revision master \
  --sft_type lora \
  --tuner_backend peft \
- --template_type baichuan \
+ --template_type AUTO \
  --dtype AUTO \
  --output_dir output \
  --dataset dureader-robust-zh \

examples/pytorch/llm/scripts/baichuan2_13b_chat/lora_mp_ddp/sft.sh

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ torchrun \
  --model_revision master \
  --sft_type lora \
  --tuner_backend peft \
- --template_type baichuan \
+ --template_type AUTO \
  --dtype AUTO \
  --output_dir output \
  --ddp_backend nccl \

examples/pytorch/llm/scripts/baichuan2_13b_chat/qlora_ddp_ds/sft.sh

Lines changed: 1 addition & 2 deletions

@@ -12,7 +12,7 @@ torchrun \
  --model_revision master \
  --sft_type lora \
  --tuner_backend peft \
- --template_type baichuan \
+ --template_type AUTO \
  --dtype AUTO \
  --output_dir output \
  --ddp_backend nccl \

@@ -39,4 +39,3 @@ torchrun \
  --save_total_limit 2 \
  --logging_steps 10 \
  --deepspeed default-zero2 \
- --save_only_model true \

examples/pytorch/llm/scripts/baichuan2_13b_chat_int4/qlora_ddp_ds/sft.sh

Lines changed: 2 additions & 3 deletions

@@ -12,7 +12,7 @@ torchrun \
  --model_revision master \
  --sft_type lora \
  --tuner_backend peft \
- --template_type baichuan \
+ --template_type AUTO \
  --dtype AUTO \
  --output_dir output \
  --ddp_backend nccl \

@@ -40,5 +40,4 @@ torchrun \
  --hub_model_id baichuan2-13b-chat-int4-qlora \
  --hub_private_repo true \
  --hub_token 'your-sdk-token' \
- --deepspeed_config_path default-zero2 \
- --save_only_model true \
+ --deepspeed default-zero2 \

examples/pytorch/llm/scripts/baichuan2_7b_chat/lora_ddp/sft.sh

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ torchrun \
  --model_revision master \
  --sft_type lora \
  --tuner_backend peft \
- --template_type baichuan \
+ --template_type AUTO \
  --dtype AUTO \
  --output_dir output \
  --ddp_backend nccl \
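
Dropping `--save_only_model true` from these example scripts is what ties them to the commit title: a checkpoint that contains only model weights carries no optimizer, scheduler, or RNG state, so a run cannot be properly resumed from it. With full checkpoints saved again, an interrupted run can be picked up roughly as follows (a hedged sketch assuming swift's `--resume_from_checkpoint` argument; the checkpoint path is a placeholder, not one produced by this commit):

# sketch: resume an interrupted SFT run from a full checkpoint
swift sft \
    --model_type baichuan2-13b-chat \
    --dataset dureader-robust-zh \
    --sft_type lora \
    --resume_from_checkpoint output/baichuan2-13b-chat/<run-dir>/checkpoint-100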
