
Commit 2180a08

support yi-34b-chat (#164)
1 parent c5ad635 · commit 2180a08


21 files changed: +286 -19 lines


README.md

Lines changed: 2 additions & 1 deletion
@@ -41,6 +41,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
 ### 🎉 News
+- 🔥 2023.11.24: Support for **yi-34b-chat** and **codefuse-codellama-34b-chat**. The corresponding shell scripts can be found in [yi_34b_chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_34b_chat) and [codefuse_codellama_34b_chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/codefuse_codellama_34b_chat).
 - 🔥 2023.11.18: Support for the **tongyi-finance-14b** series models: tongyi-finance-14b, tongyi-finance-14b-chat, tongyi-finance-14b-chat-int4. The corresponding shell script can be found in [tongyi_finance_14b_chat_int4](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/tongyi_finance_14b_chat_int4).
 - 🔥 2023.11.16: Added **flash attn** support for more models: the qwen, qwen-vl, llama, openbuddy, mistral, yi, and ziya series. Please use the `use_flash_attn` parameter.
 - 🔥 2023.11.11: Supported **NEFTune**; enable it with `Swift.prepare_model(model, NEFTuneConfig())`.
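The NEFTune entry above only names the call. The sketch below shows how a model might be wrapped before fine-tuning; it is a hedged example, not confirmed by this commit: the top-level `swift` exports and the `noise_alpha` argument are assumptions, and the model id is a placeholder.

```python
# Minimal sketch of the NEFTune usage named in the 2023.11.11 news entry.
# Assumptions (not confirmed by this commit): `Swift` and `NEFTuneConfig` are
# exported from the top-level `swift` package, and `NEFTuneConfig` accepts a
# `noise_alpha` argument. The model id is only a placeholder.
from transformers import AutoModelForCausalLM

from swift import NEFTuneConfig, Swift

model = AutoModelForCausalLM.from_pretrained('your-model-id', trust_remote_code=True)

# NEFTune perturbs embedding outputs with noise during fine-tuning;
# the wrapped model is then trained as usual.
model = Swift.prepare_model(model, NEFTuneConfig(noise_alpha=5.0))
```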
@@ -82,7 +83,7 @@ Users can refer to the [LLM fine-tuning documentation](https://github.com/models
 - xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary)
 - bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)
 - mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)
-- yi series: [yi-6b](https://modelscope.cn/models/01ai/Yi-6B/summary), [yi-34b](https://modelscope.cn/models/01ai/Yi-34B/summary)
+- yi series: [yi-6b](https://modelscope.cn/models/01ai/Yi-6B/summary), [yi-34b](https://modelscope.cn/models/01ai/Yi-34B/summary), [yi-34b-chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)
 - ziya series: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary)
 - skywork series: [skywork-13b](https://modelscope.cn/models/skywork/Skywork-13B-base/summary), [skywork-13b-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary)
 - other: [polylm-13b](https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation/summary), [seqgpt-560m](https://modelscope.cn/models/damo/nlp_seqgpt-560m/summary)

README_CN.md

Lines changed: 2 additions & 1 deletion
@@ -39,6 +39,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is a scalable
 ## 🎉 News
+- 🔥 2023.11.24: Support for the **yi-34b-chat** and **codefuse-codellama-34b-chat** models. The corresponding shell scripts can be found in [yi_34b_chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_34b_chat) and [codefuse_codellama_34b_chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/codefuse_codellama_34b_chat).
 - 🔥 2023.11.18: Support for the **tongyi-finance-14b** series models: tongyi-finance-14b, tongyi-finance-14b-chat, tongyi-finance-14b-chat-int4. The corresponding shell script can be found in [tongyi_finance_14b_chat_int4](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/tongyi_finance_14b_chat_int4).
 - 🔥 2023.11.16: **flash attn** support for more models: the qwen, qwen-vl, llama, openbuddy, mistral, yi, and ziya series. Please use the `use_flash_attn` parameter.
 - 🔥 2023.11.11: Support for **NEFTune**; enable it with `Swift.prepare_model(model, NEFTuneConfig())`.
@@ -80,7 +81,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is a scalable
 - xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary)
 - bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)
 - mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)
-- yi series: [yi-6b](https://modelscope.cn/models/01ai/Yi-6B/summary), [yi-34b](https://modelscope.cn/models/01ai/Yi-34B/summary)
+- yi series: [yi-6b](https://modelscope.cn/models/01ai/Yi-6B/summary), [yi-34b](https://modelscope.cn/models/01ai/Yi-34B/summary), [yi-34b-chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)
 - ziya series: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary)
 - skywork series: [skywork-13b](https://modelscope.cn/models/skywork/Skywork-13B-base/summary), [skywork-13b-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary)
 - other: [polylm-13b](https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation/summary), [seqgpt-560m](https://modelscope.cn/models/damo/nlp_seqgpt-560m/summary)

examples/pytorch/llm/README.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
 - xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary)
 - bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)
 - mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)
-- yi series: [yi-6b](https://modelscope.cn/models/01ai/Yi-6B/summary), [yi-34b](https://modelscope.cn/models/01ai/Yi-34B/summary)
+- yi series: [yi-6b](https://modelscope.cn/models/01ai/Yi-6B/summary), [yi-34b](https://modelscope.cn/models/01ai/Yi-34B/summary), [yi-34b-chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)
 - ziya series: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary)
 - skywork series: [skywork-13b](https://modelscope.cn/models/skywork/Skywork-13B-base/summary), [skywork-13b-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary)
 - other: [polylm-13b](https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation/summary), [seqgpt-560m](https://modelscope.cn/models/damo/nlp_seqgpt-560m/summary)

examples/pytorch/llm/README_CN.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
 - xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary)
 - bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)
 - mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)
-- yi series: [yi-6b](https://modelscope.cn/models/01ai/Yi-6B/summary), [yi-34b](https://modelscope.cn/models/01ai/Yi-34B/summary)
+- yi series: [yi-6b](https://modelscope.cn/models/01ai/Yi-6B/summary), [yi-34b](https://modelscope.cn/models/01ai/Yi-34B/summary), [yi-34b-chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)
 - ziya series: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary)
 - skywork series: [skywork-13b](https://modelscope.cn/models/skywork/Skywork-13B-base/summary), [skywork-13b-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary)
 - other: [polylm-13b](https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation/summary), [seqgpt-560m](https://modelscope.cn/models/damo/nlp_seqgpt-560m/summary)

examples/pytorch/llm/llm_infer.py

Lines changed: 0 additions & 1 deletion
@@ -5,4 +5,3 @@
 
 if __name__ == '__main__':
     result = infer_main()
-    print(f'infer_main result: {result}')

examples/pytorch/llm/llm_sft.py

Lines changed: 0 additions & 1 deletion
@@ -5,4 +5,3 @@
 
 if __name__ == '__main__':
     output = sft_main()
-    print(f'sft_main output: {output}')
Lines changed: 15 additions & 0 deletions (new file: the codefuse-codellama-34b-chat inference script)
@@ -0,0 +1,15 @@
+# Experimental environment: V100, A10, 3090
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python llm_infer.py \
+    --ckpt_dir "output/codefuse-codellama-34b-chat/vx_xxx/checkpoint-xxx" \
+    --load_args_from_ckpt_dir true \
+    --eval_human false \
+    --max_length 4096 \
+    --use_flash_attn true \
+    --max_new_tokens 2048 \
+    --temperature 0.3 \
+    --top_p 0.7 \
+    --repetition_penalty 1.05 \
+    --do_sample true \
+    --merge_lora_and_save false \
Lines changed: 37 additions & 0 deletions (new file: the codefuse-codellama-34b-chat fine-tuning script)
@@ -0,0 +1,37 @@
+# Experimental environment: V100, A10, 3090
+# 18GB GPU memory
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python llm_sft.py \
+    --model_type codefuse-codellama-34b-chat \
+    --sft_type lora \
+    --tuner_backend swift \
+    --template_type codefuse-codellama \
+    --dtype fp16 \
+    --output_dir output \
+    --custom_train_dataset_path xxx.jsonl \
+    --custom_val_dataset_path yyy.jsonl \
+    --train_dataset_sample -1 \
+    --num_train_epochs 1 \
+    --max_length 4096 \
+    --check_dataset_strategy warning \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules DEFAULT \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0.01 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps 16 \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --use_flash_attn true \
+    --push_to_hub false \
+    --hub_model_id codefuse-codellama-34b-chat-lora \
+    --hub_private_repo true \
+    --hub_token 'your-sdk-token' \
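The sft.sh script above drives fine-tuning through `llm_sft.py`, which simply calls `sft_main()` (see its diff earlier in this commit). A minimal programmatic sketch of the same LoRA configuration follows; it assumes `swift.llm` exports `SftArguments` and that `sft_main` accepts such an object in place of CLI flags, which this commit does not confirm. The dataset paths are placeholders, exactly as in the script.

```python
# Hedged sketch: programmatic equivalent of the sft.sh LoRA configuration above.
# Assumptions (not confirmed by this commit): `swift.llm` exports SftArguments and
# sft_main, and sft_main accepts an SftArguments instance instead of CLI flags.
from swift.llm import SftArguments, sft_main

args = SftArguments(
    model_type='codefuse-codellama-34b-chat',
    sft_type='lora',
    template_type='codefuse-codellama',
    dtype='fp16',
    custom_train_dataset_path=['xxx.jsonl'],  # placeholder paths, as in the script
    custom_val_dataset_path=['yyy.jsonl'],
    max_length=4096,
    lora_rank=8,
    lora_alpha=32,
    lora_dropout_p=0.05,
    learning_rate=1e-4,
    gradient_accumulation_steps=16,
    use_flash_attn=True,
    output_dir='output',
)

output = sft_main(args)  # same entry point that llm_sft.py invokes
```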

examples/pytorch/llm/scripts/tongyi_finance_14b_chat_int4/qlora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -3,8 +3,7 @@
 PYTHONPATH=../../.. \
 CUDA_VISIBLE_DEVICES=0 \
 python llm_sft.py \
-    --model_id_or_path TongyiFinance/Tongyi-Finance-14B-Chat-Int4 \
-    --model_revision master \
+    --model_type tongyi-finance-14b-chat-int4 \
     --sft_type lora \
     --tuner_backend swift \
     --template_type chatml \

examples/pytorch/llm/scripts/yi_34b/lora_ddp_ds/infer.sh

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ python llm_infer.py \
     --load_args_from_ckpt_dir true \
     --eval_human false \
     --max_length 2048 \
+    --use_flash_attn true \
     --max_new_tokens 2048 \
     --temperature 0.7 \
     --top_p 0.7 \
