Commit 3aa9f4c

[llm] Add KTO (PaddlePaddle#9689)

* add kto
* add kto
* add kto
* add kto'
* add kto
* add
* fix conflict
* add llm

1 parent dff62a1 commit 3aa9f4c

30 files changed: +1352 additions, −385 deletions

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -129,7 +129,7 @@
 * Large-model pretraining, fine-tuning (including SFT and PEFT), alignment, and quantization now support the LLaMA, Baichuan, Bloom, ChatGLM, Mistral, OPT, and Qwen series. The LLM pretraining, fine-tuning, alignment, and quantization support matrix is as follows:
 
-| Model | Pretrain | SFT | LoRA | FlashMask | Prefix Tuning | DPO/SimPO/ORPO | RLHF | Mergekit | Quantization |
+| Model | Pretrain | SFT | LoRA | FlashMask | Prefix Tuning | DPO/SimPO/ORPO/KTO | RLHF | Mergekit | Quantization |
 |--------------------------------------------|:--------:|:---:|:----:|:---------:|:-------------:|:--------------:|:----:|:-----:|:------------:|
 | [Llama](./llm/config/llama) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [Qwen](./llm/config/qwen) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 |
```

llm/README.md

Lines changed: 55 additions & 4 deletions
```diff
@@ -16,7 +16,7 @@
 
 ## 🛠️ Supported Model List 🛠️
 
-| Model | Pretrain | SFT | LoRA | Prefix Tuning | DPO/SimPO/ORPO | RLHF | Mergekit | Quantization | Torch convert |
+| Model | Pretrain | SFT | LoRA | Prefix Tuning | DPO/SimPO/ORPO/KTO | RLHF | Mergekit | Quantization | Torch convert |
 |----------------------------------------|----------|-----|------|---------------|----------------|------|-------|--------------|---------------|
 | [LLaMA](./config/llama) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [Qwen](./config/qwen) | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | ✅ |
```
```diff
@@ -154,7 +154,7 @@ python run_finetune.py ./config/llama/pt_argument.json
 
 ### 3. Alignment
 
-We support preference-alignment strategies such as DPO and RLHF. The DPO strategy uses zero_padding, combined with the FlashMask strategy, to effectively improve training efficiency.
+We support preference-alignment strategies such as DPO, KTO, and RLHF. The DPO and KTO strategies use zero_padding, combined with the FlashMask strategy, to effectively improve training efficiency.
 
 #### 3.1 DPO
```
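The zero_padding strategy mentioned in the hunk above packs examples back-to-back into one sequence instead of padding each example to the maximum length. A minimal sketch of the packing idea — this is our own illustration, not the repo's implementation:

```python
def zero_padding_pack(sequences, max_len):
    """Greedily pack token sequences into bins of at most max_len tokens,
    so compute is spent on real tokens instead of padding."""
    bins, current, used = [], [], 0
    for seq in sequences:
        if used + len(seq) > max_len and current:
            # flush the current bin before it overflows
            bins.append([tok for s in current for tok in s])
            current, used = [], 0
        current.append(seq)
        used += len(seq)
    if current:
        bins.append([tok for s in current for tok in s])
    return bins

packed = zero_padding_pack([[1] * 300, [2] * 500, [3] * 400, [4] * 100], max_len=1024)
print(len(packed))  # → 2  (two packed sequences instead of four padded ones)
```

With FlashMask, an attention mask is then built so tokens from different packed examples do not attend to each other.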

````diff
@@ -183,7 +183,7 @@ python run_finetune.py ./config/llama/pt_argument.json
 ...
 ```
 
-For easy testing, we also provide an advertisement-generation dataset that can be used directly
+For easy testing, we also provide a preference dataset that can be used directly
 
 ```bash
 wget https://bj.bcebos.com/paddlenlp/datasets/examples/ultrafeedback_binarized.tar.gz
````
````diff
@@ -196,9 +196,60 @@ tar -zxvf ultrafeedback_binarized.tar.gz
 # DPO launch command (reference)
 python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo/run_dpo.py ./config/llama/dpo_argument.json
 ```
+
+##### LoRA DPO
+
+```bash
+# DPO launch command (reference)
+python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo/run_dpo.py ./config/llama/dpo_lora_argument.json
+```
 For more DPO technical details and usage instructions, see the [DPO documentation](./docs/dpo.md)
 
-#### 3.2 RLHF
+#### 3.2 KTO
+
+##### Data preparation
+
+The fine-tuning data format we support is a json file with one dictionary per line; each dictionary contains the following fields:
+
+- `src` : `str, List(str)`, the user's dialogue content.
+- `tgt` : `str, List(str)`, the system's reply content.
+- `response` : `str, List(str)`, the candidate response.
+- `sort` : `List(int)`, the sort value marks whether the response is chosen or rejected (0 means rejected, 1 means chosen).
+
+Sample data:
+
+```text
+{
+    "src": ["In this task, you are given a second sentence. Your task is to generate the first sentence on the same topic but incoherent and inconsistent with the second sentence.\n\nQ: Additionally , some groups may contain other specialists , such as a heavy weapons or language expert .\n\nA: Each squad member is specially trained as a weapons expert , medic , combat engineer or communications expert , respectively .\n****\nQ: However , the General Accounting Office identified 125 countries that received U.S. training and assistance for their police forces during fiscal year 1990 at a cost of at least $117 million .\n\nA: No government agency is in charge of calculating the cost .\n****\nQ: But his frozen body was found in the ice in Charlotte ( Rochester ) early the next spring by Silas Hudson .\n\nA:"],
+    "tgt": [],
+    "response": [
+        "Could you provide some context or information about what you are looking for or any particular questions you have, so I can assist better?"],
+    "sort": [1]
+}
+...
+```
+
+For easy testing, we also provide a preference dataset that can be used directly:
+
+```bash
+wget https://bj.bcebos.com/paddlenlp/datasets/examples/ultrafeedback_binarized_pointwise.tar.gz
+tar -zxvf ultrafeedback_binarized_pointwise.tar.gz
+```
+
+##### Full-parameter KTO
+
+```bash
+# KTO launch command (reference)
+python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/kto/run_kto.py ./config/llama/kto_argument.json
+```
+##### LoRA KTO
+
+```bash
+# KTO launch command (reference)
+python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/kto/run_kto.py ./config/llama/kto_lora_argument.json
+```
+
+#### 3.3 RLHF
 
 The PaddlePaddle LLM toolkit provides code and complete usage examples for human-preference alignment of LLMs based on the reinforcement-learning PPO algorithm, supporting **3D distributed parallel training and generation acceleration via inference optimization during the rollout stage**. For a detailed tutorial, see the [RLHF documentation](./docs/rlhf.md)
````
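The pointwise KTO format documented above can be sanity-checked with a few lines of Python. This is a minimal sketch: the field names follow the docs, but the validator itself is our own illustration, not part of the repo:

```python
import json

def validate_kto_example(line: str) -> dict:
    """Parse one jsonl line and check the pointwise KTO fields."""
    ex = json.loads(line)
    assert isinstance(ex["src"], list) and ex["src"], "src: non-empty list of user turns"
    assert isinstance(ex["tgt"], list), "tgt: list of system turns (may be empty)"
    assert isinstance(ex["response"], list) and ex["response"], "response: candidate reply"
    assert all(s in (0, 1) for s in ex["sort"]), "sort: 0 = rejected, 1 = chosen"
    return ex

# one example line, shaped like the sample data above
sample = json.dumps({
    "src": ["What is the capital of France?"],
    "tgt": [],
    "response": ["Paris is the capital of France."],
    "sort": [1],
})
ex = validate_kto_example(sample)
print("chosen" if ex["sort"][0] == 1 else "rejected")  # → chosen
```

Unlike the pairwise DPO format, each line carries a single response with a chosen/rejected label rather than a preference pair.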

llm/alignment/dpo/run_dpo.py

Lines changed: 23 additions & 22 deletions
```diff
@@ -45,6 +45,8 @@
     Qwen2ForCausalLMPipe,
     register_sequence_parallel_allreduce_hooks,
 )
+from paddlenlp.transformers.configuration_utils import LlmMetaConfig
+from paddlenlp.transformers.refined_recompute import update_refined_recompute
 from paddlenlp.trl import (
     DPOTrainer,
     calculate_effective_tokens,
@@ -80,14 +82,14 @@ def main():
             hasattr(training_args, "pipeline_parallel_config")
             and "enable_clear_every_step_cache" in training_args.pipeline_parallel_config
         ), "Should set '--pipeline_parallel_config enable_clear_every_step_cache' in bash script for pp."
-    if model_args.sequence_parallel:
+    if training_args.sequence_parallel:
         if training_args.pipeline_parallel_degree > 1:
             assert (
                 hasattr(training_args, "pipeline_parallel_config")
                 and "disable_partial_send_recv" in training_args.pipeline_parallel_config
             ), "Should set '--pipeline_parallel_config disable_partial_send_recv' in bash script for pp with sp."
         if training_args.tensor_parallel_degree <= 1:
-            model_args.sequence_parallel = False
+            training_args.sequence_parallel = False
             logger.info("Tensor_parallel_degree = 1. Set sequence_parallel to False.")
     training_args.print_config(model_args, "Model")
     training_args.print_config(data_args, "Data")
@@ -117,39 +119,38 @@ def main():
         dtype = "bfloat16"
 
     logger.info("Start to load model & tokenizer.")
-    model_kwargs = dict(
-        pretrained_model_name_or_path=model_args.model_name_or_path,
-        dtype=dtype,
-        tensor_parallel_degree=training_args.tensor_parallel_degree,
-        tensor_parallel_rank=training_args.tensor_parallel_rank,
-        recompute_granularity=training_args.recompute_granularity,
-        use_flash_attention=training_args.use_flash_attention,
-        tensor_parallel_output=training_args.tensor_parallel_output,
-        use_fused_rms_norm=training_args.use_fused_rms_norm,
-        use_fused_rope=training_args.use_fused_rope,
-        use_fused_linear=training_args.use_fused_linear,
-        use_fused_dropout_add=training_args.use_fused_dropout_add,
-    )
+
+    model_config = AutoConfig.from_pretrained(model_args.model_name_or_path, dtype=dtype)
+    LlmMetaConfig.set_llm_config(model_config, training_args)
+    model_config.refined_recompute = update_refined_recompute(
+        training_args.refined_recompute,
+        dpo_config.lora,
+    )
+    if not dpo_config.reference_free and not dpo_config.lora:
+        ref_model_config = AutoConfig.from_pretrained(model_args.model_name_or_path, dtype=dtype)
+        LlmMetaConfig.set_llm_config(ref_model_config, training_args)
+        ref_model_config.refined_recompute = update_refined_recompute(
+            training_args.refined_recompute,
+            dpo_config.lora,
+        )
 
     if training_args.pipeline_parallel_degree > 1:
         model_class = AutoModelForCausalLMPipe
-        model_kwargs["dpo_config"] = dpo_config
+        model_config.dpo_config = dpo_config
     else:
         model_class = AutoModelForCausalLM
     if not training_args.autotuner_benchmark or model_args.weight_quantize_algo is not None:
-        model = model_class.from_pretrained(**model_kwargs)
+        model = model_class.from_pretrained(model_args.model_name_or_path, config=model_config)
         # for DPO save
         if not dpo_config.reference_free and not dpo_config.lora:
-            config = AutoConfig.from_pretrained(**model_kwargs)
-            ref_model = model_class.from_config(config, dtype=dtype)
+            ref_model = model_class.from_config(ref_model_config)
             ref_model.set_state_dict(model.state_dict())
         else:
             ref_model = None
     else:
-        config = AutoConfig.from_pretrained(**model_kwargs)
-        model = model_class.from_config(config, dtype=dtype)
+        model = model_class.from_config(model_config)
         if not dpo_config.reference_free and not dpo_config.lora:
-            ref_model = model_class.from_config(config, dtype=dtype)
+            ref_model = model_class.from_config(ref_model_config)
         else:
             ref_model = None
     if training_args.pipeline_parallel_degree > 1:
@@ -163,7 +164,7 @@ def main():
 
     if training_args.sequence_parallel:
         register_sequence_parallel_allreduce_hooks(
-            model, training_args.gradient_accumulation_steps, model_args.fuse_sequence_parallel_allreduce
+            model, training_args.gradient_accumulation_steps, training_args.fuse_sequence_parallel_allreduce
        )
     if model_args.tokenizer_name_or_path is not None:
         tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name_or_path)
```
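The refactor above replaces a flat `model_kwargs` dict with a config object (`AutoConfig` plus `LlmMetaConfig.set_llm_config`) that is filled once and then handed to `from_pretrained` / `from_config`. The pattern can be sketched generically; the `ModelConfig` class and `set_llm_config` helper below are illustrative stand-ins, not PaddleNLP APIs:

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    """Stand-in for a model config: one object that carries all model settings."""
    dtype: str = "bfloat16"
    use_flash_attention: bool = False
    extras: dict = field(default_factory=dict)

def set_llm_config(config: ModelConfig, training_args: dict) -> None:
    """Stand-in for LlmMetaConfig.set_llm_config: copy trainer flags onto the config
    once, so model constructors no longer take a long list of keyword arguments."""
    for key, value in training_args.items():
        if hasattr(config, key):
            setattr(config, key, value)
        else:
            config.extras[key] = value

config = ModelConfig()
set_llm_config(config, {"use_flash_attention": True, "recompute_granularity": "full"})
print(config.use_flash_attention)  # → True
```

The benefit visible in the diff: the policy and reference models each get their own fully-populated config, instead of sharing one kwargs dict that is mutated along the way.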

llm/alignment/kto/kto_argument.py

Lines changed: 135 additions & 0 deletions (new file)
@@ -0,0 +1,135 @@
1+
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
15+
16+
from dataclasses import dataclass, field
17+
from typing import Optional
18+
19+
from paddlenlp.trainer import TrainingArguments
20+
from paddlenlp.trainer.trainer_utils import IntervalStrategy
21+
from paddlenlp.trainer.utils.doc import add_start_docstrings
22+
from paddlenlp.transformers.configuration_utils import llmmetaclass
23+
24+
25+
@dataclass
26+
@llmmetaclass
27+
@add_start_docstrings(TrainingArguments.__doc__)
28+
class KTOTrainingArguments(TrainingArguments):
29+
"""KTOTrainingArguments"""
30+
31+
unified_checkpoint: bool = field(
32+
default=True,
33+
metadata={"help": "Enable fused linear grad add strategy."},
34+
)
35+
unified_checkpoint_config: Optional[str] = field(
36+
default="",
37+
metadata={"help": "Configs to unify hybrid parallel checkpoint.\n"},
38+
)
39+
autotuner_benchmark: bool = field(
40+
default=False,
41+
metadata={"help": "Whether to run benchmark by autotuner. True for from_scratch."},
42+
)
43+
benchmark: bool = field(
44+
default=False,
45+
metadata={"help": "Whether to run benchmark by autotuner. True for from_scratch."},
46+
)
47+
48+
def __post_init__(self):
49+
super().__post_init__()
50+
if self.autotuner_benchmark:
51+
self.num_train_epochs = 1
52+
self.max_steps = 5
53+
self.do_train = True
54+
self.do_export = False
55+
self.do_predict = False
56+
self.do_eval = False
57+
self.overwrite_output_dir = True
58+
self.load_best_model_at_end = False
59+
self.report_to = []
60+
self.save_strategy = IntervalStrategy.NO
61+
self.evaluation_strategy = IntervalStrategy.NO
62+
if not self.disable_tqdm:
63+
self.logging_steps = 1
64+
self.logging_strategy = IntervalStrategy.STEPS
65+
if self.benchmark:
66+
self.do_train = True
67+
self.do_export = False
68+
self.do_predict = False
69+
self.do_eval = False
70+
self.overwrite_output_dir = True
71+
self.load_best_model_at_end = False
72+
self.save_strategy = IntervalStrategy.NO
73+
self.evaluation_strategy = IntervalStrategy.NO
74+
if not self.disable_tqdm:
75+
self.logging_steps = 1
76+
self.logging_strategy = IntervalStrategy.STEPS
77+
if self.max_steps > 0:
78+
self.num_train_epochs = 1
79+
80+
81+
@dataclass
82+
class KTOConfig:
83+
"""KTOConfig"""
84+
85+
beta: float = field(default=0.1, metadata={"help": "the beta parameter for KTO loss"})
86+
desirable_weight: float = field(default=1.0, metadata={"help": "desirable_weight"})
87+
undesirable_weight: float = field(default=1.0, metadata={"help": "undesirable_weight"})
88+
lora: bool = field(default=False, metadata={"help": "Use LoRA model."})
89+
90+
91+
@dataclass
92+
class KTODataArgument:
93+
"""DataArgument"""
94+
95+
train_dataset_path: str = field(default="./data/train.jsonl", metadata={"help": "Path to the train dataset dir."})
96+
dev_dataset_path: str = field(default="./data/dev.jsonl", metadata={"help": "Path to the dev dataset dir."})
97+
max_seq_len: int = field(default=4096, metadata={"help": "Maximum sequence length."})
98+
max_prompt_len: int = field(default=2048, metadata={"help": "Maximum prompt length."})
99+
greedy_zero_padding: bool = field(
100+
default=False,
101+
metadata={"help": "Whether to use Greedy Zero Padding data stream."},
102+
)
103+
104+
105+
@dataclass
106+
class KTOModelArgument:
107+
"""ModelArgument"""
108+
109+
model_name_or_path: str = field(
110+
default=None, metadata={"help": "Pretrained model name or path to local directory."}
111+
)
112+
tokenizer_name_or_path: Optional[str] = field(
113+
default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
114+
)
115+
flash_mask: bool = field(default=False, metadata={"help": "Whether to use flash mask in flash attention."})
116+
weight_quantize_algo: str = field(
117+
default=None,
118+
metadata={"help": "Model weight quantization algorithm including 'nf4'(qlora), 'weight_only_int8'."},
119+
)
120+
fuse_attention_qkv: bool = field(
121+
default=None,
122+
metadata={"help": "whether to fuse attention qkv"},
123+
)
124+
fuse_attention_ffn: bool = field(
125+
default=None,
126+
metadata={"help": "whether to fuse first up and gate proj in mlp block"},
127+
)
128+
# LoRA
129+
lora_rank: int = field(default=8, metadata={"help": "Lora rank."})
130+
lora_path: str = field(default=None, metadata={"help": "Initialize lora state dict."})
131+
rslora: bool = field(default=False, metadata={"help": "Whether to use RsLoRA"})
132+
lora_plus_scale: float = field(default=1.0, metadata={"help": "Lora B scale in LoRA+ technique"})
133+
lora_alpha: int = field(default=-1, metadata={"help": "lora_alpha"})
134+
rslora_plus: bool = field(default=False, metadata={"help": "Strengthen lora performance"})
135+
use_quick_lora: bool = field(default=True, metadata={"help": "quick lora"})
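The three hyperparameters in `KTOConfig` (`beta`, `desirable_weight`, `undesirable_weight`) correspond to the published KTO loss. A minimal scalar sketch of that formula — our own illustration for intuition, not the trainer's actual implementation:

```python
import math

def kto_loss(policy_logratio: float, ref_kl: float, chosen: bool,
             beta: float = 0.1, desirable_weight: float = 1.0,
             undesirable_weight: float = 1.0) -> float:
    """Per-example KTO loss: 1 minus the sigmoid of the beta-scaled margin
    between the policy/reference log-ratio and the KL reference point."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    if chosen:
        # desirable example: reward the log-ratio exceeding the reference point
        return desirable_weight * (1.0 - sigmoid(beta * (policy_logratio - ref_kl)))
    # undesirable example: reward the log-ratio falling below the reference point
    return undesirable_weight * (1.0 - sigmoid(beta * (ref_kl - policy_logratio)))

# a chosen answer the policy already prefers gets a smaller loss
print(kto_loss(5.0, 0.0, chosen=True) < kto_loss(-5.0, 0.0, chosen=True))  # → True
```

`desirable_weight` and `undesirable_weight` let you rebalance datasets where chosen and rejected examples are not equally frequent, which is exactly the pointwise setting the `sort` field encodes.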
