kvcache-ai · KMSorSMS · Dec 24, 2025 · Dec 22, 2025 · Dec 23, 2025 · Dec 23, 2025
diff --git a/README.md b/README.md
@@ -16,7 +16,7 @@
 KTransformers is a research project focused on efficient inference and fine-tuning of large language models through CPU-GPU heterogeneous computing. The project has evolved into **two core modules**: [kt-kernel](./kt-kernel/) and [kt-sft](./kt-sft/).
 
 ## 🔥 Updates
-
+* **Dec 22, 2025**: Support RL-DPO fine-tuning with LLaMA-Factory. ([Tutorial](./doc/en/DPO_tutorial.md))
 * **Dec 5, 2025**: Support Native Kimi-K2-Thinking inference ([Tutorial](./doc/en/Kimi-K2-Thinking-Native.md))
 * **Nov 6, 2025**: Support Kimi-K2-Thinking inference ([Tutorial](./doc/en/Kimi-K2-Thinking.md)) and fine-tune ([Tutorial](./doc/en/SFT_Installation_Guide_KimiK2.md))
 * **Nov 4, 2025**: KTransformers Fine-Tuning × LLaMA-Factory Integration. ([Tutorial](./doc/en/KTransformers-Fine-Tuning_User-Guide.md))

diff --git a/doc/en/DPO_tutorial.md b/doc/en/DPO_tutorial.md
@@ -61,7 +61,7 @@ pip install custom_flashinfer/
 
 ## Prepare Models
 
-We uses `DeepSeek-V2-Lite-Chat` as an example here. You can replace it with other models such as Kimi K2.
+We use `deepseek-ai/DeepSeek-V2-Lite` as an example here. You can replace it with other models such as Kimi K2.
 
 ## How to start
 
@@ -80,7 +80,7 @@ For example, we provide the YAML file as follows:
 
 ```YAML
 ### model
-model_name_or_path: DeepSeek-V2-Lite-Chat
+model_name_or_path: deepseek-ai/DeepSeek-V2-Lite
 trust_remote_code: true
 
 ### method
@@ -114,7 +114,7 @@ report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]
 per_device_train_batch_size: 1
 gradient_accumulation_steps: 8
 learning_rate: 5.0e-6
-num_train_epochs: 0.1
+num_train_epochs: 3
 lr_scheduler_type: cosine
 warmup_ratio: 0.1
 bf16: true
@@ -130,7 +130,7 @@ chunk_size: 8192
 
 For more details about --kt_optimize_rule, please refer to https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/KTransformers-Fine-Tuning_User-Guide.md 
 
-（2）examples/inference/deepseek2_lora_dpo_kt.yaml
+Then you can use the LoRA adapter saved in `saves/Kllama_deepseekV2_DPO` for inference, in the same way as after SFT training. For example:
 
 ```YAML
 model_name_or_path: DeepSeek-V2-Lite-Chat 

diff --git a/kt-sft/README.md b/kt-sft/README.md
@@ -191,6 +191,8 @@ cpu_infer: 32
 chunk_size: 8192
 ```
 
+RL DPO training is now also supported with the KTransformers backend. See the [DPO Tutorial](../doc/en/DPO_tutorial.md) for details.
+
 `kt_optimize_rule` controls **placement strategy**. See also [ktransformers/optimize_rules](https://github.com/kvcache-ai/ktransformers/tree/main/ktransformers/optimize/optimize_rules). Naming hints (`*` = wildcard):
 
 | Pattern                                      | Meaning                                               |