
Commit 44d4053

[HotFix] update load lora model Readme; (#6240)
* [fix] update load lora model Readme;
* [fix] update lora infer readme
* [fix] remove useless comments
1 parent 6d676ee commit 44d4053

File tree

1 file changed: +57 -0 lines changed


applications/ColossalChat/examples/README.md

Lines changed: 57 additions & 0 deletions
@@ -892,6 +892,63 @@ The dialogues can by multiple turns and it can contain system prompt. For more d

We use bf16 weights for finetuning. If you downloaded fp8 DeepSeek V3/R1 weights, you can use the [script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py) to convert the weights to bf16 via GPU. For Ascend NPU, you can use this [script](https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-V2/NPU_inference/fp8_cast_bf16.py).

We have also added an example of how to load a LoRA adapter, merge it into the base model, and run inference with the merged model:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
from peft import PeftModel
import torch

# Set model paths
model_name = "Qwen/Qwen2.5-3B"
lora_adapter = "Qwen2.5-3B_lora"  # path to your LoRA adapter
merged_model_path = "Qwen2.5-3B_merged"

######
# How to load the LoRA model
######
# 1. Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# 2. Load the LoRA adapter on top of the base model
peft_model = PeftModel.from_pretrained(
    base_model,
    lora_adapter,
    torch_dtype=torch.bfloat16
)

# 3. Merge the LoRA weights into the base model
merged_model = peft_model.merge_and_unload()

# 4. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    pad_token="<|endoftext|>"
)

# 5. Save the merged model and tokenizer
merged_model.save_pretrained(
    merged_model_path,
    safe_serialization=True
)
tokenizer.save_pretrained(merged_model_path)

# 6. Run inference
test_input = tokenizer("Instruction: Finding prime numbers up to 100\nAnswer:", return_tensors="pt").to("cuda")
output = merged_model.generate(**test_input, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
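Merging is optional. If you only want to evaluate the adapter, you can keep it separate from the base model and generate through the `PeftModel` directly; the sketch below reuses the same example model and adapter paths as above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model and attach the LoRA adapter without merging.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(base_model, "Qwen2.5-3B_lora")  # example adapter path
model.eval()

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B", trust_remote_code=True, pad_token="<|endoftext|>")

# Generate with the adapter applied on the fly (slightly slower per step than a merged model).
inputs = tokenizer("Instruction: Finding prime numbers up to 100\nAnswer:", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```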
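Once the merged model has been saved, it can be reloaded like any regular Hugging Face checkpoint, with no dependency on `peft` at inference time. A minimal sketch, assuming the `Qwen2.5-3B_merged` directory produced above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Reload the merged checkpoint saved in the example above.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen2.5-3B_merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("Qwen2.5-3B_merged", trust_remote_code=True)

inputs = tokenizer("Instruction: Finding prime numbers up to 100\nAnswer:", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```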
#### Usage

After preparing the dataset and model weights, you can run the script with the following command:
