Description
Affected model
Qwen1___5-7B-Chat
Affected tutorial
Deploying Qwen1___5-7B-Chat with LoRA
Tutorial maintainer
Bug description
When loading the LoRA weights for inference, should the config be the same one used for training, i.e.
from peft import LoraConfig, TaskType

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=True,  # inference mode (the original comment said "training mode", contradicting the flag)
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA paper for what it does
    lora_dropout=0.1  # dropout ratio
)
or the one saved alongside the adapter:
config = PeftConfig.from_pretrained(lora_path)
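
For what it's worth, the usual PEFT inference path does not require passing a config by hand at all: PeftModel.from_pretrained reads the adapter_config.json saved next to the LoRA weights. A minimal sketch, where model_path and lora_path are placeholders for local paths:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
# from_pretrained loads adapter_config.json itself, so no hand-built LoraConfig is needed
model = PeftModel.from_pretrained(model, lora_path)
model.eval()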
Steps to reproduce
Running the provided LoRA inference code with either of the two configs above fails with:
(yolo_gpu) C:\Users\it\Desktop\iong\self-study\llm\Qwen1-5-7B-Chat>python lora_test.py
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.10s/it]
Some parameters are on the meta device device because they were offloaded to the cpu.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\generation\utils.py:1850: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is
on cuda, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
warnings.warn(
Traceback (most recent call last):
File "C:\Users\it\Desktop\iong\self-study\llm\Qwen1-5-7B-Chat\lora_test.py", line 36, in <module>
generated_ids = model.generate(
^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\peft\peft_model.py", line 1491, in generate
outputs = self.base_model.generate(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\utils\_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\generation\utils.py", line 1989, in generate
result = self._sample(
^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\generation\utils.py", line 2932, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\accelerate\hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 1054, in forward
outputs = self.model(
^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 819, in forward
inputs_embeds = self.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\sparse.py", line 191, in forward
return F.embedding(
^^^^^^^^^^^^
File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\functional.py", line 2567, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got index is on cuda:0, different from other tensors on cpu (when checking argument in method wrapper_CUDA__index_select)
Could the cause be that the model's parameters after applying LoRA exceed my GPU memory?
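
For context, the warnings above indicate that device_map offloaded part of the base model to CPU while input_ids were sent to cuda, which is exactly the mismatch the RuntimeError reports. A minimal sketch of the fix the warning suggests, assuming model and tokenizer are already loaded and text is a placeholder prompt:

# Keep the tokenized inputs on the same device the (possibly offloaded) model expects
inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,  # also silences the attention-mask warning
    max_new_tokens=512,
)

That said, with layers offloaded to CPU, the underlying issue may simply be insufficient VRAM rather than the config choice.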
Expected behavior
- Which of the two configs is the correct one?
- Is the failure caused by the LoRA-augmented model's parameters exceeding my GPU memory? (A rough estimate is sketched below.)
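A back-of-the-envelope check, assuming fp16/bf16 weights and a 16 GB laptop RTX 4090 (the VRAM size is an assumption about this specific machine):

# Hypothetical sanity check: ~7B parameters at 2 bytes each (fp16/bf16)
params = 7e9
bytes_per_param = 2
weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")  # ~13.0 GB, before KV cache and activations

Weights alone are roughly 13 GB, and the KV cache plus activations can push past 16 GB, which would explain why accelerate offloaded layers to CPU ("offloaded to the cpu" in the log above).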
Environment
python: 3.11.14
GPU: RTX 4090 Laptop
Other information
None
Verification
- This issue hasn't been reported before