Fine-tuning huanhuan on Qwen1___5-7B-Chat with LoRA #480

@ghhhgfgh

Description

Model exhibiting the bug

Qwen1___5-7B-Chat

Tutorial exhibiting the bug

Deploying Qwen1___5-7B-Chat with LoRA

Tutorial maintainer

@Dormiveglia-elf

Bug description

When loading the LoRA weights for inference, should the config be the same one used for training, i.e.

from peft import LoraConfig, PeftConfig, TaskType

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=True,  # inference mode (set to False for training)
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA paper for what it does
    lora_dropout=0.1  # dropout ratio
)

or the one recovered from the saved adapter:

config = PeftConfig.from_pretrained(lora_path)
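
My understanding is that PeftModel.from_pretrained reads adapter_config.json from the adapter directory by itself, so if that is right, neither hand-built config is actually needed for inference. A minimal loading sketch (both paths are placeholders for my local directories):

from transformers import AutoModelForCausalLM
from peft import PeftModel

mode_path = "./Qwen1___5-7B-Chat"   # placeholder: local base-model directory
lora_path = "./output/checkpoint"   # placeholder: directory holding adapter_config.json and adapter weights

# the adapter's saved config is loaded automatically; no LoraConfig is passed in
base = AutoModelForCausalLM.from_pretrained(mode_path)
model = PeftModel.from_pretrained(base, lora_path)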

Steps to reproduce

Running the provided LoRA inference code with each of the two configs above, I get:

(yolo_gpu) C:\Users\it\Desktop\iong\self-study\llm\Qwen1-5-7B-Chat>python lora_test.py
Loading checkpoint shards: 100%|████████████████| 4/4 [00:04<00:00,  1.10s/it]
Some parameters are on the meta device device because they were offloaded to the cpu.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\generation\utils.py:1850: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is
 on cuda, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\it\Desktop\iong\self-study\llm\Qwen1-5-7B-Chat\lora_test.py", line 36, in <module>
    generated_ids = model.generate(
                    ^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\peft\peft_model.py", line 1491, in generate
    outputs = self.base_model.generate(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\utils\_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\generation\utils.py", line 1989, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\generation\utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 1054, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 819, in forward
    inputs_embeds = self.embed_tokens(input_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\modules\sparse.py", line 191, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "C:\Users\it\anaconda3\envs\yolo_gpu\Lib\site-packages\torch\nn\functional.py", line 2567, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got index is on cuda:0, different from other tensors on cpu (when checking argument in method wrapper_CUDA__index_select)

Could it be that the parameters of the LoRA-augmented model exceed my GPU memory?
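
The log line "Some parameters are on the meta device because they were offloaded to the cpu" seems to point that way: with device_map="auto", the fp32 weights of a 7B model (~28 GB) cannot fit in the 16 GB of an RTX 4090 Laptop GPU, so part of the model is offloaded to CPU while input_ids stay on cuda, which would explain the device-mismatch error. A minimal sketch of the workaround I would expect to work (half precision plus keeping inputs on the model's device; paths are placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

mode_path = "./Qwen1___5-7B-Chat"   # placeholder: local base-model directory
lora_path = "./output/checkpoint"   # placeholder: saved LoRA adapter directory

tokenizer = AutoTokenizer.from_pretrained(mode_path)
# bfloat16 halves the fp32 footprint (~14 GB for 7B), so the whole model
# should fit on the GPU and nothing gets offloaded to CPU
model = AutoModelForCausalLM.from_pretrained(mode_path, torch_dtype=torch.bfloat16).to("cuda")
model = PeftModel.from_pretrained(model, lora_path)

messages = [{"role": "user", "content": "Who are you?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# keep the inputs on the same device as the model, and pass attention_mask
# explicitly to silence the pad-token warning
inputs = tokenizer(text, return_tensors="pt").to(model.device)
generated_ids = model.generate(
    inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=128
)
print(tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))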

Expected behavior

  1. Which of the two configs is the correct one?
  2. Is the failure above caused by the LoRA-augmented model's parameters exceeding my GPU memory?

Environment information

python: 3.11.14
GPU: RTX 4090 Laptop

Other information

None for now

Verification

  • This issue hasn't been reported before
