Hi! I followed all the steps in the Falcon blog post, using my own dataset. I get this error when I try to run inference:
python generate/adapter_v2.py --adapter_path workspace/out/adapter/falcon/lit_model_adapter_finetuned.pth --checkpoint_dir checkpoints/tiiuae/falcon-7b --quantize llm.int8 --prompt "What food do lamas eat?"
Loading model 'checkpoints/tiiuae/falcon-7b/lit_model.pth' with {'org': 'tiiuae', 'name': 'falcon-7b', 'block_size': 2048, 'vocab_size': 50254, 'padding_multiple': 512, 'padded_vocab_size': 65024, 'n_layer': 32, 'n_head': 71, 'n_embd': 4544, 'rotary_percentage': 1.0, 'parallel_residual': True, 'bias': False, 'n_query_groups': 1, 'shared_attention_norm': True, '_norm_class': 'LayerNorm', 'norm_eps': 1e-05, '_mlp_class': 'GptNeoxMLP', 'intermediate_size': 18176, 'adapter_prompt_length': 10, 'adapter_start_layer': 2}
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda117.so
Time to instantiate model: 1.49 seconds.
Time to load the model weights: 27.58 seconds.
Traceback (most recent call last):
  File "/workspace/lit-gpt/generate/adapter_v2.py", line 137, in <module>
    CLI(main)
  File "/usr/local/lib/python3.10/dist-packages/jsonargparse/_cli.py", line 85, in CLI
    return _run_component(component, cfg_init)
  File "/usr/local/lib/python3.10/dist-packages/jsonargparse/_cli.py", line 147, in _run_component
    return component(**cfg)
  File "/workspace/lit-gpt/generate/adapter_v2.py", line 106, in main
    y = generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lit-gpt/generate/base.py", line 66, in generate
    logits = model(x, max_seq_length, input_pos)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1505, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/fabric/wrappers.py", line 116, in forward
    output = self._forward_module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1505, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/lit-gpt/lit_gpt/adapter.py", line 108, in forward
    x, self.kv_caches[i], self.adapter_kv_caches[i] = block(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1505, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/lit-gpt/lit_gpt/adapter.py", line 150, in forward
    h, new_kv_cache, new_adapter_kv_cache = self.attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1505, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/lit-gpt/lit_gpt/adapter.py", line 192, in forward
    qkv = self.attn(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1505, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/lit-gpt/lit_gpt/adapter_v2.py", line 35, in adapter_v2_new_forward
    return self.adapter_scale * (torch.nn.functional.linear(input, self.weight, self.bias) + self.adapter_bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != signed char
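For context on what the error itself means: "signed char" is how PyTorch prints int8, so the failing call is multiplying bfloat16 activations against an int8-quantized weight. My reading (an assumption, not confirmed from the lit-gpt source) is that the adapter_v2 patched forward calls torch.nn.functional.linear on self.weight directly, which bypasses the bitsandbytes layer's own forward that would normally handle the int8 weight. The mismatch can be reproduced in plain PyTorch without bitsandbytes or the model:

```python
import torch

# Minimal reproduction of the dtype mismatch. The shapes and names here are
# illustrative only; the point is the dtype combination, which mirrors
# bfloat16 activations hitting an int8 ("signed char") quantized weight.
x = torch.randn(1, 4, dtype=torch.bfloat16)  # activations, as in the model
w = torch.zeros(8, 4, dtype=torch.int8)      # weight as stored by llm.int8

try:
    torch.nn.functional.linear(x, w)
except RuntimeError as e:
    print(f"RuntimeError: {e}")
```

This suggests the failure is specific to combining --quantize llm.int8 with the adapter_v2 forward patch; as a sanity check, the same command without --quantize (or with a non-int8 precision) should not hit this line.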