System Info
- `transformers` version: 4.27.0.dev0
- Platform: Linux-5.15.0-1026-aws-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.12.0
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigcode/santacoder-fast-inference")
```
Expected behavior
The model should load without issue, but instead we get a series of errors of the form:
```
RuntimeError: Error(s) in loading state_dict for GPTBigCodeLMHeadModel:
	size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([2048, 2304]) from checkpoint, the shape in current model is torch.Size([2304, 2048]).
	size mismatch for transformer.h.0.mlp.c_fc.weight: copying a param with shape torch.Size([2048, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 2048]).
	size mismatch for transformer.h.0.mlp.c_proj.weight: copying a param with shape torch.Size([8192, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 8192]).
```
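Note that every reported checkpoint shape is the exact transpose of the shape the current model expects, which suggests the checkpoint was saved with a transposed weight layout. As a hypothetical workaround (not a confirmed fix — it assumes the mismatch really is a pure transpose and nothing else differs), one could transpose the affected 2-D tensors in the state dict before loading:

```python
# Sketch of a possible workaround: transpose any 2-D checkpoint tensor whose
# shape is the reverse of what the current model expects. This assumes the
# only discrepancy is a transposed weight layout.
import torch


def transpose_mismatched(state_dict, model):
    """Return a copy of state_dict with transposed 2-D tensors fixed up
    to match the shapes in model.state_dict()."""
    model_sd = model.state_dict()
    fixed = {}
    for name, tensor in state_dict.items():
        expected = model_sd.get(name)
        if (
            expected is not None
            and tensor.ndim == 2
            and tensor.shape != expected.shape
            and tuple(tensor.shape) == tuple(expected.shape)[::-1]
        ):
            # Checkpoint stores the transpose of the expected layout.
            tensor = tensor.t().contiguous()
        fixed[name] = tensor
    return fixed
```

One would then call `model.load_state_dict(transpose_mismatched(checkpoint, model))` on a manually loaded checkpoint; whether this is the right fix (versus a config/architecture mismatch) would need confirmation from the model authors.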