By saving the model and reloading it, I managed to get the model running in both quantized and full precision (it still uses at most 10 GB of GPU RAM); a rough sketch of my save/reload flow is included after the model printout below. However, the model generates random characters. Here's the output:
```
GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaInfiniAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear(in_features=16384, out_features=2048, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): GemmaRMSNorm()
  )
  (lm_head): Linear(in_features=2048, out_features=256000, bias=False)
)
```
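
For reference, this is roughly the save/reload flow I am describing. It is a minimal sketch: the base checkpoint name and the local path are placeholders, and it assumes the InfiniAttention patch is already applied to the Gemma classes before loading.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

save_dir = "./gemma-infini-2b"  # hypothetical local path

# Load the patched model (assumes GemmaInfiniAttention is already
# registered/monkey-patched into the Gemma attention classes by this point).
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

# Save the patched model, then reload it from disk.
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Reload in 4-bit; this is what keeps GPU usage around 10 GB. Full precision
# works the same way with quantization_config omitted.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    save_dir, quantization_config=quant_config
)
```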
Input:

```
This work introduces an efficient method to scale Transformer-based
```

generated_sequence:

```
<bos>fetchone’mittlungPublicado:, (хьтан cobr " about mattino relenting ? Alamofire theyallclose"conio
```

Generated:

```
<bos>fetchonekjø professeurs { AssemblyCulture for disagre ‘ Compañ ‘…GraphicsUnit youXmlEnum atpaddingVertical such. nakalista .enumi,stdarg It Caleb including autunno ifwithIdentifierഛ“ Muitos for якостіبسم relenting
```
When the model is printed, it correctly shows `GemmaInfiniAttention` for the self_attn layers, but it still generates random characters. What am I doing wrong?
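
One thing I have not ruled out is a silent weight mismatch on reload. A minimal sketch of how that could be checked (`output_loading_info` is a standard `from_pretrained` flag; `save_dir` is the hypothetical path from the snippet above):

```python
from transformers import AutoModelForCausalLM

save_dir = "./gemma-infini-2b"  # hypothetical local path

# Ask from_pretrained to report any checkpoint keys it could not match up.
model, loading_info = AutoModelForCausalLM.from_pretrained(
    save_dir, output_loading_info=True
)
print(loading_info["missing_keys"])     # params left randomly initialized
print(loading_info["unexpected_keys"])  # saved tensors with no destination
```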