By saving the model and reloading it, I managed to get the model running in both quantized and full precision (it still uses at most 10 GB of GPU RAM); a rough sketch of my save/reload flow is included after the model printout below. However, the model generates random characters. Here's the output:
```
GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaInfiniAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear(in_features=16384, out_features=2048, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): GemmaRMSNorm()
  )
  (lm_head): Linear(in_features=2048, out_features=256000, bias=False)
)
```
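
For reference, this is roughly the save/reload flow I am describing. It is a minimal sketch: the base checkpoint name and the local path are placeholders, and it assumes the InfiniAttention patch is already applied to the Gemma classes before loading.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

save_dir = "./gemma-infini-2b"  # hypothetical local path

# Load the patched model (assumes GemmaInfiniAttention is already
# registered/monkey-patched into the Gemma attention classes by this point).
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

# Save the patched model, then reload it from disk.
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Reload in 4-bit; this is what keeps GPU usage around 10 GB. Full precision
# works the same way with quantization_config omitted.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    save_dir, quantization_config=quant_config
)
```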
Input:

```
This work introduces an efficient method to scale Transformer-based
```

generated_sequence:

```
<bos>fetchone’mittlungPublicado:, (хьтан cobr " about mattino relenting ? Alamofire theyallclose"conio
```

Generated:

```
<bos>fetchonekjø professeurs { AssemblyCulture for disagre ‘ Compañ ‘…GraphicsUnit youXmlEnum atpaddingVertical such. nakalista .enumi,stdarg It Caleb including autunno ifwithIdentifierഛ“ Muitos for якостіبسم relenting
```
When the model is printed, it correctly shows `GemmaInfiniAttention` for the self_attn layers, but it still generates random characters. What am I doing wrong?
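
One thing I have not ruled out is a silent weight mismatch on reload. A minimal sketch of how that could be checked (`output_loading_info` is a standard `from_pretrained` flag; `save_dir` is the hypothetical path from the snippet above):

```python
from transformers import AutoModelForCausalLM

save_dir = "./gemma-infini-2b"  # hypothetical local path

# Ask from_pretrained to report any checkpoint keys it could not match up.
model, loading_info = AutoModelForCausalLM.from_pretrained(
    save_dir, output_loading_info=True
)
print(loading_info["missing_keys"])     # params left randomly initialized
print(loading_info["unexpected_keys"])  # saved tensors with no destination
```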