tinyGemma Urdu

A 0.96-million-parameter Gemma-style language model trained on an Urdu corpus.

Architecture

A scaled-down version of Google's Gemma architecture, built from the following components as defined in GemmaConfig:

  • GemmaAttention: multi-head attention with grouped-query attention (controlled by num_queries_per_kv), RoPE positional embeddings applied via apply_rotary_emb(), and causal masking using a pre-computed triangular mask
  • GemmaMLP: feed-forward network with a GELU-gated activation, computing gelu(gate_proj(x)) * up_proj(x) and projecting back through down_proj (see the sketch after this list)
  • GemmaDecoderLayer: transformer block combining self_attn and mlp, each preceded by RMSNorm pre-normalization
  • RMSNorm: Root Mean Square Layer Normalization with a learnable weight parameter and an optional unit offset (add_unit_offset=True)
  • tinyGemma: the complete model, with embeddings scaled by sqrt(hidden_size) and the language-modeling head weight-tied to the embedder
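
As a concrete illustration of the gating and normalization bullets above, here is a minimal PyTorch sketch of RMSNorm and GemmaMLP. It follows the public Gemma reference implementation; the exact dimensions, epsilon, and initialization used in this repo are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square Layer Normalization with Gemma's optional unit offset."""
    def __init__(self, dim: int, eps: float = 1e-6, add_unit_offset: bool = True):
        super().__init__()
        self.eps = eps
        self.add_unit_offset = add_unit_offset
        self.weight = nn.Parameter(torch.zeros(dim))  # learnable scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square of the features.
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        # With the unit offset, the effective scale is (1 + weight), so a
        # zero-initialized weight starts out as the identity scale.
        scale = 1 + self.weight if self.add_unit_offset else self.weight
        return norm * scale

class GemmaMLP(nn.Module):
    """GELU-gated feed-forward block: down_proj(gelu(gate_proj(x)) * up_proj(x))."""
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))
```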

Training Results

The model converged on the Urdu corpus with the following final metrics:

Final Training Metrics (5000 iterations):
- Training loss: 2.7668
- Validation loss: 2.9250
- Validation perplexity: 18.6348 (consistent with exp(2.9250) ≈ 18.63)
- Optimizer: AdamW with learning rate 3e-4
- Batch size: 16, with 2 gradient accumulation steps (effective batch size 32)
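
A minimal sketch of the training loop these settings imply. Only the optimizer, learning rate, batch size, accumulation steps, and iteration count come from the metrics above; the config object and get_batch data loader are hypothetical names for illustration.

```python
import torch
import torch.nn.functional as F

# tinyGemma is the model class named above; config and get_batch are
# hypothetical stand-ins for this repo's actual config and data pipeline.
model = tinyGemma(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

grad_accum_steps = 2
for step in range(5000):
    optimizer.zero_grad()
    for _ in range(grad_accum_steps):
        x, y = get_batch("train", batch_size=16)  # hypothetical loader
        logits = model(x)                          # (B, T, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        (loss / grad_accum_steps).backward()       # average over micro-batches
    optimizer.step()
```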

Model Weights

tinygemma_Urdu
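
Assuming the checkpoint is a standard PyTorch state_dict (the filename tinygemma_Urdu.pt is an assumption, not confirmed by the repo), loading would look roughly like:

```python
import torch

# Filename and checkpoint format are assumptions.
state_dict = torch.load("tinygemma_Urdu.pt", map_location="cpu")
model = tinyGemma(config)  # model class and config from this repo
model.load_state_dict(state_dict)
model.eval()
```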

Loss Curves

(Figure: training and validation loss curves.)

License

MIT License
