A modular, research-friendly Transformer architecture for building and experimenting with small-scale language models — designed for learning, exploration, and open research.
💡 This project lets you define your own Transformer architectures (variable layers, heads, activations, dimensions) and train them on your own datasets. The included pipeline demonstrates this end-to-end on the TinyStories dataset.
✅ Fully modular Transformer architecture
- Dynamic number of attention heads per layer
- Configurable activation functions (ReLU, GELU, SiLU)
- Pre-LayerNorm + Residual design for stability
✅ Flexible dataset pipeline
- Uses Hugging Face Datasets
- Tokenizes and chunks datasets into fixed-length training blocks
- Saves & loads efficiently with Arrow format
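The tokenize-and-chunk step above can be sketched as a plain function — a hedged illustration of the idea, not the repo's actual API (the function name and the drop-the-remainder behavior are assumptions):

```python
def chunk_tokens(token_ids, block_size):
    """Split a flat stream of token IDs into fixed-length training blocks.

    Tokens that do not fill a final complete block are dropped here,
    a common simplification in small-scale LM data pipelines.
    """
    n_blocks = len(token_ids) // block_size
    return [
        token_ids[i * block_size : (i + 1) * block_size]
        for i in range(n_blocks)
    ]

# 10 tokens with block_size=4 yield two full blocks; the last 2 tokens are dropped.
blocks = chunk_tokens(list(range(10)), block_size=4)
```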
✅ Research-grade training pipeline
- Config-driven hyperparameters
- AdamW optimizer + gradient clipping
- Warmup + cosine LR scheduler
- Optional mixed precision (AMP)
- Hugging Face model saving/loading compatibility
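The warmup-plus-cosine learning rate schedule listed above can be sketched as a standalone function — a minimal illustration, not the repo's exact implementation (parameter names here are assumptions):

```python
import math

def lr_at_step(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Linear ramp: step 0 gets max_lr / warmup_steps, last warmup step gets max_lr.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps, clamped at the end of training.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

A schedule like this is typically applied per optimizer step by setting `param_group["lr"]` before `optimizer.step()`.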
✅ Multi-GPU (Distributed) training support
- Out-of-the-box DistributedDataParallel (DDP)
- Sharded sampling for each process
- Rank-aware logging & checkpointing
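The per-process sharded sampling can be illustrated with a small pure-Python sketch of what `torch.utils.data.DistributedSampler` does internally — deterministic shuffle, then a strided partition so each rank sees a disjoint shard (function and parameter names are illustrative):

```python
import random

def shard_indices(num_samples, rank, world_size, epoch=0, seed=0):
    """Return the dataset indices assigned to one DDP process.

    All ranks shuffle with the same seed (so they agree on the order),
    then each rank takes every world_size-th index starting at its rank.
    """
    rng = random.Random(seed + epoch)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices[rank::world_size]
```

Seeding with `seed + epoch` reshuffles the data each epoch while keeping every rank's view consistent.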
✅ Open-source ready
- Easy to modify, extend, and push to the Hugging Face Hub
- Educational and clean codebase
```
SLM_Skeleton/
├── config_slm.py                          # Centralized configuration (model, data, training)
├── data_module.py                         # Dataset loading and tokenization
├── embedding_module.py                    # Token + positional embeddings
├── multihead_self_attention.py            # Scaled dot-product attention (supports variable heads)
├── transformer_block.py                   # Transformer block (Pre-LN + MHA + FFN)
├── train_slm.py                           # Full training script with scheduler
├── prepare_and_save_tokenized_dataset.py  # Preprocess TinyStories
└── trained_slm/                           # Output directory for saved models
```
Clone the repository:

```shell
git clone https://github.com/aditya20t/SLM_Skeleton.git
cd SLM_Skeleton
```

Create and activate a virtual environment:

```shell
python -m venv venv
source venv/bin/activate
```

Install dependencies:

```shell
pip install -r requirements.txt
```
By default, SLM uses the TinyStories dataset. Run the preprocessing script:

```shell
python data_module.py
```

SLM follows a GPT-style decoder-only Transformer with a modular configuration, built on the Pre-LayerNorm design for better training stability.
Architecture Flow:

```
Embedding → [ (LayerNorm → MHA → Residual) + (LayerNorm → FFN → Residual) ] × N → LayerNorm → LM Head
```
Each layer in the stack can have its own:
- number of attention heads
- activation type
- dropout rate
```python
# From config_slm.py
n_heads_per_layer = [4, 4, 8, 8, 16, 16]
activations = ["gelu", "gelu", "silu", "silu", "gelu", "gelu"]
```

To train the model on a single GPU, simply run:

```shell
python train_slm.py
```

The training pipeline uses the AdamW optimizer with gradient clipping and a configurable warmup-plus-cosine-decay learning rate scheduler.
After training, your model is saved to ./trained_slm/ in a format compatible with the Hugging Face ecosystem. You can load it for inference using the standard transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./trained_slm")
tokenizer = AutoTokenizer.from_pretrained("./trained_slm")

text = "Once upon a time"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0]))
```