This repository contains a from-scratch implementation of a GPT-style language model, built end-to-end to understand how modern language models work at a fundamental level.
Everything is implemented manually — from tokenization to the Transformer architecture, training loop, and autoregressive text generation — with minimal reliance on high-level abstractions.
The goal of this project is learning and understanding, not performance or scale.
Large language models are often treated as black boxes.
This project exists to answer deeper questions:
- How does a tokenizer shape learning?
- How does attention actually work, step by step?
- What does it mean for a model to learn patterns in text?
- How far can intelligence be built from first principles?
This project is a foundational step in a broader personal effort to explore AGI, human cognition, emotion, and biological learning through implementation and experimentation. By rebuilding these systems from the ground up, I aim to demystify the "black box" of modern AI.
**Tokenizer**
- Custom tokenizer (regex / BPE-style)
- Vocabulary construction
- Encoding and decoding logic
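
Below is a minimal sketch of what BPE-style vocabulary construction and encoding/decoding can look like. It is illustrative only: the function names, the tiny training corpus, and the character-level starting point are placeholders, not the repository's actual code.

```python
from collections import Counter

def merge(tokens, pair, new_token):
    """Replace every adjacent occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn merge rules and build a vocabulary from a character-level stream."""
    tokens = list(text)
    merges = {}                              # (left, right) -> merged token
    vocab = sorted(set(tokens))              # base vocabulary: single characters
    for _ in range(num_merges):
        counts = Counter(zip(tokens, tokens[1:]))
        if not counts:
            break
        pair = counts.most_common(1)[0][0]   # most frequent adjacent pair
        new_token = pair[0] + pair[1]
        merges[pair] = new_token
        vocab.append(new_token)
        tokens = merge(tokens, pair, new_token)
    token_to_id = {tok: i for i, tok in enumerate(vocab)}
    return merges, token_to_id

def encode(text, merges, token_to_id):
    """Apply learned merges in order, then map tokens to integer ids."""
    tokens = list(text)
    for pair, new_token in merges.items():
        tokens = merge(tokens, pair, new_token)
    return [token_to_id[t] for t in tokens]

def decode(ids, token_to_id):
    """Map ids back to tokens and concatenate them into text."""
    id_to_token = {i: t for t, i in token_to_id.items()}
    return "".join(id_to_token[i] for i in ids)

merges, token_to_id = train_bpe("low lower lowest low low", num_merges=10)
ids = encode("low lower", merges, token_to_id)
print(ids, repr(decode(ids, token_to_id)))
```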
**Model Architecture**
- Token and positional embeddings
- Multi-head self-attention
- Causal masking
- Feed-forward networks
- Residual connections and layer normalization
- Full Transformer decoder stack
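
The following is a minimal PyTorch sketch of one decoder block: causal multi-head self-attention plus a feed-forward network, wrapped in residual connections and layer normalization. Hyperparameter names such as `n_embd`, `n_head`, and `block_size` are illustrative and may differ from the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # project to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # lower-triangular mask so position t only attends to positions <= t
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)        # scaled dot-product
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf")) # causal masking
        att = F.softmax(att, dim=-1)
        y = att @ v                                                   # weighted sum of values
        y = y.transpose(1, 2).contiguous().view(B, T, C)              # merge heads
        return self.proj(y)

class Block(nn.Module):
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))   # residual connection around attention
        x = x + self.mlp(self.ln2(x))    # residual connection around feed-forward
        return x
```

The full decoder stack is then just token and positional embeddings, a sequence of such blocks, and a final linear layer projecting to vocabulary logits.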
**Training**
- Custom training loop
- Loss computation
- Optimizer configuration
- Device handling (CPU / CUDA)
- Progress logging
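
A minimal sketch of such a training loop, assuming PyTorch, a `model` that maps a batch of token ids to logits of shape `(B, T, vocab_size)`, and `data` as a 1-D tensor of token ids. The `get_batch` helper and all hyperparameters are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"   # device handling

def get_batch(data, block_size, batch_size):
    """Sample random contiguous chunks; targets are inputs shifted by one token."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])
    return x.to(device), y.to(device)

def train(model, data, steps=1000, block_size=128, batch_size=32, lr=3e-4):
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(steps):
        xb, yb = get_batch(data, block_size, batch_size)
        logits = model(xb)                                   # (B, T, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        if step % 100 == 0:
            print(f"step {step}: loss {loss.item():.4f}")    # progress logging
```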
**Text Generation**
- Autoregressive sampling
- Temperature and top-k sampling
- Context window handling
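
A minimal sketch of autoregressive sampling with temperature and top-k filtering, assuming a model that returns logits of shape `(B, T, vocab_size)`; argument names are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, temperature=1.0, top_k=None):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                     # crop to the context window
        logits = model(idx_cond)[:, -1, :] / temperature    # logits for the last position
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float("-inf")     # keep only the top-k logits
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)   # sample one token
        idx = torch.cat([idx, next_id], dim=1)              # append and continue
    return idx
```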