Reproducing the GPT-2 (124M) Transformer model from scratch in PyTorch. This project is built for learning, experimentation, and extending transformer models, using Andrej Karpathy's Zero-To-Hero series as a learning reference.
- Full GPT-style (decoder-only) Transformer architecture: multi-head attention, layer norm, residual connections, and position embeddings (see the attention sketch after this list).
- BPE tokenizer implementation (see the merge-step sketch below).
- Training pipeline with configs, logging, and checkpointing (see the training-loop sketch below).
- Experiments on small datasets (Shakespeare, WikiText) and scaling toward OpenWebText.
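For orientation, here is a minimal sketch of the causal multi-head self-attention at the heart of a GPT-style block. The module and hyperparameter names (`CausalSelfAttention`, `n_embd`, `n_head`, `block_size`) are illustrative defaults matching the GPT-2 124M configuration, not necessarily the names used in this codebase.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (GPT-2 style)."""

    def __init__(self, n_embd: int = 768, n_head: int = 12, block_size: int = 1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.n_embd = n_embd
        # One projection produces queries, keys, and values for all heads at once.
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        self.c_proj = nn.Linear(n_embd, n_embd)
        # Lower-triangular mask: each position attends only to itself and earlier positions.
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.size()  # batch, sequence length, embedding dim
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # Reshape to (B, n_head, T, head_dim) so attention runs per head.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied.
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v
        # Merge heads back together and project to the residual stream.
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```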
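A BPE tokenizer is built around one core operation: repeatedly count adjacent token-id pairs and merge the most frequent pair into a new id. The sketch below shows that step on a toy string; the function names are illustrative and not the project's actual API.

```python
from collections import Counter

def get_pair_counts(ids: list[int]) -> Counter:
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))

def merge_pair(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every occurrence of `pair` in `ids` with the new token id."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Toy training loop: start from raw UTF-8 bytes and learn a few merges.
text = "low lower lowest"
ids = list(text.encode("utf-8"))
merges = {}
for new_id in range(256, 261):  # learn 5 merges for illustration
    counts = get_pair_counts(ids)
    if not counts:
        break
    pair = counts.most_common(1)[0][0]
    ids = merge_pair(ids, pair, new_id)
    merges[pair] = new_id
print(merges)  # learned (pair -> new id) merge rules
```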
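The training pipeline follows a standard loop of forward pass, backward pass, optimizer step, periodic logging, and checkpointing. Below is a minimal sketch under a few assumptions: the model returns `(logits, loss)` when targets are passed, and the checkpoint path and logging interval are placeholders rather than the project's real config values.

```python
import torch

def train(model, loader, optimizer, device="cuda",
          ckpt_path="checkpoints/gpt2_124m.pt", log_every=100):
    """Minimal training loop with periodic logging and checkpointing."""
    model.to(device)
    model.train()
    for step, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        _, loss = model(x, y)  # assumption: model returns (logits, loss) given targets
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        # Gradient clipping, a common stabilizer in GPT-2-style training recipes.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        if step % log_every == 0:
            print(f"step {step}: loss {loss.item():.4f}")
            torch.save(
                {"model": model.state_dict(),
                 "optimizer": optimizer.state_dict(),
                 "step": step},
                ckpt_path,
            )
```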