GPT-2 (124M) Reproduction and Insights

Introduction

This project reproduces the GPT-2 (124M) Transformer model from scratch in PyTorch. It is built for learning, experimentation, and extending transformer models, and follows Andrej Karpathy's Neural Networks: Zero to Hero series as its primary learning reference.

Highlights

  • Full Transformer architecture (multi-head attention, layer norm, residuals, position embeddings); a minimal block sketch follows this list.
  • BPE tokenizer implementation; see the tokenizer check below.
  • Training pipeline with configs, logging, and checkpointing; see the checkpointing sketch below.
  • Experiments on small datasets (Shakespeare, WikiText) and scaling toward OpenWebText.
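
The following is a minimal PyTorch sketch of the decoder block described in the first highlight, using the published GPT-2 (124M) hyperparameters (12 layers, 12 heads, 768-dimensional embeddings, 1024-token context, 50,257-token BPE vocabulary). The names GPTConfig, CausalSelfAttention, and Block are illustrative and may differ from the classes in this repository; the full model additionally wraps n_layer such blocks with token and position embeddings and a language-model head, omitted here for brevity.

    # Minimal sketch of a GPT-2 (124M) decoder block in PyTorch.
    # Hyperparameters are the published GPT-2 small values; class names are illustrative.
    from dataclasses import dataclass

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    @dataclass
    class GPTConfig:
        block_size: int = 1024   # maximum context length
        vocab_size: int = 50257  # GPT-2 BPE vocabulary size
        n_layer: int = 12
        n_head: int = 12
        n_embd: int = 768


    class CausalSelfAttention(nn.Module):
        def __init__(self, config: GPTConfig):
            super().__init__()
            self.n_head = config.n_head
            self.n_embd = config.n_embd
            self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)  # fused q, k, v projection
            self.c_proj = nn.Linear(config.n_embd, config.n_embd)      # output projection

        def forward(self, x):
            B, T, C = x.shape
            q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
            # reshape to (B, n_head, T, head_dim) for multi-head attention
            q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            # causal (masked) scaled dot-product attention
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            y = y.transpose(1, 2).contiguous().view(B, T, C)
            return self.c_proj(y)


    class Block(nn.Module):
        """Pre-norm Transformer block: attention and MLP, each wrapped in a residual connection."""
        def __init__(self, config: GPTConfig):
            super().__init__()
            self.ln_1 = nn.LayerNorm(config.n_embd)
            self.attn = CausalSelfAttention(config)
            self.ln_2 = nn.LayerNorm(config.n_embd)
            self.mlp = nn.Sequential(
                nn.Linear(config.n_embd, 4 * config.n_embd),
                nn.GELU(approximate="tanh"),  # GPT-2 uses the tanh approximation of GELU
                nn.Linear(4 * config.n_embd, config.n_embd),
            )

        def forward(self, x):
            x = x + self.attn(self.ln_1(x))  # residual around attention
            x = x + self.mlp(self.ln_2(x))   # residual around MLP
            return x


    if __name__ == "__main__":
        cfg = GPTConfig()
        x = torch.randn(2, 8, cfg.n_embd)   # (batch, tokens, embedding)
        print(Block(cfg)(x).shape)          # torch.Size([2, 8, 768])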
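
The repository implements its own BPE tokenizer; one convenient sanity check is to compare its output against OpenAI's reference GPT-2 encoding, available through the tiktoken package. This is an assumption about tooling for verification, not necessarily part of this project:

    # Compare a custom tokenizer against the reference GPT-2 BPE (requires `pip install tiktoken`).
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")
    ids = enc.encode("Hello, world!")
    print(ids)              # token ids under the GPT-2 vocabulary
    print(enc.decode(ids))  # "Hello, world!"
    print(enc.n_vocab)      # 50257, matching GPT-2's embedding table size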
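
For the checkpointing mentioned in the training-pipeline highlight, a common PyTorch pattern is to save the model and optimizer state together with the training step. This is a generic sketch with assumed file paths and dictionary keys, not necessarily the format used in this repository:

    # Generic PyTorch checkpointing sketch (paths and keys are illustrative).
    import torch

    def save_checkpoint(model, optimizer, step, path="checkpoint.pt"):
        torch.save({
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": step,
        }, path)

    def load_checkpoint(model, optimizer, path="checkpoint.pt"):
        ckpt = torch.load(path, map_location="cpu")
        model.load_state_dict(ckpt["model"])
        optimizer.load_state_dict(ckpt["optimizer"])
        return ckpt["step"]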

References

  • Andrej Karpathy, Neural Networks: Zero to Hero (video lecture series).