
Transformer from scratch

A from-scratch implementation of the original Transformer architecture described in Attention Is All You Need (Vaswani et al., 2017), built in PyTorch without relying on high-level abstractions.

Motivation

The goal of this project is to gain a thorough understanding of the Transformer architecture by studying and replicating each of its building blocks, then assembling them into a working model and training it on real-world data (English-French translation).

Repository Structure

transformer-from-scratch.ipynb contains the block-wise implementation of the Transformer, alongside the personal notes I took to break down and digest the architecture.

transformer contains a cleaned-up, streamlined version of the architecture from the Jupyter notebook, which is more verbose for clarity.

transformer/
├── model/
│   ├── attention.py       # MultiHeadAttention (self, masked, cross)
│   ├── encoder.py         # EncoderLayer, Encoder
│   ├── decoder.py         # DecoderLayer, Decoder
│   ├── feedforward.py     # Position-wise FeedForwardBlock
│   ├── embedding.py       # PositionalEncoding
│   └── transformer.py     # Seq2Seq (composes encoder + decoder)
├── data/
│   └── tokenizer.py       # Word-level tokenizer with regex splitting
├── train.py               # Training loop
├── main.py                # Entry point
└── config.py              # Hyperparameters and device configuration
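At the heart of attention.py is scaled dot-product attention, which every head of MultiHeadAttention computes. A minimal sketch of that core operation (illustrative only; the repository's actual function names and signatures may differ):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k). Optional mask broadcasts over scores."""
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) to keep softmax gradients stable.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights
```

The same function serves all three attention variants in the model: self-attention (q, k, v from the same sequence), masked self-attention in the decoder (with a causal mask), and cross-attention (q from the decoder, k and v from the encoder).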

Hyperparameters

| Parameter     | Value  |
|---------------|--------|
| d_model       | 512    |
| d_k / d_v     | 64     |
| Heads         | 8      |
| Layers        | 4      |
| d_ff          | 2048   |
| Optimizer     | Adam   |
| Learning rate | 0.0001 |
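These values live in config.py. A sketch of what such a file might contain (constant names are illustrative, not necessarily the ones used in the repository):

```python
# Hypothetical config.py layout; names are illustrative.
D_MODEL = 512          # embedding / residual-stream width
N_HEADS = 8
D_K = D_MODEL // N_HEADS   # per-head dimension: 512 / 8 = 64 (d_k = d_v)
N_LAYERS = 4           # encoder and decoder depth
D_FF = 2048            # inner width of the position-wise feed-forward block
LEARNING_RATE = 1e-4   # Adam
```

Note that d_k = d_model / heads, so the concatenated head outputs recombine to exactly d_model.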

Training

Trained on the opus_books English-French dataset. The tokenizer splits words and punctuation into separate tokens using a regex, building a single shared vocabulary from both source and target languages.
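A word-level tokenizer of this kind can be sketched as follows (the regex pattern and function names are illustrative, not necessarily tokenizer.py's exact code):

```python
import re

def word_tokenize(text):
    # \w+ matches word runs; [^\w\s] matches each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(sentences, specials=("<pad>", "<sos>", "<eos>", "<unk>")):
    # One vocabulary shared by source and target languages.
    vocab = {tok: i for i, tok in enumerate(specials)}
    for s in sentences:
        for tok in word_tokenize(s):
            vocab.setdefault(tok, len(vocab))
    return vocab
```

For example, word_tokenize("Hello, world!") yields ["hello", ",", "world", "!"], so punctuation gets its own vocabulary entries rather than being glued to adjacent words.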

Requirements

torch
numpy
datasets

Install with:

pip install -r requirements.txt

Usage

python main.py
