Transformer Model Implementation in PyTorch

In this repository, I work through the "Attention Is All You Need" paper and implement the main components of the Transformer from scratch in PyTorch to understand the architecture in depth.

Objective

The main goal is to break down and implement the core ideas of the Transformer model, including:

Self-Attention and Multi-Head Attention
Positional Encoding for sequence order information
Feed-Forward Layers and Layer Normalization
Stacked Encoder Layers as seen in the original architecture
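
As a starting point for the first item, here is a minimal sketch of scaled dot-product self-attention as defined in the paper. The function name and the tensor sizes are illustrative assumptions and are not taken from transformer_model.py.

```python
import math

import torch


def scaled_dot_product_attention(q, k, v):
    """Return softmax(Q K^T / sqrt(d_k)) V together with the attention weights."""
    d_k = q.size(-1)
    # Similarity of every query with every key: (batch, seq_q, seq_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    # Normalize over the key dimension so each query's weights sum to 1
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


# Self-attention: queries, keys, and values all come from the same sequence.
x = torch.randn(2, 5, 16)  # (batch, seq_len, d_model), arbitrary sizes
out, weights = scaled_dot_product_attention(x, x, x)
print(out.shape, weights.shape)
```

The division by sqrt(d_k) keeps the dot products from growing with the embedding size, which would otherwise push the softmax into regions with very small gradients.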

Components

Input Embeddings: Converts token IDs to embeddings.
Positional Encoding: Adds positional context to embeddings.
Self-Attention Mechanism: Computes attention weights between every pair of tokens using scaled dot products.
Multi-Head Attention: Uses multiple attention heads for richer representations.
Feed-Forward Network: Applies a position-wise feed-forward network to the output of the attention layer.
Encoder Layer: Combines attention, feed-forward, and normalization layers.
Encoder: Stacks multiple encoder layers to build the final model.
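
The components above can be wired together roughly as follows. This is a compact sketch under stated assumptions, not the code in transformer_model.py: the class names, the default sizes (d_model=64, num_heads=4, d_ff=256, num_layers=2, max_len=512), the fused Q/K/V projection, and the post-norm residual layout (as in the original paper) are all choices made for illustration.

```python
import math

import torch
import torch.nn as nn


class InputEmbedding(nn.Module):
    """Token embedding plus fixed sinusoidal positional encoding (d_model assumed even)."""

    def __init__(self, vocab_size, d_model, max_len=512):
        super().__init__()
        self.d_model = d_model
        self.tok = nn.Embedding(vocab_size, d_model)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)  # even dimensions
        pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions
        self.register_buffer("pe", pe)  # not trained, but follows .to(device)

    def forward(self, token_ids):
        x = self.tok(token_ids) * math.sqrt(self.d_model)  # scaling used in the paper
        return x + self.pe[: token_ids.size(1)]


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projections
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, d = x.shape
        # Project once, then split into heads: each of q, k, v is (b, heads, t, d_head)
        q, k, v = (
            z.reshape(b, t, self.num_heads, self.d_head).transpose(1, 2)
            for z in self.qkv(x).chunk(3, dim=-1)
        )
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        ctx = torch.softmax(scores, dim=-1) @ v
        ctx = ctx.transpose(1, 2).reshape(b, t, d)  # concatenate the heads
        return self.out(ctx)


class EncoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super().__init__()
        self.attn = MultiHeadAttention(d_model, num_heads)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Residual connection followed by layer normalization (post-norm)
        x = self.norm1(x + self.drop(self.attn(x)))
        return self.norm2(x + self.drop(self.ff(x)))


class Encoder(nn.Module):
    def __init__(self, vocab_size, d_model=64, num_heads=4, d_ff=256, num_layers=2):
        super().__init__()
        self.embed = InputEmbedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [EncoderLayer(d_model, num_heads, d_ff) for _ in range(num_layers)]
        )

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for layer in self.layers:
            x = layer(x)
        return x


model = Encoder(vocab_size=100).eval()  # eval() disables dropout
token_ids = torch.randint(0, 100, (2, 7))  # (batch, seq_len)
with torch.no_grad():
    encoded = model(token_ids)
print(encoded.shape)
```

Every layer maps a (batch, seq_len, d_model) tensor to a tensor of the same shape, which is what makes the layers stackable.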

Getting Started

Prerequisites

Python 3.x
PyTorch
(Optional) A CUDA-capable GPU for faster training
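
A quick way to check the setup is to ask PyTorch whether it can see a GPU and fall back to the CPU otherwise. This is a generic sketch; the variable names are illustrative.

```python
import torch

# Use the GPU when a CUDA build of PyTorch can see one; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors and modules must be created on, or moved to, the chosen device.
x = torch.randn(2, 4, device=device)
print(torch.__version__, device, x.device.type)
```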

Installation

Clone the repository:

git clone https://github.com/yourusername/transformer-implementation.git

Why Transformers?

Transformers use attention to model dependencies across an entire sequence, so each token can weight the parts of the input that are most relevant to it. This has made them the go-to architecture for NLP tasks and the basis for models like BERT and GPT.

Notes

This is a simplified implementation intended for understanding the core ideas. You can experiment with more layers, different hyperparameters, or add masking for causal language modeling.
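
For the masking idea, a minimal sketch of a causal mask applied to single-head self-attention could look like this. The function names are hypothetical, and the learned Q/K/V projections are left out to keep the example short.

```python
import math

import torch


def causal_mask(seq_len, device=None):
    """Boolean lower-triangular mask: position i may attend to positions <= i."""
    return torch.ones(seq_len, seq_len, device=device).tril().bool()


def causal_self_attention(x):
    """Single-head self-attention over x of shape (batch, seq_len, d_model)."""
    scores = x @ x.transpose(-2, -1) / math.sqrt(x.size(-1))
    mask = causal_mask(x.size(1), device=x.device)
    # Future positions get -inf, so softmax assigns them zero weight
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ x, weights


x = torch.randn(1, 4, 8)
out, weights = causal_self_attention(x)
print(weights[0])  # entries above the diagonal are zero
```

Because the diagonal is never masked, every row keeps at least one finite score, so the softmax never produces NaN.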

mittapallynitin/Attention-Paper

Folders and files

Latest commit

History

Repository files navigation

Transformer Model Implementation in PyTorch

Objective

Components

Getting Started

Prerequisites

Installation

Why Transformers?

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages