
This repository provides a full PyTorch implementation of the “Attention Is All You Need” paper, recreating the original transformer architecture from scratch. It includes encoder-decoder modules, multi-head attention, positional encoding, and masking, making it a great reference for understanding the foundations of modern large language models.


mittapallynitin/Attention-Paper


Transformer Model Implementation in PyTorch

In this repository, I dive into the "Attention Is All You Need" paper and work through the Transformer in depth by implementing its main components from scratch in PyTorch.

Objective

The main goal is to break down and implement the core ideas of the Transformer model, including:

  • Self-Attention and Multi-Head Attention
  • Positional Encoding for sequence order information
  • Feed-Forward Layers and Layer Normalization
  • Stacked Encoder Layers as seen in the original architecture
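The first item in the list above comes down to one formula from the paper, softmax(QK^T / sqrt(d_k)) V. Here is a minimal sketch of it. The function name `scaled_dot_product_attention` and the boolean `mask` convention (True means "may attend") are assumptions chosen for illustration, not necessarily what this repository's code uses.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V and return (output, weights)."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep softmax well-behaved.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (..., seq_q, seq_k)
    if mask is not None:
        # Assumed convention: mask is boolean, False positions are hidden.
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v, weights

torch.manual_seed(0)
x = torch.randn(2, 5, 16)  # (batch, seq_len, d_model)
# Self-attention: queries, keys, and values all come from the same sequence.
out, weights = scaled_dot_product_attention(x, x, x)
print(out.shape)      # torch.Size([2, 5, 16])
print(weights.shape)  # torch.Size([2, 5, 5])
```

Multi-head attention applies this same function several times in parallel on learned linear projections of the input and concatenates the results.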

Components

  1. Input Embeddings: Converts token IDs to embeddings.
  2. Positional Encoding: Adds positional context to embeddings.
  3. Self-Attention Mechanism: Computes the relationships between tokens.
  4. Multi-Head Attention: Uses multiple attention heads for richer representations.
  5. Feed-Forward Network: Processes the outputs from the attention layers.
  6. Encoder Layer: Combines attention, feed-forward, and normalization layers.
  7. Encoder: Stacks multiple encoder layers to build the final model.
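To show how the seven components fit together, the following is a compact sketch rather than the repository's actual code. The class names, default sizes, and the use of PyTorch's built-in `nn.MultiheadAttention` as a stand-in for a from-scratch attention module are all assumptions made to keep the example short. Dropout is omitted, and `d_model` is assumed to be even.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (component 2)."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer("pe", pe)  # fixed, not a trainable parameter

    def forward(self, x):
        return x + self.pe[: x.size(1)]  # x: (batch, seq_len, d_model)

class EncoderLayer(nn.Module):
    """Attention + feed-forward, each with a residual connection and LayerNorm (components 3-6)."""
    def __init__(self, d_model, num_heads, d_ff):
        super().__init__()
        # Built-in module used here as a stand-in for a hand-written multi-head attention.
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)  # self-attention: query = key = value
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ffn(x))

class Encoder(nn.Module):
    """Embeddings, positional encoding, and a stack of encoder layers (components 1, 2, 7)."""
    def __init__(self, vocab_size, d_model=64, num_heads=4, d_ff=256, num_layers=2):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = PositionalEncoding(d_model)
        self.layers = nn.ModuleList(
            [EncoderLayer(d_model, num_heads, d_ff) for _ in range(num_layers)]
        )

    def forward(self, token_ids):
        # The paper scales embeddings by sqrt(d_model) before adding positions.
        x = self.pos(self.embed(token_ids) * math.sqrt(self.d_model))
        for layer in self.layers:
            x = layer(x)
        return x

tokens = torch.randint(0, 100, (2, 10))  # (batch, seq_len) of token IDs
encoder = Encoder(vocab_size=100)
output = encoder(tokens)
print(output.shape)  # torch.Size([2, 10, 64])
```

The output keeps the input's batch and sequence dimensions and yields one `d_model`-sized vector per token, which is why identical layers can be stacked.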

Getting Started

Prerequisites

  • Python 3.x
  • PyTorch
  • (Optional) CUDA for faster training with GPUs
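A quick way to confirm the prerequisites are in place is to import PyTorch and check whether a CUDA device is visible. This is a generic check, not a script shipped with the repository.

```python
import torch

# Fall back to the CPU when no CUDA-capable GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("PyTorch version:", torch.__version__)
print("Using device:", device)

# Tensors (and models) are placed on the chosen device explicitly.
x = torch.randn(2, 3, device=device)
```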

Installation

  1. Clone the repository:
    git clone https://github.com/mittapallynitin/Attention-Paper.git

Why Transformers?

Transformers are powerful because attention lets them model dependencies across an entire sequence: each token's output is a weighted sum over every position, so the model can focus on the most relevant parts of the input no matter how far apart they are. This has made them the standard architecture for NLP tasks and the basis for models like BERT and GPT.

Notes

This implementation is a simplified version intended for learning the core ideas. You can experiment with more layers, different hyperparameters, or add masking for causal language modeling.
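For the causal masking idea, one possible sketch is below. The helper name `causal_mask` is hypothetical, and the example applies the mask to random scores rather than to this repository's attention module.

```python
import torch

def causal_mask(seq_len):
    """Boolean (seq_len, seq_len) mask; True marks positions a query may attend to."""
    # Lower triangle: position i can see positions 0..i, never the future.
    return torch.ones(seq_len, seq_len).tril().bool()

mask = causal_mask(4)
scores = torch.randn(4, 4)                         # stand-in for Q K^T / sqrt(d_k)
scores = scores.masked_fill(~mask, float("-inf"))  # hide future positions
weights = torch.softmax(scores, dim=-1)            # masked entries become exactly 0
print(mask)
print(weights)
```

Setting the masked scores to negative infinity before the softmax gives future tokens zero weight while each row still sums to 1.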
