Transformer Model for Language Translation

This repository contains the code for training a transformer model for language translation. The model is based on the paper "Attention Is All You Need" by Vaswani et al. (2017).

Prerequisites

To run this code, you will need the following:

  • Python 3.6 or later
  • PyTorch 1.10 or later
  • Transformers library 4.25 or later
  • Hugging Face Tokenizers library 0.11 or later
  • TensorBoard

Installation

To install the required dependencies, run the following command:

pip install -r requirements.txt
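
A requirements.txt consistent with the prerequisites above might look like this (the version pins mirror the list above; the repository's actual file may differ):

torch>=1.10
transformers>=4.25
tokenizers>=0.11
tensorboard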

Data

The code uses the WMT14 English-German dataset for training. The dataset can be downloaded from the following link:

WMT14 English-German dataset

Once downloaded, extract the dataset to a folder named data.

Training

To train the model, run the following command:

python train.py

The training process will take several hours. The model will be saved to the models folder.

Evaluation

To evaluate the model, run the following command:

python evaluate.py

The evaluation process will take a few minutes. The results will be printed to the console.

Usage

The trained model can be used to translate text from English to German. To do this, run the following command:

python translate.py

The translated text will be printed to the console.

Code Overview

The code is organized into the following modules:

  • model.py: This module contains the definition of the transformer model.
  • dataset.py: This module contains the definition of the dataset class.
  • config.py: This module contains the definition of the configuration class.
  • train.py: This module contains the code for training the model.
  • evaluate.py: This module contains the code for evaluating the model.
  • translate.py: This module contains the code for using the model to translate text.

Model

The transformer model is defined in the model.py module. The model consists of an encoder and a decoder. The encoder converts the input sequence of tokens into a sequence of vectors. The decoder then uses the output of the encoder to generate the target sequence of tokens.

The encoder and decoder are both composed of a stack of identical layers. Each layer consists of a multi-head self-attention mechanism, followed by a position-wise fully connected feed-forward network. The self-attention mechanism allows the model to consider the context of each token in the input sequence when encoding it. The feed-forward network allows the model to learn complex non-linear transformations of the input vectors.
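
To make this concrete, here is a minimal sketch of a single encoder layer in PyTorch. The class name and default dimensions (d_model=512, 8 heads, d_ff=2048, as in the paper) are illustrative assumptions, not necessarily what model.py uses:

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    # One encoder block: multi-head self-attention plus a position-wise
    # feed-forward network, each wrapped in a residual connection and
    # layer normalization.
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        # Self-attention: each position attends over the whole sequence.
        attn_out, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Feed-forward applied independently at every position.
        return self.norm2(x + self.dropout(self.ff(x)))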

The transformer model also includes several other components, such as positional encoding, which is added to the token embeddings to give the model information about the position of each token in the sequence. The model also maintains a source and a target vocabulary; embedding layers map the token indices of the input and output sequences into vectors that the model can process.
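
The sinusoidal positional encoding described in the paper can be sketched as follows (a minimal version; the repository's implementation may differ in detail):

import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    # Adds fixed sin/cos position signals to the token embeddings,
    # as in Vaswani et al. (2017).
    def __init__(self, d_model=512, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for the first seq_len positions.
        return x + self.pe[:, : x.size(1)]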

Dataset

The dataset class, defined in the dataset.py module, is responsible for loading and preprocessing the training and evaluation data. It includes methods for tokenizing the input and output sequences, adding positional encoding, and batching the data for training.

It also includes methods for loading data from a variety of sources, such as text files, CSV files, or SQL databases, so the model can be trained on a wide range of data types and formats.
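
As an illustration of what such a class might look like, here is a minimal sketch for parallel sentence pairs. The class name, the pair format, and the use of Hugging Face tokenizers.Tokenizer objects are assumptions, not the exact contents of dataset.py:

import torch
from torch.utils.data import Dataset

class TranslationDataset(Dataset):
    # Wraps aligned source/target sentences and tokenizes them on access.
    # src_tokenizer and tgt_tokenizer are assumed to be Hugging Face
    # tokenizers.Tokenizer instances.
    def __init__(self, pairs, src_tokenizer, tgt_tokenizer, max_len=128):
        self.pairs = pairs  # list of (source_text, target_text) tuples
        self.src_tok = src_tokenizer
        self.tgt_tok = tgt_tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        src_text, tgt_text = self.pairs[idx]
        src_ids = self.src_tok.encode(src_text).ids[: self.max_len]
        tgt_ids = self.tgt_tok.encode(tgt_text).ids[: self.max_len]
        return {
            "src": torch.tensor(src_ids, dtype=torch.long),
            "tgt": torch.tensor(tgt_ids, dtype=torch.long),
        }

Batching these variable-length examples with a DataLoader would additionally require a collate_fn that pads each batch to a common length.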

Configuration

The configuration class, defined in the config.py module, stores the hyperparameters and other configuration options for the model, such as the number of layers in the encoder and decoder, the size of the hidden state, and the type of attention mechanism to use.

It also includes methods for loading and saving the configuration to a file, so a configuration can easily be shared and reused across different experiments.
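
A minimal configuration object with JSON load/save in the spirit described above might look like this (the field names and defaults are illustrative, not the actual contents of config.py):

import json
from dataclasses import dataclass, asdict

@dataclass
class Config:
    # Illustrative hyperparameters; the real config.py may use other names.
    num_layers: int = 6
    d_model: int = 512
    n_heads: int = 8
    d_ff: int = 2048
    dropout: float = 0.1
    lr: float = 1e-4

    def save(self, path):
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return cls(**json.load(f))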

Training

The training code is defined in the train.py module. It uses the dataset class to load and preprocess the training data, and it implements the training loop: computing the loss and gradients on each batch and updating the model parameters.

It also includes methods for evaluating the model during training and for saving the trained model to a file and loading it back, so the trained model can easily be reused in other applications.
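
In outline, one epoch of the training loop described above amounts to something like the following sketch (the model interface and batch keys are assumptions; the real train.py will differ in detail):

import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

def train_epoch(model, loader, optimizer, pad_id, device, writer, epoch):
    # One pass over the training data: forward pass, cross-entropy loss,
    # backpropagation, and a parameter update per batch.
    model.train()
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    for step, batch in enumerate(loader):
        src = batch["src"].to(device)
        tgt = batch["tgt"].to(device)
        # Teacher forcing: predict token t+1 from target tokens up to t.
        logits = model(src, tgt[:, :-1])
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        writer.add_scalar("train/loss", loss.item(), epoch * len(loader) + step)

Checkpointing then reduces to torch.save(model.state_dict(), "models/checkpoint.pt") and a matching model.load_state_dict(torch.load(...)) call.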

Evaluation

The evaluation code is defined in the evaluate.py module. It includes methods for evaluating the model on a variety of tasks, such as translation, summarization, and question answering.

It uses the dataset class to load and preprocess the evaluation data, and it computes the relevant evaluation metrics, such as the BLEU score for translation or the ROUGE score for summarization.
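
For translation, for example, a corpus-level BLEU score can be computed with the sacrebleu package (sacrebleu is not listed in the prerequisites, so treat this as one possible implementation rather than what evaluate.py actually does):

import sacrebleu

def bleu_score(hypotheses, references):
    # hypotheses: list of translated sentences (str)
    # references: list of reference sentences (str), aligned with hypotheses
    return sacrebleu.corpus_bleu(hypotheses, [references]).score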

Translation

The translation code is defined in the translate.py module. It includes methods for using the trained model to translate text from one language to another.

It uses the dataset class to tokenize the input text and add positional encoding, then feeds the encoded input through the model to generate the target sequence of tokens. It also includes methods for detokenizing the output sequence and postprocessing the translation to improve fluency and accuracy.
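
A minimal greedy decoder illustrating that generation loop (the model(src, tgt) interface and the sos_id/eos_id special tokens are assumptions, not taken from translate.py):

import torch

@torch.no_grad()
def greedy_translate(model, src_ids, sos_id, eos_id, max_len=128):
    # Grow the target one token at a time, always taking the most
    # probable next token, until end-of-sequence or the length limit.
    model.eval()
    tgt = torch.tensor([[sos_id]], dtype=torch.long, device=src_ids.device)
    for _ in range(max_len):
        logits = model(src_ids, tgt)           # (1, tgt_len, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        tgt = torch.cat([tgt, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return tgt.squeeze(0).tolist()             # token ids, ready for detokenization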

Overall, the Hugging Face Transformers library provides a powerful and flexible framework for building and training transformer models. It includes a wide range of components and tools for loading and preprocessing data, training and evaluating models, and applying trained models to a variety of tasks. It also offers a large number of pre-trained models that can be fine-tuned for specific tasks, making it easy to get started with transformer models.
