Reproducing GPT-2 with Optimized Training Strategies

This project reproduces the GPT-2 model and trains it with a set of optimization strategies for efficient, stable training. Each strategy is sketched with a short code example after the list:

  • Careful parameter initialization and hyperparameter choices for stable convergence.
  • Exploiting the GPU's internal matrix-multiplication hardware (tensor cores, which operate on small, e.g. 4x4, matrix tiles) by choosing tensor-core-friendly shapes and precision.
  • Flash Attention for faster and more memory-efficient attention computation.
  • Optimizer configuration with tuned hyperparameters and kernel fusion for improved performance.
  • Casting parameters and activations to lower-precision datatypes for optimal GPU computation.
  • Distributed Data Parallel (DDP) training across multiple GPUs.
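
A minimal sketch of GPT-2-style initialization, assuming the common convention of normal(0, 0.02) weights with residual-branch projections scaled down by 1/sqrt(2·n_layer); the `_is_residual_proj` flag and `N_LAYER` constant are hypothetical names for illustration, not necessarily the repo's:

```python
import torch.nn as nn

N_LAYER = 12  # assumed model depth for the residual-scaling example

def init_weights(module: nn.Module) -> None:
    """GPT-2-style init: normal(0, 0.02) for linear/embedding weights,
    zeros for biases. Output projections on residual branches are scaled
    by 1/sqrt(2 * n_layer) so the residual stream's variance stays
    roughly constant as depth grows."""
    if isinstance(module, nn.Linear):
        std = 0.02
        # Hypothetical flag set on attention/MLP output projections.
        if getattr(module, "_is_residual_proj", False):
            std *= (2 * N_LAYER) ** -0.5
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)

# Usage: model.apply(init_weights)
```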
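
One way to exploit the tensor-core hardware, assuming a PyTorch training loop on an Ampere-or-newer GPU: enable TF32 matmuls and round tensor dimensions up to tensor-core-friendly multiples (padding GPT-2's vocab size of 50257 up to 50304 is a common trick in GPT-2 reproductions; `round_up` is a hypothetical helper):

```python
import torch

# TF32 lets float32 matmuls run on tensor cores (Ampere and newer),
# trading a little mantissa precision for a large throughput gain.
torch.set_float32_matmul_precision("high")

# Tensor cores consume small fixed-size matrix tiles, so matmul
# dimensions that are multiples of a "nice" power of two map onto the
# hardware without padding waste.
def round_up(n: int, multiple: int = 64) -> int:
    return ((n + multiple - 1) // multiple) * multiple

vocab_size = round_up(50257)  # -> 50304, divisible by 64 and 128
```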
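
Flash Attention is exposed in PyTorch through `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a FlashAttention kernel when dtype and hardware allow, avoiding materializing the full (T x T) attention matrix. A minimal sketch, assuming a CUDA GPU and illustrative shapes:

```python
import torch
import torch.nn.functional as F

B, n_head, T, head_dim = 8, 12, 1024, 64  # illustrative sizes
q = torch.randn(B, n_head, T, head_dim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# is_causal=True applies the autoregressive mask inside the kernel.
y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```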
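
A sketch of the optimizer configuration, assuming the weight-decay grouping and betas commonly used in GPT-2/GPT-3-style training; the hyperparameter values here are illustrative, not necessarily the repo's:

```python
import torch

def configure_optimizer(model, lr=6e-4, weight_decay=0.1):
    """Decay only matmul weights (dim >= 2); leave biases and
    layernorm/embedding gains undecayed."""
    params = [p for p in model.parameters() if p.requires_grad]
    decay = [p for p in params if p.dim() >= 2]
    no_decay = [p for p in params if p.dim() < 2]
    groups = [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
    # fused=True runs the parameter update as a single CUDA kernel
    # instead of one small kernel launch per parameter tensor.
    return torch.optim.AdamW(groups, lr=lr, betas=(0.9, 0.95),
                             eps=1e-8, fused=True)
```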
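
A minimal mixed-precision sketch using `torch.autocast` with bfloat16; the stand-in `nn.Linear` represents the model. Autocast runs matmuls in bfloat16 while keeping precision-sensitive ops (softmax, layernorm, loss) in float32, and because bfloat16 keeps float32's exponent range, no gradient scaler is needed:

```python
import torch

model = torch.nn.Linear(768, 768).cuda()  # stand-in for the GPT-2 model
x = torch.randn(8, 768, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)       # forward computed in bfloat16
print(y.dtype)         # torch.bfloat16; parameters stay float32
```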
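
A skeleton of the DDP setup, assuming a `torchrun` launch; the stand-in model and elided training loop are placeholders:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --standalone --nproc_per_node=NUM_GPUS train.py
# torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(768, 768).cuda()  # stand-in for the GPT-2 model
model = DDP(model, device_ids=[local_rank])

# Each rank sees a different shard of the batch; DDP all-reduces
# gradients during backward so all ranks stay in sync.
# ... training loop ...

dist.destroy_process_group()
```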
