Deep Learning from Scratch 🧠

Welcome to Deep Learning from Scratch, a repository where I implement fundamental deep learning architectures from scratch using Python, NumPy, and PyTorch. The goal is a deeper understanding of how neural networks work internally, building everything from low-level primitives rather than leaning on high-level training APIs.

Important

Each topic highlighted in this repository is covered in a folder linked below.

In each folder, you'll find copies of the critical papers on the topic (.pdf files), along with my own breakdown of the intuition and math, plus an implementation where relevant (all in the .ipynb files).

1. Deep Neural Networks

2. Optimization & Regularization

3. Sequence Modeling

4. Transformers

5. Image Generation


Paper Shelf

1. Foundational Deep Neural Networks

Papers

  • DNN (1986): Learning Internal Representations by Error Propagation pdf
  • CNN (1989): Backpropagation Applied to Handwritten Zip Code Recognition pdf
  • LeNet (1998): Gradient-Based Learning Applied to Document Recognition pdf
  • AlexNet (2012): ImageNet Classification with Deep Convolutional Neural Networks pdf
  • U-Net (2015): Convolutional Networks for Biomedical Image Segmentation pdf
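
To give a flavor of the notebooks, here is a minimal NumPy sketch of the error-propagation idea from the Rumelhart et al. chapter. The two-layer network, dimensions, and squared-error loss are illustrative choices, not code taken from this repo:

```python
import numpy as np

# A minimal sketch of error propagation (backprop) for a 2-layer MLP.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # batch of 4 inputs, 3 features
y = rng.normal(size=(4, 2))            # targets
W1 = rng.normal(size=(3, 5)) * 0.1
W2 = rng.normal(size=(5, 2)) * 0.1

# Forward pass
h = np.tanh(x @ W1)                    # hidden activations
y_hat = h @ W2                         # linear output
loss = 0.5 * ((y_hat - y) ** 2).sum() / len(x)

# Backward pass: propagate the error signal layer by layer
d_out = (y_hat - y) / len(x)           # dL/dy_hat
dW2 = h.T @ d_out
d_h = (d_out @ W2.T) * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
dW1 = x.T @ d_h

# One SGD step
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```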

2. Optimization and Regularization Techniques

Papers

  • Weight Decay (1991): A Simple Weight Decay Can Improve Generalization pdf
  • ReLU (2011): Deep Sparse Rectifier Neural Networks pdf
  • Dropout (2014): A Simple Way to Prevent Neural Networks from Overfitting pdf
  • Adam (2014): A Method for Stochastic Optimization pdf
  • Residuals (2015): Deep Residual Learning for Image Recognition pdf
  • BatchNorm (2015): Accelerating Deep Network Training pdf
  • LayerNorm (2016): Layer Normalization pdf
  • GELU (2016): Gaussian Error Linear Units pdf
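
Several of these techniques reduce to a few lines of math. Below is a sketch of a single Adam step with an L2-style weight-decay term folded in; the hyperparameters are common defaults and the gradient is a placeholder, not output from a real model:

```python
import numpy as np

# One Adam update (Kingma & Ba, 2014) plus a weight-decay term.
rng = np.random.default_rng(0)
w = rng.normal(size=(10,))             # parameters (shape is made up)
grad = rng.normal(size=(10,))          # placeholder gradient
m = np.zeros_like(w)                   # first-moment (mean) estimate
v = np.zeros_like(w)                   # second-moment estimate
lr, beta1, beta2, eps, wd = 1e-3, 0.9, 0.999, 1e-8, 1e-2
t = 1                                  # step counter (starts at 1)

m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad ** 2
m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
v_hat = v / (1 - beta2 ** t)
w -= lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
```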

3. Sequence Modeling

Papers

  • RNN (1989): Continually Running Fully Recurrent Neural Networks pdf
  • LSTM (1997): Long Short-Term Memory pdf
  • Learning to Forget (2000): Continual Prediction with LSTM pdf
  • Word2Vec (2013): Word Representations in Vector Space pdf
  • Phrase2Vec (2013): Distributed Representations of Words and Phrases pdf
  • Encoder-Decoder (2014): RNN Encoder-Decoder for Machine Translation pdf
  • Seq2Seq (2014): Sequence to Sequence Learning pdf
  • Attention (2014): Neural Machine Translation by Jointly Learning to Align and Translate pdf
  • Mixture of Experts (2017): Sparsely-Gated Neural Networks pdf
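
As a taste of the sequence-modeling notebooks, here is a vanilla (Elman-style) RNN unrolled over a toy sequence; all sizes are made up for illustration:

```python
import numpy as np

# A vanilla RNN: the same weights are reused at every timestep.
rng = np.random.default_rng(0)
d_in, d_hidden, seq_len = 8, 16, 5
W_xh = rng.normal(size=(d_in, d_hidden)) * 0.1
W_hh = rng.normal(size=(d_hidden, d_hidden)) * 0.1
b_h = np.zeros(d_hidden)

xs = rng.normal(size=(seq_len, d_in))  # a toy input sequence
h = np.zeros(d_hidden)                 # initial hidden state
for x_t in xs:
    # New state mixes the current input with the previous state
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
```

LSTMs keep this recurrence but add gates so the error signal can survive over long spans, which is exactly the problem the 1997 and 2000 papers address.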

4. Language Modeling

Papers

  • Transformer (2017): Attention Is All You Need pdf
  • BERT (2018): Bidirectional Transformers for Language Understanding pdf
  • RoBERTa (2019): Robustly Optimized BERT Pretraining pdf
  • T5 (2019): Unified Text-to-Text Transformer pdf
  • GPT Series:
    • GPT (2018): Generative Pre-Training pdf
    • GPT-2 (2019): Unsupervised Multitask Learners pdf
    • GPT-3 (2020): Few-Shot Learning pdf
    • GPT-4 (2023): Technical Report pdf
  • LoRA (2021): Low-Rank Adaptation of Large Language Models pdf
  • RLHF (2019): Fine-Tuning from Human Preferences pdf
  • InstructGPT (2022): Following Instructions with Human Feedback pdf
  • Vision Transformer (2020): Image Recognition with Transformers pdf
  • ELECTRA (2020): Discriminative Pre-training pdf
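
The heart of the Transformer is small enough to show inline. Here is a sketch of single-head scaled dot-product attention, without masking; the shapes are illustrative:

```python
import numpy as np

# Scaled dot-product attention, the core of "Attention Is All You Need".
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # query-key similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # row-wise softmax
    return weights @ V                          # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)                        # shape (4, 8)
```

Multi-head attention simply runs several copies of this in parallel on learned projections and concatenates the results.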

5. Image Generative Modeling

Papers

  • GAN (2014): Generative Adversarial Networks pdf
  • VAE (2013): Auto-Encoding Variational Bayes pdf
  • VQ VAE (2017): Neural Discrete Representation Learning pdf
  • Diffusion Models:
    • Initial Diffusion (2015): Nonequilibrium Thermodynamics pdf
    • Denoising Diffusion (2020): Probabilistic Models pdf
    • Improved Denoising Diffusion (2021) pdf
  • CLIP (2021): Visual Models from Natural Language Supervision pdf
  • DALL-E (2021-2022): Text-to-Image Generation pdf
  • SimCLR (2020): Contrastive Learning of Visual Representations pdf
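
The DDPM forward (noising) process has a convenient closed form, sketched below; the linear beta schedule and image shape are illustrative choices:

```python
import numpy as np

# DDPM forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.normal(size=(28, 28))         # stand-in for a training image
t = 250                                # an arbitrary timestep
eps = rng.normal(size=x0.shape)
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
# The reverse (denoising) network is trained to predict eps from x_t and t.
```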

6. Deep Reinforcement Learning

Papers

  • Deep Q-Learning (2013): Playing Atari Games pdf
  • AlphaGo (2016): Mastering the Game of Go pdf
  • Deep Reinforcement Learning (2017): Mastering Chess and Shogi pdf
  • AlphaFold (2021): Protein Structure Prediction pdf
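
Deep Q-networks approximate the tabular Q-learning update with a neural network, but the update itself is tiny. A sketch with a toy state/action space and one made-up transition:

```python
import numpy as np

# Tabular Q-learning: the rule that DQN approximates with a network.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99               # learning rate, discount factor

s, a, r, s_next = 0, 1, 1.0, 2         # one made-up transition
td_target = r + gamma * Q[s_next].max()
Q[s, a] += alpha * (td_target - Q[s, a])
# DQN replaces the table with Q(s, a; theta) and stabilizes the max
# with a separate target network, but the target is the same.
```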

7. Additional Influential Papers

  • Deep Learning Survey (2015): By LeCun, Bengio, and Hinton pdf
  • BigGAN (2018): Large Scale GAN Training pdf
  • WaveNet (2016): Generative Model for Raw Audio pdf
  • BERTology (2020): Survey of BERT Use Cases pdf

Scaling and Model Optimization

  • Scaling Laws for Neural Language Models (2020): Predicting Model Performance pdf
  • Chinchilla (2022): Training Compute-Optimal Large Language Models pdf
  • Gopher (2021): Methods, Analysis & Insights from Training Gopher pdf
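
A frequently quoted takeaway from Chinchilla is that parameters and training tokens should scale together. The sketch below is back-of-the-envelope only: the 6·N·D FLOPs approximation and the ~20 tokens-per-parameter ratio are rough rules of thumb drawn from the paper, not exact constants:

```python
# Chinchilla-style compute-optimal sizing (rule-of-thumb sketch).
def compute_optimal(flops_budget, tokens_per_param=20):
    # C ~= 6 * N * D and D ~= 20 * N  =>  N = sqrt(C / 120)
    n_params = (flops_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = compute_optimal(5.88e23)        # roughly Chinchilla's budget
print(f"~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
# -> ~70B params, ~1.4T tokens, matching the paper's headline model
```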

Fine-tuning and Adaptation

  • P-Tuning (2021): Prompt Tuning with Soft Prompts pdf
  • Prefix-Tuning (2021): Optimizing Continuous Prompts pdf
  • AdaLoRA (2023): Adaptive Low-Rank Adaptation pdf
  • QLoRA (2023): Efficient Fine-Tuning of Quantized Models pdf
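
The LoRA idea fits in a few lines: freeze the pretrained weight and learn a low-rank update. A PyTorch sketch with made-up dimensions:

```python
import torch

# LoRA-style linear layer: frozen W plus a trainable low-rank update
# (B @ A), scaled by alpha / r. All dimensions here are illustrative.
d_in, d_out, r, alpha = 64, 64, 8, 16
W = torch.randn(d_out, d_in)           # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01        # trainable down-projection
B = torch.zeros(d_out, r)              # trainable up-projection

x = torch.randn(4, d_in)
y = x @ W.T + (x @ A.T @ B.T) * (alpha / r)
```

Initializing B to zero makes the low-rank update a no-op at the start of fine-tuning, which is the standard LoRA initialization.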

Inference and Optimization Techniques

  • FlashAttention (2022): Fast and Memory-Efficient Attention pdf
  • FlashAttention-2 (2023): Faster Attention Mechanism pdf
  • Direct Preference Optimization (DPO) (2023): Aligning Language Models with Human Preferences pdf
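
The DPO objective on a single preference pair, sketched in PyTorch; the four log-probabilities below are placeholder scalars standing in for the chosen/rejected responses scored under the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

# Direct Preference Optimization loss for one preference pair.
beta = 0.1
logp_chosen, logp_rejected = torch.tensor(-12.0), torch.tensor(-15.0)
ref_chosen, ref_rejected = torch.tensor(-13.0), torch.tensor(-14.0)

# Implicit reward margin: how much more the policy prefers the chosen
# response than the reference model does, versus the rejected one.
margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
loss = -F.logsigmoid(beta * margin)    # minimized when chosen wins
```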

Pre-training and Model Architecture

  • Mixture of Experts (MoE) (2022): Scaling Language Models with Sparse Experts pdf
  • GLaM (2021): Efficient Scaling with Mixture of Experts pdf
  • Switch Transformers (2022): Scaling to Trillion Parameter Models pdf
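
The routing step shared by these sparse-expert architectures is top-k gating; a NumPy sketch with illustrative expert counts and router weights:

```python
import numpy as np

# Top-k gating: route each token to only k of n_experts experts.
rng = np.random.default_rng(0)
n_experts, k, d = 8, 2, 16
router = rng.normal(size=(d, n_experts)) * 0.1

x = rng.normal(size=(d,))              # one token's hidden state
logits = x @ router
top = np.argsort(logits)[-k:]          # indices of the k best experts
gates = np.exp(logits[top] - logits[top].max())
gates /= gates.sum()                   # softmax over selected experts
# The token is processed only by experts in `top`, weighted by `gates`,
# so compute grows with k rather than with n_experts.
```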

Reasoning and Capabilities

  • Chain of Thought Prompting (2022): Reasoning with Language Models pdf
  • Self-Consistency (2022): Improving Language Model Reasoning pdf
  • Tree of Thoughts (2023): Deliberate Problem Solving pdf
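
Self-consistency boils down to sampling several reasoning paths and majority-voting the final answers; a toy sketch with placeholder model outputs:

```python
from collections import Counter

# Majority vote over final answers from independently sampled
# chains of thought (the answers below are placeholders).
sampled_answers = ["42", "42", "41", "42", "40"]
answer, votes = Counter(sampled_answers).most_common(1)[0]
print(answer, votes)                   # -> 42 3
```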

Efficiency and Compression

  • DistilBERT (2019): Distilled Version of BERT pdf
  • Knowledge Distillation (2022): Comprehensive Survey pdf
  • Pruning and Quantization Techniques (2022): Model Compression Survey pdf
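
The classic distillation loss compares temperature-softened teacher and student distributions; a PyTorch sketch with random logits and an illustrative temperature:

```python
import torch
import torch.nn.functional as F

# Knowledge distillation: KL divergence between softened distributions.
T = 2.0                                # softening temperature
teacher_logits = torch.randn(4, 10)    # placeholders for real outputs
student_logits = torch.randn(4, 10, requires_grad=True)

loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T                              # T^2 restores gradient magnitude
loss.backward()
```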

πŸ› οΈ How to Use

  1. Clone the repository:
    git clone https://github.com/Ishaan-Ansari/Deep-Learning-from-scratch.git
  2. Navigate to a topic folder:
    cd Deep-Learning-from-scratch/[Folder_Name]
  3. Run Jupyter notebooks:
    jupyter notebook
  4. Follow the instructions within each notebook.

πŸ“Œ Contributions & Feedback

This project is a work in progress! If you have suggestions, feel free to fork the repo, submit issues, or create pull requests.

⭐ If you find this helpful, star this repository and stay tuned for more updates!

