Deep Learning from Scratch 🧠

Welcome to Deep Learning from Scratch, a repository where I implement fundamental deep learning architectures from scratch using Python, NumPy, and PyTorch. The goal is a deeper understanding of how neural networks work internally, building everything from low-level primitives rather than leaning on high-level training APIs.

Important

Each topic highlighted in this repository is covered in a folder linked below.

In each folder, you'll find copies of the critical papers on the topic (.pdf files), along with my own breakdown of the intuition and math, plus an implementation where relevant (all in the .ipynb files).

1. Deep Neural Networks

2. Optimization & Regularization

3. Sequence Modeling

4. Transformers

5. Image Generation


Paper Shelf

1. Foundational Deep Neural Networks

Papers

  • DNN (1986): Learning Internal Representations by Error Propagation pdf
  • CNN (1989): Backpropagation Applied to Handwritten Zip Code Recognition pdf
  • LeNet (1998): Gradient-Based Learning Applied to Document Recognition pdf
  • AlexNet (2012): ImageNet Classification with Deep Convolutional Neural Networks pdf
  • U-Net (2015): Convolutional Networks for Biomedical Image Segmentation pdf
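
To give a flavor of the notebooks, here is a minimal NumPy sketch of the error-propagation idea from the Rumelhart et al. chapter. The two-layer network, dimensions, and squared-error loss are illustrative choices, not code taken from this repo:

```python
import numpy as np

# A minimal sketch of error propagation (backprop) for a 2-layer MLP.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # batch of 4 inputs, 3 features
y = rng.normal(size=(4, 2))            # targets
W1 = rng.normal(size=(3, 5)) * 0.1
W2 = rng.normal(size=(5, 2)) * 0.1

# Forward pass
h = np.tanh(x @ W1)                    # hidden activations
y_hat = h @ W2                         # linear output
loss = 0.5 * ((y_hat - y) ** 2).sum() / len(x)

# Backward pass: propagate the error signal layer by layer
d_out = (y_hat - y) / len(x)           # dL/dy_hat
dW2 = h.T @ d_out
d_h = (d_out @ W2.T) * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
dW1 = x.T @ d_h

# One SGD step
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```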

2. Optimization and Regularization Techniques

Papers

  • Weight Decay (1991): A Simple Weight Decay Can Improve Generalization pdf
  • ReLU (2011): Deep Sparse Rectifier Neural Networks pdf
  • Dropout (2014): A Simple Way to Prevent Neural Networks from Overfitting pdf
  • Adam (2014): A Method for Stochastic Optimization pdf
  • Residuals (2015): Deep Residual Learning for Image Recognition pdf
  • BatchNorm (2015): Accelerating Deep Network Training pdf
  • LayerNorm (2016): Layer Normalization pdf
  • GELU (2016): Gaussian Error Linear Units pdf
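
Several of these techniques reduce to a few lines of math. Below is a sketch of a single Adam step with an L2-style weight-decay term folded in; the hyperparameters are common defaults and the gradient is a placeholder, not output from a real model:

```python
import numpy as np

# One Adam update (Kingma & Ba, 2014) plus a weight-decay term.
rng = np.random.default_rng(0)
w = rng.normal(size=(10,))             # parameters (shape is made up)
grad = rng.normal(size=(10,))          # placeholder gradient
m = np.zeros_like(w)                   # first-moment (mean) estimate
v = np.zeros_like(w)                   # second-moment estimate
lr, beta1, beta2, eps, wd = 1e-3, 0.9, 0.999, 1e-8, 1e-2
t = 1                                  # step counter (starts at 1)

m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad ** 2
m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
v_hat = v / (1 - beta2 ** t)
w -= lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
```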

3. Sequence Modeling

Papers

  • RNN (1989): Continually Running Fully Recurrent Neural Networks pdf
  • LSTM (1997): Long Short-Term Memory pdf
  • Learning to Forget (2000): Continual Prediction with LSTM pdf
  • Word2Vec (2013): Word Representations in Vector Space pdf
  • Phrase2Vec (2013): Distributed Representations of Words and Phrases pdf
  • Encoder-Decoder (2014): RNN Encoder-Decoder for Machine Translation pdf
  • Seq2Seq (2014): Sequence to Sequence Learning pdf
  • Attention (2014): Neural Machine Translation by Jointly Learning to Align and Translate pdf
  • Mixture of Experts (2017): Sparsely-Gated Neural Networks pdf
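
As a taste of the sequence-modeling notebooks, here is a vanilla (Elman-style) RNN unrolled over a toy sequence; all sizes are made up for illustration:

```python
import numpy as np

# A vanilla RNN: the same weights are reused at every timestep.
rng = np.random.default_rng(0)
d_in, d_hidden, seq_len = 8, 16, 5
W_xh = rng.normal(size=(d_in, d_hidden)) * 0.1
W_hh = rng.normal(size=(d_hidden, d_hidden)) * 0.1
b_h = np.zeros(d_hidden)

xs = rng.normal(size=(seq_len, d_in))  # a toy input sequence
h = np.zeros(d_hidden)                 # initial hidden state
for x_t in xs:
    # New state mixes the current input with the previous state
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
```

LSTMs keep this recurrence but add gates so the error signal can survive over long spans, which is exactly the problem the 1997 and 2000 papers address.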

4. Language Modeling

Papers

  • Transformer (2017): Attention Is All You Need pdf
  • BERT (2018): Bidirectional Transformers for Language Understanding pdf
  • RoBERTa (2019): Robustly Optimized BERT Pretraining pdf
  • T5 (2019): Unified Text-to-Text Transformer pdf
  • GPT Series:
    • GPT (2018): Generative Pre-Training pdf
    • GPT-2 (2019): Unsupervised Multitask Learners pdf
    • GPT-3 (2020): Few-Shot Learning pdf
    • GPT-4 (2023): Technical Report pdf
  • LoRA (2021): Low-Rank Adaptation of Large Language Models pdf
  • RLHF (2019): Fine-Tuning from Human Preferences pdf
  • InstructGPT (2022): Following Instructions with Human Feedback pdf
  • Vision Transformer (2020): Image Recognition with Transformers pdf
  • ELECTRA (2020): Discriminative Pre-training pdf
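
The heart of the Transformer is small enough to show inline. Here is a sketch of single-head scaled dot-product attention, without masking; the shapes are illustrative:

```python
import numpy as np

# Scaled dot-product attention, the core of "Attention Is All You Need".
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # query-key similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # row-wise softmax
    return weights @ V                          # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)                        # shape (4, 8)
```

Multi-head attention simply runs several copies of this in parallel on learned projections and concatenates the results.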

5. Image Generative Modeling

Papers

  • GAN (2014): Generative Adversarial Networks pdf
  • VAE (2013): Auto-Encoding Variational Bayes pdf
  • VQ VAE (2017): Neural Discrete Representation Learning pdf
  • Diffusion Models:
    • Initial Diffusion (2015): Nonequilibrium Thermodynamics pdf
    • Denoising Diffusion (2020): Probabilistic Models pdf
    • Improved Denoising Diffusion (2021) pdf
  • CLIP (2021): Visual Models from Natural Language Supervision pdf
  • DALL-E (2021-2022): Text-to-Image Generation pdf
  • SimCLR (2020): Contrastive Learning of Visual Representations pdf
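
The DDPM forward (noising) process has a convenient closed form, sketched below; the linear beta schedule and image shape are illustrative choices:

```python
import numpy as np

# DDPM forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.normal(size=(28, 28))         # stand-in for a training image
t = 250                                # an arbitrary timestep
eps = rng.normal(size=x0.shape)
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
# The reverse (denoising) network is trained to predict eps from x_t and t.
```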

6. Deep Reinforcement Learning

Papers

  • Deep Q-Learning (2013): Playing Atari Games pdf
  • AlphaGo (2016): Mastering the Game of Go pdf
  • Deep Reinforcement Learning (2017): Mastering Chess and Shogi pdf
  • AlphaFold (2021): Protein Structure Prediction pdf
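
Deep Q-networks approximate the tabular Q-learning update with a neural network, but the update itself is tiny. A sketch with a toy state/action space and one made-up transition:

```python
import numpy as np

# Tabular Q-learning: the rule that DQN approximates with a network.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99               # learning rate, discount factor

s, a, r, s_next = 0, 1, 1.0, 2         # one made-up transition
td_target = r + gamma * Q[s_next].max()
Q[s, a] += alpha * (td_target - Q[s, a])
# DQN replaces the table with Q(s, a; theta) and stabilizes the max
# with a separate target network, but the target is the same.
```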

7. Additional Influential Papers

  • Deep Learning Survey (2015): By LeCun, Bengio, and Hinton pdf
  • BigGAN (2018): Large Scale GAN Training pdf
  • WaveNet (2016): Generative Model for Raw Audio pdf
  • BERTology (2020): Survey of BERT Use Cases pdf

Scaling and Model Optimization

  • Scaling Laws for Neural Language Models (2020): Predicting Model Performance pdf
  • Chinchilla (2022): Training Compute-Optimal Large Language Models pdf
  • Gopher (2021): Methods, Analysis & Insights from Training Gopher pdf
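
A frequently quoted takeaway from Chinchilla is that parameters and training tokens should scale together. The sketch below is back-of-the-envelope only: the 6·N·D FLOPs approximation and the ~20 tokens-per-parameter ratio are rough rules of thumb drawn from the paper, not exact constants:

```python
# Chinchilla-style compute-optimal sizing (rule-of-thumb sketch).
def compute_optimal(flops_budget, tokens_per_param=20):
    # C ~= 6 * N * D and D ~= 20 * N  =>  N = sqrt(C / 120)
    n_params = (flops_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = compute_optimal(5.88e23)        # roughly Chinchilla's budget
print(f"~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
# -> ~70B params, ~1.4T tokens, matching the paper's headline model
```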

Fine-tuning and Adaptation

  • P-Tuning (2021): Prompt Tuning with Soft Prompts pdf
  • Prefix-Tuning (2021): Optimizing Continuous Prompts pdf
  • AdaLoRA (2023): Adaptive Low-Rank Adaptation pdf
  • QLoRA (2023): Efficient Fine-Tuning of Quantized Models pdf
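
The LoRA idea fits in a few lines: freeze the pretrained weight and learn a low-rank update. A PyTorch sketch with made-up dimensions:

```python
import torch

# LoRA-style linear layer: frozen W plus a trainable low-rank update
# (B @ A), scaled by alpha / r. All dimensions here are illustrative.
d_in, d_out, r, alpha = 64, 64, 8, 16
W = torch.randn(d_out, d_in)           # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01        # trainable down-projection
B = torch.zeros(d_out, r)              # trainable up-projection

x = torch.randn(4, d_in)
y = x @ W.T + (x @ A.T @ B.T) * (alpha / r)
```

Initializing B to zero makes the low-rank update a no-op at the start of fine-tuning, which is the standard LoRA initialization.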

Inference and Optimization Techniques

  • FlashAttention (2022): Fast and Memory-Efficient Attention pdf
  • FlashAttention-2 (2023): Faster Attention Mechanism pdf
  • Direct Preference Optimization (DPO) (2023): Aligning Language Models with Human Preferences pdf
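
The DPO objective on a single preference pair, sketched in PyTorch; the four log-probabilities below are placeholder scalars standing in for the chosen/rejected responses scored under the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

# Direct Preference Optimization loss for one preference pair.
beta = 0.1
logp_chosen, logp_rejected = torch.tensor(-12.0), torch.tensor(-15.0)
ref_chosen, ref_rejected = torch.tensor(-13.0), torch.tensor(-14.0)

# Implicit reward margin: how much more the policy prefers the chosen
# response than the reference model does, versus the rejected one.
margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
loss = -F.logsigmoid(beta * margin)    # minimized when chosen wins
```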

Pre-training and Model Architecture

  • Mixture of Experts (MoE) (2022): Scaling Language Models with Sparse Experts pdf
  • GLaM (2021): Efficient Scaling with Mixture of Experts pdf
  • Switch Transformers (2022): Scaling to Trillion Parameter Models pdf
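
The routing step shared by these sparse-expert architectures is top-k gating; a NumPy sketch with illustrative expert counts and router weights:

```python
import numpy as np

# Top-k gating: route each token to only k of n_experts experts.
rng = np.random.default_rng(0)
n_experts, k, d = 8, 2, 16
router = rng.normal(size=(d, n_experts)) * 0.1

x = rng.normal(size=(d,))              # one token's hidden state
logits = x @ router
top = np.argsort(logits)[-k:]          # indices of the k best experts
gates = np.exp(logits[top] - logits[top].max())
gates /= gates.sum()                   # softmax over selected experts
# The token is processed only by experts in `top`, weighted by `gates`,
# so compute grows with k rather than with n_experts.
```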

Reasoning and Capabilities

  • Chain of Thought Prompting (2022): Reasoning with Language Models pdf
  • Self-Consistency (2022): Improving Language Model Reasoning pdf
  • Tree of Thoughts (2023): Deliberate Problem Solving pdf
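
Self-consistency boils down to sampling several reasoning paths and majority-voting the final answers; a toy sketch with placeholder model outputs:

```python
from collections import Counter

# Majority vote over final answers from independently sampled
# chains of thought (the answers below are placeholders).
sampled_answers = ["42", "42", "41", "42", "40"]
answer, votes = Counter(sampled_answers).most_common(1)[0]
print(answer, votes)                   # -> 42 3
```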

Efficiency and Compression

  • DistilBERT (2019): Distilled Version of BERT pdf
  • Knowledge Distillation (2022): Comprehensive Survey pdf
  • Pruning and Quantization Techniques (2022): Model Compression Survey pdf
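
The classic distillation loss compares temperature-softened teacher and student distributions; a PyTorch sketch with random logits and an illustrative temperature:

```python
import torch
import torch.nn.functional as F

# Knowledge distillation: KL divergence between softened distributions.
T = 2.0                                # softening temperature
teacher_logits = torch.randn(4, 10)    # placeholders for real outputs
student_logits = torch.randn(4, 10, requires_grad=True)

loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T                              # T^2 restores gradient magnitude
loss.backward()
```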

πŸ› οΈ How to Use

  1. Clone the repository:
    git clone https://github.com/Ishaan-Ansari/Deep-Learning-from-scratch.git
  2. Navigate to a topic folder:
    cd Deep-Learning-from-scratch/[Folder_Name]
  3. Run Jupyter notebooks:
    jupyter notebook
  4. Follow the instructions within each notebook.

πŸ“Œ Contributions & Feedback

This project is a work in progress! If you have suggestions, feel free to fork the repo, submit issues, or create pull requests.

⭐ If you find this helpful, star this repository and stay tuned for more updates!

