
Deep Learning from Scratch


An implementation of automatic differentiation and neural network architectures built from scratch, demonstrating how the core machinery of modern deep learning works under the hood.

Project Overview

This project implements two core components of modern deep learning:

1. Automatic Differentiation Engine (autograd/)

A complete reverse-mode autodiff system supporting:

  • Computation graph construction with lazy evaluation
  • Backpropagation through arbitrary graphs
  • Matrix operations: matmul, solve, logdet
  • Numerically stable functions: logsumexp (for softmax)
```python
from autograd import Var, grad, matmul, inner, solve

# Define computation
def f(A, x, y):
    return inner(solve(A, x), matmul(A, y))

# Get gradient function automatically
grad_f = grad(f)
grads = grad_f(A, x, y)  # Returns [∂f/∂A, ∂f/∂x, ∂f/∂y]
```
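Gradients like these can be verified against finite differences. As a sketch (in plain NumPy, independent of the repo's API), the x-gradient of the example function has the closed form A⁻ᵀ(Ay), which a central-difference check confirms:

```python
import numpy as np

# f(A, x, y) = inner(solve(A, x), matmul(A, y)) rewritten in NumPy
def f(A, x, y):
    return np.linalg.solve(A, x) @ (A @ y)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)  # keep A well-conditioned
x, y = rng.normal(size=4), rng.normal(size=4)

# Closed-form gradient: since f = x^T A^{-T} (A y), we have df/dx = A^{-T}(A y)
grad_x = np.linalg.solve(A.T, A @ y)

# Central finite differences as a sanity check
eps = 1e-6
fd = np.array([
    (f(A, x + eps * e, y) - f(A, x - eps * e, y)) / (2 * eps)
    for e in np.eye(4)
])
assert np.allclose(grad_x, fd, atol=1e-5)
```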

2. Neural Network Architectures (neural_networks/)

Eight architectures from simple to complex:

| Architecture | Type | Key Feature |
|---|---|---|
| Perceptron | Linear | Baseline classifier |
| Shallow MLP | MLP | Single hidden layer |
| Deep MLP | MLP | Vanishing-gradient demo |
| Deep MLP + ReLU | MLP | ReLU vs. tanh comparison |
| CNN | ConvNet | Basic convolutions |
| CNN + Dropout | ConvNet | Regularization |
| VGG-style | Deep CNN | Stacked 3×3 convs |
| ResNet | Residual | Skip connections |
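The residual idea in the last row is small enough to sketch in a few lines: a block computes y = f(x) + x, so with zero-initialized weights it starts as the identity map, which keeps gradients flowing through deep stacks. (The function names below are illustrative, not those in architectures.py.)

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    # Two-layer transformation plus a skip connection back to the input
    return relu(x @ W1) @ W2 + x

x = np.ones((2, 4))
W1 = np.zeros((4, 4))
W2 = np.zeros((4, 4))
# With zero weights the block is exactly the identity map
assert np.array_equal(residual_block(x, W1, W2), x)
```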

Custom optimizer implementations:

  • SGD: Vanilla stochastic gradient descent
  • SGD + Momentum: Accelerated convergence
  • Adam: Adaptive learning rates
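As a minimal sketch, the three update rules look like this (hyperparameter defaults and function names here are assumptions, not necessarily those in optimizers.py):

```python
import numpy as np

def sgd(w, g, lr=0.1):
    return w - lr * g

def sgd_momentum(w, g, v, lr=0.1, beta=0.9):
    v = beta * v + g                 # accumulate a velocity term
    return w - lr * v, v

def adam(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g        # first-moment estimate
    v = b2 * v + (1 - b2) * g ** 2   # second-moment estimate
    m_hat = m / (1 - b1 ** t)        # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(w) = w^2 (gradient 2w) with plain SGD
w = 1.0
for _ in range(30):
    w = sgd(w, 2 * w)
assert abs(w) < 1e-2
```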

Results

Training 8 architectures × 3 optimizers on CIFAR-10 (grayscale):

(Figure: test errors for all architecture/optimizer combinations, generated in results/figures/.)

Key Findings

  1. Depth matters, but activation matters more: Deep MLP with tanh suffers from vanishing gradients; ReLU enables deeper networks
  2. Residual connections help: ResNet trains more stably than VGG despite similar depth
  3. Adam is robust: Works well across architectures with minimal tuning
  4. Dropout effect varies: Helps most when model is prone to overfitting

Quick Start

Installation

```shell
git clone https://github.com/yourusername/deep-learning-from-scratch.git
cd deep-learning-from-scratch
pip install -r requirements.txt
```

Run Autograd Demo

```shell
cd experiments
python autograd_demo.py
```

Train Neural Networks

```shell
cd experiments
python train_networks.py
```

This will:

  1. Download CIFAR-10 automatically
  2. Train all architectures with all optimizers
  3. Generate comparison plots in results/figures/

Project Structure

```
deep-learning-from-scratch/
├── autograd/
│   ├── __init__.py
│   └── engine.py          # Autodiff implementation
├── neural_networks/
│   ├── __init__.py
│   ├── architectures.py   # 8 network architectures
│   ├── optimizers.py      # SGD, Momentum, Adam
│   ├── data.py            # CIFAR-10 loading
│   └── training.py        # Training utilities
├── experiments/
│   ├── autograd_demo.py   # Autodiff demonstrations
│   └── train_networks.py  # Full training experiments
├── results/
│   └── figures/           # Generated plots
├── requirements.txt
└── README.md
```

Technical Details

Autodiff Engine

The autodiff engine implements reverse-mode automatic differentiation:

  1. Forward pass: Build computation graph, compute values
  2. Backward pass: Traverse in reverse topological order, accumulate gradients
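The two-pass scheme can be sketched in a few dozen lines of scalar Python (a toy illustration, not the repo's engine.py): each Var records its parents and a vector-Jacobian rule during the forward pass, and backward() walks the graph in reverse topological order, accumulating gradients.

```python
class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        return Var(self.value + other.value,
                   parents=[(self, lambda u: u), (other, lambda u: u)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   parents=[(self, lambda u: u * other.value),
                            (other, lambda u: u * self.value)])

    def backward(self):
        # Reverse topological order via depth-first search
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    visit(parent)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, vjp in v.parents:
                parent.grad += vjp(v.grad)   # accumulate gradients

x, y = Var(3.0), Var(4.0)
z = x * y + x          # z = xy + x
z.backward()
assert x.grad == 5.0   # dz/dx = y + 1
assert y.grad == 3.0   # dz/dy = x
```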

Key operations and their gradients:

| Operation | Forward | Backward (VJP) |
|---|---|---|
| add(x, y) | x + y | (u, u) |
| mul(x, y) | x · y | (u·y, u·x) |
| matmul(X, Y) | XY | (u·Yᵀ, Xᵀ·u) |
| solve(A, b) | A⁻¹b | (−A⁻ᵀu·xᵀ, A⁻ᵀu) |
| logdet(A) | log\|A\| | u·A⁻ᵀ |
| logsumexp(x) | log Σᵢ exp(xᵢ) | u·softmax(x) |
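The logsumexp row is where numerical stability matters: subtracting the maximum before exponentiating prevents overflow without changing the result. A minimal sketch of the forward and its VJP (illustrative, not the repo's exact code):

```python
import numpy as np

def logsumexp(x):
    m = np.max(x)
    # log sum exp(x_i) = m + log sum exp(x_i - m); the shifted exponents
    # are all <= 0, so exp never overflows
    return m + np.log(np.sum(np.exp(x - m)))

def logsumexp_vjp(u, x):
    # d logsumexp / dx = softmax(x), scaled by the incoming gradient u
    e = np.exp(x - np.max(x))
    return u * e / e.sum()

x = np.array([1000.0, 1000.0])          # naive exp(1000) would overflow
assert np.isclose(logsumexp(x), 1000.0 + np.log(2.0))
assert np.allclose(logsumexp_vjp(1.0, x), [0.5, 0.5])
```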

Neural Network Training

All networks trained with:

  • Loss: Cross-entropy
  • Data: CIFAR-10 (grayscale, 32×32)
  • Epochs: 20
  • Batch size: 256
  • Initialization: Xavier/Glorot for CNNs
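Xavier/Glorot initialization draws weights with variance 2 / (fan_in + fan_out), chosen so activations and gradients keep roughly unit scale through the stack (Glorot & Bengio, 2010). A sketch of the normal variant (the function name is an assumption, not the repo's):

```python
import numpy as np

def xavier_normal(fan_in, fan_out, rng):
    # Standard deviation sqrt(2 / (fan_in + fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_normal(256, 128, rng)
assert W.shape == (256, 128)
# Empirical std should match the target closely for this many samples
assert abs(W.std() - np.sqrt(2.0 / 384)) < 5e-3
```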

References

Automatic Differentiation:

  • Baydin et al., "Automatic Differentiation in Machine Learning: a Survey" (2018)
  • Griewank & Walther, "Evaluating Derivatives" (2008)

Neural Network Architectures:

  • LeCun et al., "Gradient-based learning applied to document recognition" (1998)
  • Simonyan & Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition" (VGG, 2014)
  • He et al., "Deep Residual Learning for Image Recognition" (ResNet, 2015)

Optimization:

  • Robbins & Monro, "A Stochastic Approximation Method" (1951) - SGD
  • Polyak, "Some methods of speeding up the convergence of iteration methods" (1964) - Momentum
  • Kingma & Ba, "Adam: A Method for Stochastic Optimization" (2014)

Weight Initialization:

  • Glorot & Bengio, "Understanding the difficulty of training deep feedforward neural networks" (2010)

Outcomes

Building this project demonstrates understanding of:

  1. Calculus & Linear Algebra: Deriving gradients for matrix operations
  2. Graph Algorithms: Topological sorting for backpropagation
  3. Numerical Computing: Stable implementations (logsumexp)
  4. Deep Learning Foundations: How frameworks like PyTorch work internally
  5. Optimization Theory: Momentum, adaptive learning rates
  6. CNN Architectures: Convolutions, pooling, residual connections

License

MIT License - feel free to use for learning and projects!