An implementation of automatic differentiation and neural network architectures from scratch, demonstrating how these fundamental deep learning components work under the hood.
This project implements two core components of modern deep learning:
A complete reverse-mode autodiff system supporting:
- Computation graph construction with lazy evaluation
- Backpropagation through arbitrary graphs
- Matrix operations: matmul, solve, logdet
- Numerically stable functions: logsumexp (for softmax)
```python
from autograd import Var, grad, matmul, inner, solve

# Define a computation
def f(A, x, y):
    return inner(solve(A, x), matmul(A, y))

# Get the gradient function automatically
grad_f = grad(f)
grads = grad_f(A, x, y)  # Returns [∂f/∂A, ∂f/∂x, ∂f/∂y]
```

Eight architectures, from simple to complex:
| Architecture | Type | Key Feature |
|---|---|---|
| Perceptron | Linear | Baseline classifier |
| Shallow MLP | MLP | Single hidden layer |
| Deep MLP | MLP | Vanishing gradient demo |
| Deep MLP + ReLU | MLP | ReLU vs tanh comparison |
| CNN | ConvNet | Basic convolutions |
| CNN + Dropout | ConvNet | Regularization |
| VGG-style | Deep CNN | Stacked 3x3 convs |
| ResNet | Residual | Skip connections |
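The key idea behind the ResNet row is the skip connection: the block computes a residual on top of an identity path, so gradients can flow straight through. A minimal NumPy sketch (two illustrative weight matrices standing in for the project's conv layers):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = x + W2·relu(W1·x): the identity path carries the signal
    (and the gradient) even when the residual branch is near zero."""
    return x + W2 @ relu(W1 @ x)

# With small weights the block is close to the identity function,
# which is why very deep residual stacks remain trainable.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1 = 0.01 * rng.standard_normal((8, 8))
W2 = 0.01 * rng.standard_normal((8, 8))
y = residual_block(x, W1, W2)
```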
Custom optimizer implementations:
- SGD: Vanilla stochastic gradient descent
- SGD + Momentum: Accelerated convergence
- Adam: Adaptive learning rates
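The three update rules can be sketched in a few lines (hyperparameter names like `lr`, `beta` are illustrative, not necessarily the project's API):

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    return w - lr * g

def momentum_step(w, g, v, lr=0.01, beta=0.9):
    v = beta * v + g              # velocity accumulates past gradients
    return w - lr * v, v

def adam_step(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g     # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * g * g # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)     # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

For example, minimizing f(w) = w² (gradient 2w) with any of the three drives w toward 0; Adam's per-coordinate normalization is what makes it insensitive to gradient scale.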
Training all 8 architectures with all 3 optimizers on CIFAR-10 (grayscale) yields the following observations:
- Depth matters, but activation matters more: Deep MLP with tanh suffers from vanishing gradients; ReLU enables deeper networks
- Residual connections help: ResNet trains more stably than VGG despite similar depth
- Adam is robust: Works well across architectures with minimal tuning
- Dropout effect varies: Helps most when model is prone to overfitting
```bash
git clone https://github.com/yourusername/deep-learning-from-scratch.git
cd deep-learning-from-scratch
pip install -r requirements.txt
```

Run the autodiff demo:

```bash
cd experiments
python autograd_demo.py
```

Run the full training experiments:

```bash
cd experiments
python train_networks.py
```

This will:
- Download CIFAR-10 automatically
- Train all architectures with all optimizers
- Generate comparison plots in `results/figures/`
```
deep-learning-from-scratch/
├── autograd/
│   ├── __init__.py
│   └── engine.py            # Autodiff implementation
├── neural_networks/
│   ├── __init__.py
│   ├── architectures.py     # 8 network architectures
│   ├── optimizers.py        # SGD, Momentum, Adam
│   ├── data.py              # CIFAR-10 loading
│   └── training.py          # Training utilities
├── experiments/
│   ├── autograd_demo.py     # Autodiff demonstrations
│   └── train_networks.py    # Full training experiments
├── results/
│   └── figures/             # Generated plots
├── requirements.txt
└── README.md
```
The autodiff engine implements reverse-mode automatic differentiation:
- Forward pass: Build computation graph, compute values
- Backward pass: Traverse in reverse topological order, accumulate gradients
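These two passes can be sketched with a minimal scalar engine (an illustration of the idea, not the project's actual `engine.py`):

```python
class Var:
    """Scalar node in a computation graph (illustrative sketch)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self):
        # Build a topological order, then sweep it in reverse,
        # accumulating gradients via the chain rule.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in v.parents:
                p.grad += v.grad * local

x, y = Var(2.0), Var(3.0)
z = x * y + x        # z = xy + x, so dz/dx = y + 1, dz/dy = x
z.backward()
```

After `backward()`, `x.grad == 4.0` and `y.grad == 2.0`, matching the hand-derived partials.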
Key operations and their gradients:
| Operation | Forward | Backward (VJP) |
|---|---|---|
| `add(x, y)` | x + y | (u, u) |
| `mul(x, y)` | x · y | (u·y, u·x) |
| `matmul(X, Y)` | X·Y | (u·Yᵀ, Xᵀ·u) |
| `solve(A, b)` | x = A⁻¹b | (−A⁻ᵀu·xᵀ, A⁻ᵀu) |
| `logdet(A)` | log\|A\| | u·A⁻ᵀ |
| `logsumexp(x)` | log Σᵢ eˣᵢ | u·softmax(x) |
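The logsumexp row is easy to verify numerically: a stable implementation shifts by the max before exponentiating, and a finite-difference check confirms that its gradient is exactly softmax. A sketch (standalone NumPy, not the project's autograd code):

```python
import numpy as np

def logsumexp(x):
    # Subtracting the max avoids overflow in exp for large inputs.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Finite-difference check: ∂logsumexp/∂x_i ≈ softmax(x)_i
x = np.array([1.0, 2.0, 3.0])
eps = 1e-6
fd = np.array([(logsumexp(x + eps * np.eye(3)[i]) - logsumexp(x)) / eps
               for i in range(3)])
```

A naive `np.log(np.sum(np.exp(x)))` would overflow for inputs around 1000; the max-shifted version stays finite.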
All networks trained with:
- Loss: Cross-entropy
- Data: CIFAR-10 (grayscale, 32×32)
- Epochs: 20
- Batch size: 256
- Initialization: Xavier/Glorot for CNNs
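Xavier/Glorot initialization chooses the weight scale so that activation variance is roughly preserved across layers. A sketch of the uniform variant (the project may use the normal variant instead):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    """Glorot & Bengio (2010): sample from U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out))."""
    if rng is None:
        rng = np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W = xavier_uniform(1024, 256)
```

The resulting variance is limit²/3 = 2/(fan_in + fan_out), which balances the forward and backward signal scales.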
Automatic Differentiation:
- Baydin et al., "Automatic Differentiation in Machine Learning: a Survey" (2018)
- Griewank & Walther, "Evaluating Derivatives" (2008)
Neural Network Architectures:
- LeCun et al., "Gradient-based learning applied to document recognition" (1998)
- Simonyan & Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition" (VGG, 2014)
- He et al., "Deep Residual Learning for Image Recognition" (ResNet, 2015)
Optimization:
- Robbins & Monro, "A Stochastic Approximation Method" (1951) - SGD
- Polyak, "Some methods of speeding up convergence" (1964) - Momentum
- Kingma & Ba, "Adam: A Method for Stochastic Optimization" (2014)
Weight Initialization:
- Glorot & Bengio, "Understanding the difficulty of training deep feedforward neural networks" (2010)
Building this project demonstrates understanding of:
- Calculus & Linear Algebra: Deriving gradients for matrix operations
- Graph Algorithms: Topological sorting for backpropagation
- Numerical Computing: Stable implementations (logsumexp)
- Deep Learning Foundations: How frameworks like PyTorch work internally
- Optimization Theory: Momentum, adaptive learning rates
- CNN Architectures: Convolutions, pooling, residual connections
MIT License - feel free to use for learning and projects!
