A lightweight autograd engine and neural network library built from scratch in Python. This project implements automatic differentiation (reverse-mode autodiff) and backpropagation to demonstrate the fundamental mechanics behind deep learning frameworks like PyTorch and TensorFlow.
Micrograd is an educational implementation that builds a complete neural network training system from first principles. It implements a scalar-valued autograd engine with dynamic computation graph construction, demonstrating how modern deep learning libraries work under the hood.
- Automatic Differentiation: Reverse-mode autodiff engine for computing gradients
- Dynamic Computation Graphs: Build graphs on-the-fly during forward pass
- Scalar Operations: All operations work on individual scalar values
- Backpropagation: Complete implementation of gradient flow through networks
- Neural Network Components: Neurons, layers, and multi-layer perceptrons (MLPs)
- Visualization: Graphviz integration for computational graph visualization
- Training Loop: Full training implementation with gradient descent
```
micrograd/
├── micrograd_from_scratch.ipynb   # Main implementation notebook
└── README.md                      # This file
```
The Value class wraps scalar values and tracks operations for automatic differentiation:
```python
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data               # The actual scalar value
        self.grad = 0.0                # Gradient (derivative)
        self._backward = lambda: None  # Backward pass function
        self._prev = set(_children)    # Child nodes in computation graph
        self._op = _op                 # Operation that created this node
        self.label = label             # Optional label for visualization
```

- Arithmetic: Addition (`+`), Subtraction (`-`), Multiplication (`*`), Division (`/`)
- Power: Exponentiation (`**`)
- Activation Functions: Hyperbolic tangent (`tanh`), Exponential (`exp`)
- All operations support reverse-mode autodiff
```python
import random

# Neuron: Single computational unit
class Neuron:
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1, 1))

# Layer: Collection of neurons
class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

# MLP: Multi-layer perceptron
class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
```

- Input flows through the network: Each operation creates a new `Value` node
- Computation graph is built: Nodes track their parents/children
- Output is computed: Final prediction is produced
- Topological sort: Order nodes from output to inputs
- Initialize output gradient: Set `output.grad = 1.0`
- Propagate gradients: Call `_backward()` on each node in reverse order
- Chain rule applied: Gradients flow from output back to inputs
```python
for p in network.parameters():
    p.data += -learning_rate * p.grad
```

```python
# Create values
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')

# Build expression: L = (a*b + c) * f
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'

# Compute gradients
L.backward()

# Check gradients
print(f'dL/da = {a.grad}')  # 6.0
print(f'dL/db = {b.grad}')  # -4.0
print(f'dL/dc = {c.grad}')  # -2.0
```

```python
# Create network: 3 inputs -> 4 hidden -> 4 hidden -> 1 output
n = MLP(3, [4, 4, 1])

# Training data (binary classifier)
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0]  # Desired targets

# Training loop
for epoch in range(100):
    # Forward pass
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

    # Backward pass
    for p in n.parameters():
        p.grad = 0.0
    loss.backward()

    # Update weights
    for p in n.parameters():
        p.data += -0.01 * p.grad

    print(f'Epoch {epoch}, Loss: {loss.data}')
```

Addition:
f(a, b) = a + b
∂f/∂a = 1, ∂f/∂b = 1
Multiplication:
f(a, b) = a * b
∂f/∂a = b, ∂f/∂b = a
Power:
f(a, n) = a^n
∂f/∂a = n * a^(n-1)
Tanh Activation:
f(x) = tanh(x) = (e^(2x) - 1)/(e^(2x) + 1)
∂f/∂x = 1 - tanh(x)^2
Exponential:
f(x) = e^x
∂f/∂x = e^x
The foundation of backpropagation:
If z = f(y) and y = g(x), then:
∂z/∂x = (∂z/∂y) * (∂y/∂x)
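A quick standalone check of this rule (an assumed example, not notebook code): compose z = tanh(y) with y = 2x + 1 and compare the chain-rule product against a central finite-difference estimate:

```python
import math

def g(x):  # y = g(x) = 2x + 1
    return 2.0 * x + 1.0

def f(y):  # z = f(y) = tanh(y)
    return math.tanh(y)

x = 0.5
y = g(x)

# Chain rule: dz/dx = (dz/dy) * (dy/dx)
dz_dy = 1.0 - math.tanh(y) ** 2  # local derivative of tanh
dy_dx = 2.0                      # local derivative of 2x + 1
dz_dx = dz_dy * dy_dx

# Central finite-difference estimate for comparison
h = 1e-6
dz_dx_numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)

print(dz_dx, dz_dx_numeric)  # the two agree to several decimal places
```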
The notebook includes computational graph visualization using Graphviz:
```python
draw_dot(L)  # Visualizes the entire computation graph
```

This creates a visual representation showing:
- Nodes: Values with their data and gradients
- Edges: Operations connecting values
- Data flow: How values propagate through the network
This project demonstrates:
- How autograd works: Building dynamic computation graphs
- Backpropagation mechanics: Manual implementation of reverse-mode autodiff
- Neural network fundamentals: Neurons, layers, and forward/backward passes
- Gradient descent: How weights are updated during training
- Computational graphs: Directed acyclic graphs for computation
- Topological sorting: Ordering nodes for correct gradient flow
- Chain rule application: How gradients compose through operations
Micrograd implements the same core concepts as PyTorch:
| Feature | Micrograd | PyTorch |
|---|---|---|
| Autograd | ✅ Scalar-level | ✅ Tensor-level |
| Dynamic graphs | ✅ | ✅ |
| Backpropagation | ✅ Manual | ✅ Optimized |
| Neural networks | ✅ Basic MLP | ✅ Full ecosystem |
| Optimization | ✅ SGD | ✅ Adam, RMSprop, etc. |
| Performance | 🐌 Educational | ⚡ Production-ready |
The notebook includes PyTorch comparisons to verify correctness:
```python
import torch

# Micrograd computation
x = Value(2.0)
y = x ** 2
y.backward()
print(x.grad)  # 4.0

# PyTorch equivalent
x_torch = torch.tensor([2.0], requires_grad=True)
y_torch = x_torch ** 2
y_torch.backward()
print(x_torch.grad)  # tensor([4.])
```

Training a simple 3-4-4-1 network on 4 examples:
- Initial loss: ~7.99
- Final loss: ~0.046 (after 60 epochs)
- Training time: <1 second
- Python 3.7+
- NumPy
- Matplotlib
- Graphviz (for visualization)
- Jupyter Notebook
```shell
# Clone the repository
git clone https://github.com/Jaloch-glitch/micrograd.git
cd micrograd

# Install dependencies
pip install numpy matplotlib graphviz jupyter

# For Graphviz visualization (system package)
# macOS:
brew install graphviz
# Ubuntu/Debian:
sudo apt-get install graphviz
```

```shell
# Start Jupyter Notebook
jupyter notebook micrograd_from_scratch.ipynb
```

Run all cells to:
- Understand automatic differentiation
- Build the Value class step-by-step
- Implement neural network components
- Train a binary classifier
- Visualize computation graphs
```python
def backward(self):
    # Build topological order
    topo = []
    visited = set()
    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)
    build_topo(self)

    # Apply chain rule in reverse topological order
    self.grad = 1.0
    for node in reversed(topo):
        node._backward()
```

```python
def __mul__(self, other):
    out = Value(self.data * other.data, (self, other), '*')
    def _backward():
        # Chain rule: accumulate with += so a value used more than
        # once receives gradient contributions from each use
        self.grad += other.data * out.grad
        other.grad += self.data * out.grad
    out._backward = _backward
    return out
```

Perfect for learning:
- Deep learning fundamentals without library abstractions
- Automatic differentiation implementation details
- Backpropagation algorithm step-by-step
- Neural network architecture from scratch
- Gradient descent optimization mechanics
- Computational graph concepts visually
- Operator overloading: Implementing `__add__`, `__mul__`, etc.
- Closure functions: Using `_backward` closures for gradient computation
- Topological sorting: Ensuring correct gradient flow order
- Graph algorithms: Building and traversing computation graphs
- Numerical differentiation: Verifying gradients with finite differences
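The finite-difference check mentioned above can be sketched as a small helper (the `numeric_grad` name is an assumption for illustration, not the notebook's API):

```python
import math

def numeric_grad(f, x, h=1e-5):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Verify the power rule from the derivatives above: d/dx x^3 = 3x^2
x = 2.0
analytic = 3 * x ** 2  # 12.0
estimate = numeric_grad(lambda t: t ** 3, x)
assert abs(analytic - estimate) < 1e-4

# Verify the tanh rule: d/dx tanh(x) = 1 - tanh(x)^2
analytic = 1 - math.tanh(x) ** 2
estimate = numeric_grad(math.tanh, x)
assert abs(analytic - estimate) < 1e-6
```

The same idea applies to a whole network: nudge one parameter by `h`, recompute the loss, and compare the slope against the `grad` the engine produced.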
- Scalar-only: No tensor/matrix operations (educational choice)
- No optimizers: Only basic gradient descent
- Limited activations: Only tanh and exp
- No GPU support: CPU-only implementation
- Performance: Much slower than production libraries
- Add tensor support for matrix operations
- Implement more activation functions (ReLU, sigmoid, softmax)
- Add optimizers (Adam, RMSprop)
- Support for convolutions
- Batch processing
- GPU acceleration
- More loss functions
- Regularization techniques
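As a taste of the activation-function item above, ReLU could be added to `Value` in the same pattern as `tanh`. This is a hypothetical sketch, shown with a stripped-down `Value` so it runs standalone:

```python
class Value:
    """Stripped-down Value: just enough state to demonstrate a relu method."""
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def relu(self):
        out = Value(max(0.0, self.data), (self,), 'ReLU')
        def _backward():
            # Gradient passes through only where the input was positive
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad
        out._backward = _backward
        return out

x = Value(-1.5)
y = x.relu()
y.grad = 1.0
y._backward()
print(y.data, x.grad)  # 0.0 0.0  (negative input: output and gradient are zero)
```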
This project is inspired by Andrej Karpathy's micrograd, which teaches the fundamentals of neural networks by building an autograd engine from scratch.
While micrograd is educational, the concepts apply to:
- Deep learning frameworks (PyTorch, TensorFlow, JAX)
- Automatic differentiation libraries
- Scientific computing (physics simulations, optimization)
- Computational chemistry and biology
- Robotics and control systems
Felix Onyango
- GitHub: @Jaloch-glitch
- Location: Kenya, East Africa
- Specialization: ML Engineering, Neural Networks, Deep Learning
This project is open source and available for educational purposes.
- Inspired by Andrej Karpathy's micrograd tutorial
- Mathematical foundations from deep learning literature
- Graphviz for computational graph visualization
- PyTorch team for reference implementations
- Automatic Differentiation: How autograd engines work
- Backpropagation Algorithm: Original paper and modern explanations
- Computational Graphs: Theory and applications
- Neural Network Fundamentals: From perceptrons to deep learning
- Gradient-Based Optimization: SGD, momentum, and adaptive methods
Note: This is an educational project designed to teach neural network fundamentals. For production machine learning, use established frameworks like PyTorch, TensorFlow, or JAX.
```python
# 1. Create a simple computation
a = Value(2.0, label='a')
b = Value(3.0, label='b')
c = a * b + Value(5.0)
c.label = 'c'

# 2. Compute gradients
c.backward()

# 3. Check results
print(f'c = {c.data}')      # 11.0
print(f'dc/da = {a.grad}')  # 3.0
print(f'dc/db = {b.grad}')  # 2.0

# 4. Visualize
draw_dot(c)
```

Start exploring neural networks from the ground up!