
Micrograd - Neural Network Engine from Scratch

A lightweight autograd engine and neural network library built from scratch in Python. This project implements automatic differentiation (reverse-mode autodiff) and backpropagation to demonstrate the fundamental mechanics behind deep learning frameworks like PyTorch and TensorFlow.

Overview

Micrograd is an educational implementation that builds a complete neural network training system from first principles. It implements a scalar-valued autograd engine with dynamic computation graph construction, demonstrating how modern deep learning libraries work under the hood.

Features

  • Automatic Differentiation: Reverse-mode autodiff engine for computing gradients
  • Dynamic Computation Graphs: Build graphs on-the-fly during forward pass
  • Scalar Operations: All operations work on individual scalar values
  • Backpropagation: Complete implementation of gradient flow through networks
  • Neural Network Components: Neurons, layers, and multi-layer perceptrons (MLPs)
  • Visualization: Graphviz integration for computational graph visualization
  • Training Loop: Full training implementation with gradient descent

Project Structure

micrograd/
├── micrograd_from_scratch.ipynb    # Main implementation notebook
└── README.md                       # This file

Core Components

1. Value Class - The Autograd Engine

The Value class wraps scalar values and tracks operations for automatic differentiation:

class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data              # The actual scalar value
        self.grad = 0.0               # Gradient (derivative)
        self._backward = lambda: None  # Backward pass function
        self._prev = set(_children)    # Child nodes in computation graph
        self._op = _op                # Operation that created this node
        self.label = label            # Optional label for visualization

2. Supported Operations

  • Arithmetic: Addition (+), Subtraction (-), Multiplication (*), Division (/)
  • Power: Exponentiation (**)
  • Activation Functions: Hyperbolic tangent (tanh), Exponential (exp)
  • All operations support reverse-mode autodiff
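To show how these operations hook into reverse-mode autodiff, here is a minimal, self-contained sketch of two of them (`+` and `tanh`), following the Value class structure above. The `backward` method is the topological-sort sweep described later in this README; everything else not shown above is a simplification for illustration.

```python
import math

class Value:
    """Minimal Value supporting + and tanh, mirroring the class above."""
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
        def _backward():
            # d(a+b)/da = 1 and d(a+b)/db = 1, scaled by out.grad (chain rule)
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), 'tanh')
        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1 - t**2) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order, then reverse-mode sweep
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._prev:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

x = Value(0.5)
y = (x + Value(0.25)).tanh()
y.backward()
print(x.grad)  # equals 1 - tanh(0.75)^2
```

Each operation returns a new node whose `_backward` closure captures the operands, so the gradient rule travels with the graph.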

3. Neural Network Architecture

# Neuron: Single computational unit
class Neuron:
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1,1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1,1))

# Layer: Collection of neurons
class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

# MLP: Multi-layer perceptron
class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
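The snippets above show only the constructors; the forward pass invoked later as `n(x)` lives in each class's `__call__`. A sketch of that wiring, using a stripped-down `Value` stub (forward arithmetic only, no autograd) so it runs standalone:

```python
import math
import random

class Value:
    """Stub with just enough arithmetic for a forward pass (no autograd here)."""
    def __init__(self, data): self.data = data
    def __add__(self, other): return Value(self.data + other.data)
    def __mul__(self, other): return Value(self.data * other.data)
    def tanh(self): return Value(math.tanh(self.data))

class Neuron:
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1, 1))
    def __call__(self, x):
        # weighted sum of inputs plus bias, squashed by tanh
        act = self.b
        for wi, xi in zip(self.w, x):
            act = act + wi * xi
        return act.tanh()

class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]
    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs

class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

n = MLP(3, [4, 4, 1])
out = n([Value(2.0), Value(3.0), Value(-1.0)])
print(out.data)  # a single tanh output in (-1, 1)
```

In the notebook the same `__call__` methods operate on gradient-tracking `Value` objects, so calling `n(x)` both computes the prediction and builds the graph for backpropagation.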

How It Works

Forward Pass

  1. Input flows through the network: Each operation creates a new Value node
  2. Computation graph is built: Nodes track their parents/children
  3. Output is computed: Final prediction is produced
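A pared-down illustration of step 2: every operation returns a new node that remembers its operands, so the computation graph exists as soon as the forward pass has run.

```python
class Value:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
    def __mul__(self, other):
        # the new node records both inputs and the operation that made it
        return Value(self.data * other.data, (self, other), '*')

a, b = Value(2.0), Value(-3.0)
c = a * b
print(c.data)        # -6.0
print(c._op)         # '*'
print(a in c._prev)  # True
```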

Backward Pass (Backpropagation)

  1. Topological sort: Order nodes from output to inputs
  2. Initialize output gradient: Set output.grad = 1.0
  3. Propagate gradients: Call _backward() on each node in reverse order
  4. Chain rule applied: Gradients flow from output back to inputs

Gradient Descent Update

for p in network.parameters():
    p.data += -learning_rate * p.grad
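The update loop above relies on a `parameters()` method that is not shown in the constructor snippets; in the notebook it flattens every weight and bias into one list. A sketch, assuming the Neuron/Layer/MLP constructors shown earlier (autograd machinery omitted for brevity):

```python
import random

class Value:
    def __init__(self, data):
        self.data, self.grad = data, 0.0

class Neuron:
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1, 1))
    def parameters(self):
        return self.w + [self.b]

class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]
    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]

class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

n = MLP(3, [4, 4, 1])
print(len(n.parameters()))  # (3*4 + 4) + (4*4 + 4) + (4*1 + 1) = 41
```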

Example Usage

Simple Expression

# Create values
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')

# Build expression: L = (a*b + c) * f
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'

# Compute gradients
L.backward()

# Check gradients
print(f'dL/da = {a.grad}')  # 6.0
print(f'dL/db = {b.grad}')  # -4.0
print(f'dL/dc = {c.grad}')  # -2.0

Neural Network Training

# Create network: 3 inputs -> 4 hidden -> 4 hidden -> 1 output
n = MLP(3, [4, 4, 1])

# Training data (binary classifier)
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0]
]
ys = [1.0, -1.0, -1.0, 1.0]  # Desired targets

# Training loop
for epoch in range(100):
    # Forward pass
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

    # Backward pass
    for p in n.parameters():
        p.grad = 0.0
    loss.backward()

    # Update weights
    for p in n.parameters():
        p.data += -0.01 * p.grad

    print(f'Epoch {epoch}, Loss: {loss.data}')

Mathematical Foundations

Derivatives Implemented

Addition:

f(a, b) = a + b
∂f/∂a = 1, ∂f/∂b = 1

Multiplication:

f(a, b) = a * b
∂f/∂a = b, ∂f/∂b = a

Power:

f(a, n) = a^n
∂f/∂a = n * a^(n-1)

Tanh Activation:

f(x) = tanh(x) = (e^(2x) - 1)/(e^(2x) + 1)
∂f/∂x = 1 - tanh(x)^2

Exponential:

f(x) = e^x
∂f/∂x = e^x
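Each of these derivatives can be sanity-checked numerically with a central finite difference, the same technique the notebook uses to verify gradients:

```python
import math

def numeric_grad(f, x, h=1e-6):
    # central difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

# d(x^3)/dx = 3x^2, so at x = 2.0 the slope is 12.0
assert abs(numeric_grad(lambda x: x**3, 2.0) - 12.0) < 1e-4

# d tanh(x)/dx = 1 - tanh(x)^2 at x = 0.5
analytic = 1 - math.tanh(0.5)**2
assert abs(numeric_grad(math.tanh, 0.5) - analytic) < 1e-6

# d e^x/dx = e^x at x = 1.0
assert abs(numeric_grad(math.exp, 1.0) - math.e) < 1e-4
print("all derivative checks pass")
```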

Chain Rule

The foundation of backpropagation:

If z = f(y) and y = g(x), then:
∂z/∂x = (∂z/∂y) * (∂y/∂x)
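A tiny worked instance of the rule, with z = y^2 and y = 3x, checked against a direct finite difference of the composed function:

```python
# z = f(y) = y**2 and y = g(x) = 3*x  =>  dz/dx = (dz/dy)*(dy/dx) = (2*y)*3 = 18*x
x = 2.0
y = 3 * x              # 6.0
dz_dy = 2 * y          # 12.0
dy_dx = 3              # constant slope of g
chain = dz_dy * dy_dx  # 36.0

# direct finite difference of z(x) = (3x)**2 agrees
h = 1e-6
direct = ((3 * (x + h))**2 - (3 * (x - h))**2) / (2 * h)
print(chain, round(direct, 6))
```

This composition of local derivatives is exactly what `backward()` performs node by node, from the output back to the inputs.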

Visualization

The notebook includes computational graph visualization using Graphviz:

draw_dot(L)  # Visualizes the entire computation graph

This creates a visual representation showing:

  • Nodes: Values with their data and gradients
  • Edges: Operations connecting values
  • Data flow: How values propagate through the network
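`draw_dot` itself lives in the notebook; its first step is a graph traversal like the one below, after which each collected node and edge is emitted into a `graphviz.Digraph`. A sketch of just the traversal (the rendering step is omitted so this runs without Graphviz installed):

```python
class Value:
    def __init__(self, data, _children=(), _op=''):
        self.data, self.grad = data, 0.0
        self._prev, self._op = set(_children), _op
    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), '*')

def trace(root):
    # walk the graph once, collecting every node and every edge
    nodes, edges = set(), set()
    def build(v):
        if v not in nodes:
            nodes.add(v)
            for child in v._prev:
                edges.add((child, v))
                build(child)
    build(root)
    return nodes, edges

a, b = Value(2.0), Value(-3.0)
c = a * b
nodes, edges = trace(c)
print(len(nodes), len(edges))  # 3 nodes, 2 edges
```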

Key Learning Outcomes

This project demonstrates:

  1. How autograd works: Building dynamic computation graphs
  2. Backpropagation mechanics: Manual implementation of reverse-mode autodiff
  3. Neural network fundamentals: Neurons, layers, and forward/backward passes
  4. Gradient descent: How weights are updated during training
  5. Computational graphs: Directed acyclic graphs for computation
  6. Topological sorting: Ordering nodes for correct gradient flow
  7. Chain rule application: How gradients compose through operations

Comparison with PyTorch

Micrograd implements the same core concepts as PyTorch:

| Feature         | Micrograd       | PyTorch                |
| --------------- | --------------- | ---------------------- |
| Autograd        | ✅ Scalar-level | ✅ Tensor-level        |
| Dynamic graphs  | ✅              | ✅                     |
| Backpropagation | ✅ Manual       | ✅ Optimized           |
| Neural networks | ✅ Basic MLP    | ✅ Full ecosystem      |
| Optimization    | ✅ SGD          | ✅ Adam, RMSprop, etc. |
| Performance     | 🐌 Educational  | ⚡ Production-ready    |

PyTorch Verification

The notebook includes PyTorch comparisons to verify correctness:

import torch

# Micrograd computation
x = Value(2.0)
y = x ** 2
y.backward()
print(x.grad)  # 4.0

# PyTorch equivalent
x_torch = torch.tensor([2.0], requires_grad=True)
y_torch = x_torch ** 2
y_torch.backward()
print(x_torch.grad)  # tensor([4.])

Performance Benchmarks

Training a simple 3-4-4-1 network on 4 examples:

  • Initial loss: ~7.99
  • Final loss: ~0.046 (after 60 epochs)
  • Training time: under one second for this toy problem

Requirements

  • Python 3.7+
  • NumPy
  • Matplotlib
  • Graphviz (for visualization)
  • Jupyter Notebook

Installation

# Clone the repository
git clone https://github.com/Jaloch-glitch/micrograd.git
cd micrograd

# Install dependencies
pip install numpy matplotlib graphviz jupyter

# For Graphviz visualization (system package)
# macOS:
brew install graphviz

# Ubuntu/Debian:
sudo apt-get install graphviz

Usage

# Start Jupyter Notebook
jupyter notebook micrograd_from_scratch.ipynb

Run all cells to:

  1. Understand automatic differentiation
  2. Build the Value class step-by-step
  3. Implement neural network components
  4. Train a binary classifier
  5. Visualize computation graphs

Code Highlights

Backward Pass Implementation

def backward(self):
    # Build topological order
    topo = []
    visited = set()

    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)

    build_topo(self)

    # Apply chain rule in reverse topological order
    self.grad = 1.0
    for node in reversed(topo):
        node._backward()

Multiplication Gradient

def __mul__(self, other):
    # Coerce plain numbers so Value * 2.0 also works
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data * other.data, (self, other), '*')

    def _backward():
        self.grad += other.data * out.grad   # chain rule: d(a*b)/da = b
        other.grad += self.data * out.grad   # d(a*b)/db = a
    out._backward = _backward

    return out

Educational Value

Perfect for learning:

  • Deep learning fundamentals without library abstractions
  • Automatic differentiation implementation details
  • Backpropagation algorithm step-by-step
  • Neural network architecture from scratch
  • Gradient descent optimization mechanics
  • Computational graph concepts visually

Advanced Topics Covered

  1. Operator overloading: Implementing __add__, __mul__, etc.
  2. Closure functions: Using _backward closures for gradient computation
  3. Topological sorting: Ensuring correct gradient flow order
  4. Graph algorithms: Building and traversing computation graphs
  5. Numerical differentiation: Verifying gradients with finite differences

Limitations

  • Scalar-only: No tensor/matrix operations (educational choice)
  • No optimizers: Only basic gradient descent
  • Limited activations: Only tanh and exp
  • No GPU support: CPU-only implementation
  • Performance: Much slower than production libraries

Future Enhancements

  • Add tensor support for matrix operations
  • Implement more activation functions (ReLU, sigmoid, softmax)
  • Add optimizers (Adam, RMSprop)
  • Support for convolutions
  • Batch processing
  • GPU acceleration
  • More loss functions
  • Regularization techniques

Inspiration

This project is inspired by Andrej Karpathy's micrograd, which teaches the fundamentals of neural networks by building an autograd engine from scratch.

Real-World Applications

While micrograd is educational, the concepts apply to:

  • Deep learning frameworks (PyTorch, TensorFlow, JAX)
  • Automatic differentiation libraries
  • Scientific computing (physics simulations, optimization)
  • Computational chemistry and biology
  • Robotics and control systems

Author

Felix Onyango

  • GitHub: @Jaloch-glitch
  • Location: Kenya, East Africa
  • Specialization: ML Engineering, Neural Networks, Deep Learning

License

This project is open source and available for educational purposes.

Acknowledgments

  • Inspired by Andrej Karpathy's micrograd tutorial
  • Mathematical foundations from deep learning literature
  • Graphviz for computational graph visualization
  • PyTorch team for reference implementations

Further Reading

  • Automatic Differentiation: How autograd engines work
  • Backpropagation Algorithm: Original paper and modern explanations
  • Computational Graphs: Theory and applications
  • Neural Network Fundamentals: From perceptrons to deep learning
  • Gradient-Based Optimization: SGD, momentum, and adaptive methods

Note: This is an educational project designed to teach neural network fundamentals. For production machine learning, use established frameworks like PyTorch, TensorFlow, or JAX.

Quick Start

# 1. Create a simple computation
a = Value(2.0, label='a')
b = Value(3.0, label='b')
c = a * b + Value(5.0)
c.label = 'c'

# 2. Compute gradients
c.backward()

# 3. Check results
print(f'c = {c.data}')      # 11.0
print(f'dc/da = {a.grad}')  # 3.0
print(f'dc/db = {b.grad}')  # 2.0

# 4. Visualize
draw_dot(c)

Start exploring neural networks from the ground up!
