A lightweight autograd engine and neural network library built from scratch in Python. This project implements automatic differentiation (reverse-mode autodiff) and backpropagation to demonstrate the fundamental mechanics behind deep learning frameworks like PyTorch and TensorFlow.
Micrograd is an educational implementation that builds a complete neural network training system from first principles. It implements a scalar-valued autograd engine with dynamic computation graph construction, demonstrating how modern deep learning libraries work under the hood.
- Automatic Differentiation: Reverse-mode autodiff engine for computing gradients
- Dynamic Computation Graphs: Build graphs on-the-fly during forward pass
- Scalar Operations: All operations work on individual scalar values
- Backpropagation: Complete implementation of gradient flow through networks
- Neural Network Components: Neurons, layers, and multi-layer perceptrons (MLPs)
- Visualization: Graphviz integration for computational graph visualization
- Training Loop: Full training implementation with gradient descent
```
micrograd/
├── micrograd_from_scratch.ipynb   # Main implementation notebook
└── README.md                      # This file
```
The Value class wraps scalar values and tracks operations for automatic differentiation:
```python
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data               # The actual scalar value
        self.grad = 0.0                # Gradient (derivative)
        self._backward = lambda: None  # Backward pass function
        self._prev = set(_children)    # Child nodes in computation graph
        self._op = _op                 # Operation that created this node
        self.label = label             # Optional label for visualization
```

- Arithmetic: Addition (`+`), Subtraction (`-`), Multiplication (`*`), Division (`/`)
- Power: Exponentiation (`**`)
- Activation Functions: Hyperbolic tangent (`tanh`), Exponential (`exp`)
- All operations support reverse-mode autodiff
```python
import random

# Neuron: Single computational unit
class Neuron:
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1, 1))

# Layer: Collection of neurons
class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

# MLP: Multi-layer perceptron
class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
```

- Input flows through the network: Each operation creates a new `Value` node
- Computation graph is built: Nodes track their parents/children
- Output is computed: Final prediction is produced
- Topological sort: Order nodes from output to inputs
- Initialize output gradient: Set `output.grad = 1.0`
- Propagate gradients: Call `_backward()` on each node in reverse order
- Chain rule applied: Gradients flow from output back to inputs
```python
for p in network.parameters():
    p.data += -learning_rate * p.grad
```

```python
# Create values
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')

# Build expression: L = (a*b + c) * f
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'

# Compute gradients
L.backward()

# Check gradients
print(f'dL/da = {a.grad}')  # 6.0
print(f'dL/db = {b.grad}')  # -4.0
print(f'dL/dc = {c.grad}')  # -2.0
```

```python
# Create network: 3 inputs -> 4 hidden -> 4 hidden -> 1 output
n = MLP(3, [4, 4, 1])

# Training data (binary classifier)
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0]  # Desired targets

# Training loop
for epoch in range(100):
    # Forward pass
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

    # Backward pass
    for p in n.parameters():
        p.grad = 0.0
    loss.backward()

    # Update weights
    for p in n.parameters():
        p.data += -0.01 * p.grad

    print(f'Epoch {epoch}, Loss: {loss.data}')
```

Addition:
f(a, b) = a + b
∂f/∂a = 1, ∂f/∂b = 1
Multiplication:
f(a, b) = a * b
∂f/∂a = b, ∂f/∂b = a
Power:
f(a, n) = a^n
∂f/∂a = n * a^(n-1)
Tanh Activation:
f(x) = tanh(x) = (e^(2x) - 1)/(e^(2x) + 1)
∂f/∂x = 1 - tanh(x)^2
Exponential:
f(x) = e^x
∂f/∂x = e^x
The foundation of backpropagation:
If z = f(y) and y = g(x), then:
∂z/∂x = (∂z/∂y) * (∂y/∂x)
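A quick standalone check of this rule (an assumed example, not notebook code): compose z = tanh(y) with y = 2x + 1 and compare the chain-rule product against a central finite-difference estimate:

```python
import math

def g(x):  # y = g(x) = 2x + 1
    return 2.0 * x + 1.0

def f(y):  # z = f(y) = tanh(y)
    return math.tanh(y)

x = 0.5
y = g(x)

# Chain rule: dz/dx = (dz/dy) * (dy/dx)
dz_dy = 1.0 - math.tanh(y) ** 2  # local derivative of tanh
dy_dx = 2.0                      # local derivative of 2x + 1
dz_dx = dz_dy * dy_dx

# Central finite-difference estimate for comparison
h = 1e-6
dz_dx_numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)

print(dz_dx, dz_dx_numeric)  # the two agree to several decimal places
```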
The notebook includes computational graph visualization using Graphviz:
```python
draw_dot(L)  # Visualizes the entire computation graph
```

This creates a visual representation showing:
- Nodes: Values with their data and gradients
- Edges: Operations connecting values
- Data flow: How values propagate through the network
This project demonstrates:
- How autograd works: Building dynamic computation graphs
- Backpropagation mechanics: Manual implementation of reverse-mode autodiff
- Neural network fundamentals: Neurons, layers, and forward/backward passes
- Gradient descent: How weights are updated during training
- Computational graphs: Directed acyclic graphs for computation
- Topological sorting: Ordering nodes for correct gradient flow
- Chain rule application: How gradients compose through operations
Micrograd implements the same core concepts as PyTorch:
| Feature | Micrograd | PyTorch |
|---|---|---|
| Autograd | ✅ Scalar-level | ✅ Tensor-level |
| Dynamic graphs | ✅ | ✅ |
| Backpropagation | ✅ Manual | ✅ Optimized |
| Neural networks | ✅ Basic MLP | ✅ Full ecosystem |
| Optimization | ✅ SGD | ✅ Adam, RMSprop, etc. |
| Performance | 🐌 Educational | ⚡ Production-ready |
The notebook includes PyTorch comparisons to verify correctness:
```python
import torch

# Micrograd computation
x = Value(2.0)
y = x ** 2
y.backward()
print(x.grad)  # 4.0

# PyTorch equivalent
x_torch = torch.tensor([2.0], requires_grad=True)
y_torch = x_torch ** 2
y_torch.backward()
print(x_torch.grad)  # tensor([4.])
```

Training a simple 3-4-4-1 network on 4 examples:
- Initial loss: ~7.99
- Final loss: ~0.046 (after 60 epochs)
- Training time: <1 second
- Python 3.7+
- NumPy
- Matplotlib
- Graphviz (for visualization)
- Jupyter Notebook
```shell
# Clone the repository
git clone https://github.com/Jaloch-glitch/micrograd.git
cd micrograd

# Install dependencies
pip install numpy matplotlib graphviz jupyter

# For Graphviz visualization (system package)
# macOS:
brew install graphviz
# Ubuntu/Debian:
sudo apt-get install graphviz
```

```shell
# Start Jupyter Notebook
jupyter notebook micrograd_from_scratch.ipynb
```

Run all cells to:
- Understand automatic differentiation
- Build the Value class step-by-step
- Implement neural network components
- Train a binary classifier
- Visualize computation graphs
```python
def backward(self):
    # Build topological order
    topo = []
    visited = set()
    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)
    build_topo(self)

    # Apply chain rule in reverse topological order
    self.grad = 1.0
    for node in reversed(topo):
        node._backward()
```

```python
def __mul__(self, other):
    out = Value(self.data * other.data, (self, other), '*')
    def _backward():
        # Chain rule: accumulate with += so a value used more than
        # once receives gradient contributions from each use
        self.grad += other.data * out.grad
        other.grad += self.data * out.grad
    out._backward = _backward
    return out
```

Perfect for learning:
- Deep learning fundamentals without library abstractions
- Automatic differentiation implementation details
- Backpropagation algorithm step-by-step
- Neural network architecture from scratch
- Gradient descent optimization mechanics
- Computational graph concepts visually
- Operator overloading: Implementing `__add__`, `__mul__`, etc.
- Closure functions: Using `_backward` closures for gradient computation
- Topological sorting: Ensuring correct gradient flow order
- Graph algorithms: Building and traversing computation graphs
- Numerical differentiation: Verifying gradients with finite differences
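The finite-difference check mentioned above can be sketched as a small helper (the `numeric_grad` name is an assumption for illustration, not the notebook's API):

```python
import math

def numeric_grad(f, x, h=1e-5):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Verify the power rule from the derivatives above: d/dx x^3 = 3x^2
x = 2.0
analytic = 3 * x ** 2  # 12.0
estimate = numeric_grad(lambda t: t ** 3, x)
assert abs(analytic - estimate) < 1e-4

# Verify the tanh rule: d/dx tanh(x) = 1 - tanh(x)^2
analytic = 1 - math.tanh(x) ** 2
estimate = numeric_grad(math.tanh, x)
assert abs(analytic - estimate) < 1e-6
```

The same idea applies to a whole network: nudge one parameter by `h`, recompute the loss, and compare the slope against the `grad` the engine produced.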
- Scalar-only: No tensor/matrix operations (educational choice)
- No optimizers: Only basic gradient descent
- Limited activations: Only tanh and exp
- No GPU support: CPU-only implementation
- Performance: Much slower than production libraries
- Add tensor support for matrix operations
- Implement more activation functions (ReLU, sigmoid, softmax)
- Add optimizers (Adam, RMSprop)
- Support for convolutions
- Batch processing
- GPU acceleration
- More loss functions
- Regularization techniques
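As a taste of the activation-function item above, ReLU could be added to `Value` in the same pattern as `tanh`. This is a hypothetical sketch, shown with a stripped-down `Value` so it runs standalone:

```python
class Value:
    """Stripped-down Value: just enough state to demonstrate a relu method."""
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def relu(self):
        out = Value(max(0.0, self.data), (self,), 'ReLU')
        def _backward():
            # Gradient passes through only where the input was positive
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad
        out._backward = _backward
        return out

x = Value(-1.5)
y = x.relu()
y.grad = 1.0
y._backward()
print(y.data, x.grad)  # 0.0 0.0  (negative input: output and gradient are zero)
```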
This project is inspired by Andrej Karpathy's micrograd, which teaches the fundamentals of neural networks by building an autograd engine from scratch.
While micrograd is educational, the concepts apply to:
- Deep learning frameworks (PyTorch, TensorFlow, JAX)
- Automatic differentiation libraries
- Scientific computing (physics simulations, optimization)
- Computational chemistry and biology
- Robotics and control systems
Felix Onyango
- GitHub: @Jaloch-glitch
- Location: Kenya, East Africa
- Specialization: ML Engineering, Neural Networks, Deep Learning
This project is open source and available for educational purposes.
- Inspired by Andrej Karpathy's micrograd tutorial
- Mathematical foundations from deep learning literature
- Graphviz for computational graph visualization
- PyTorch team for reference implementations
- Automatic Differentiation: How autograd engines work
- Backpropagation Algorithm: Original paper and modern explanations
- Computational Graphs: Theory and applications
- Neural Network Fundamentals: From perceptrons to deep learning
- Gradient-Based Optimization: SGD, momentum, and adaptive methods
Note: This is an educational project designed to teach neural network fundamentals. For production machine learning, use established frameworks like PyTorch, TensorFlow, or JAX.
```python
# 1. Create a simple computation
a = Value(2.0, label='a')
b = Value(3.0, label='b')
c = a * b + Value(5.0)
c.label = 'c'

# 2. Compute gradients
c.backward()

# 3. Check results
print(f'c = {c.data}')      # 11.0
print(f'dc/da = {a.grad}')  # 3.0
print(f'dc/db = {b.grad}')  # 2.0

# 4. Visualize
draw_dot(c)
```

Start exploring neural networks from the ground up!