A comprehensive implementation of neural networks built from the ground up in Python, featuring both numpy-based forward propagation and a micrograd-style automatic differentiation engine for backpropagation.
This project was built while learning from two exceptional educators in machine learning:
**Andrej Karpathy**
- micrograd - A tiny scalar-valued autograd engine
- Neural Networks: Zero to Hero - YouTube series on building neural networks from scratch
- Inspired the `micrograd.py` implementation and backpropagation architecture

**Sentdex (Harrison Kinsley)**
- Neural Networks from Scratch in Python - Book and tutorial series
- YouTube Channel - Comprehensive Python and ML tutorials
- Inspired the `layer_dense.py` and forward pass implementation using numpy
```
neural-network-v1/
├── micrograd.py       # Scalar autograd engine (backprop)
├── micrograd_nn.py    # Neural network using micrograd
├── layer_dense.py     # Dense layers and activations (numpy)
├── network.py         # Neural network container (forward pass)
├── test1.ipynb        # Jupyter notebook experiments
├── micro_grad.ipynb   # Micrograd visualization notebook
└── README.md          # This file
```
`micrograd.py` implements a lightweight automatic differentiation framework inspired by Andrej Karpathy's micrograd:
- Scalar-based computation graph with automatic gradient tracking
- Backpropagation through arbitrary computational graphs
- Supported operations:
  - Addition, subtraction, multiplication, division
  - Power functions
  - Tanh activation
- Automatic gradient accumulation via chain rule
Example:
```python
from micrograd import value

a = value(2.0, label='a')
b = value(-3.0, label='b')
c = value(10.0, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = value(-2.0, label='f')
L = d * f; L.label = 'L'

# Compute all gradients
L.backward()
print(f"dL/da = {a.grad}")  # How much does 'a' affect the loss?
```

`layer_dense.py` provides efficient vectorized implementations for forward propagation:
Layers:
- `layer_dense` - Fully connected layer with weights and biases
- `Ac_relu` - ReLU activation (max(0, x))
- `Ac_softmax` - Softmax activation for classification
- `Ac_tanh` - Tanh activation
- `loss_CategoricalCrossEntropy` - Cross-entropy loss for classification
Example:
```python
import layer_dense as ld
import numpy as np

# Create a dense layer: 2 inputs -> 5 neurons
layer1 = ld.layer_dense(2, 5)
activation1 = ld.Ac_relu()

# Forward pass
X = np.random.randn(100, 2)  # 100 samples, 2 features
output = layer1.forward_pass(X)
activated = activation1.forward_pass(output)
```

`network.py` is a sequential model container for building multi-layer networks:
Example:
```python
from network import NeuralNetwork
import layer_dense as ld

nn = NeuralNetwork()
nn.add(ld.layer_dense(2, 64))   # Input: 2 features -> 64 neurons
nn.add(ld.Ac_relu())            # ReLU activation
nn.add(ld.layer_dense(64, 64))  # Hidden layer
nn.add(ld.Ac_relu())
nn.add(ld.layer_dense(64, 3))   # Output: 3 classes
nn.add(ld.Ac_softmax())         # Softmax for probabilities

# Forward pass
predictions = nn.forward(X)     # X: input batch of shape (n_samples, 2)
```

`micrograd_nn.py` is a neural network implementation using the micrograd autograd engine:
Classes:
- `DenseMG` - Dense layer with scalar Value objects for each weight/bias
- `ReLUMG` - ReLU activation with automatic differentiation
- `NeuralNetworkMG` - Network container with backprop support
Features:
- Automatic gradient computation through backpropagation
- SGD optimizer with learning rate
- MSE loss function
Training Example:
```python
from micrograd_nn import NeuralNetworkMG, DenseMG, ReLUMG, mse_loss
import numpy as np

# Create network
nn = NeuralNetworkMG()
nn.add(DenseMG(2, 8), ReLUMG())
nn.add(DenseMG(8, 3), None)

# Training loop (per-sample)
# X: training inputs, Y: one-hot encoded targets
for epoch in range(100):
    for i in range(len(X)):
        # Forward pass
        outputs = nn.forward_sample(X[i])
        loss = mse_loss(outputs, Y[i])  # Y is one-hot encoded
        # Backward pass
        nn.zero_grads()
        loss.backward()
        # Update weights
        nn.step_sgd(lr=0.01)
```

`test1.ipynb` contains interactive experiments with:
- Single neuron forward pass calculations
- Layer-wise computations
- Spiral dataset visualization using matplotlib
- One-hot encoding with numpy
- Categorical cross-entropy loss
- Full network training pipeline
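As a rough sketch of the one-hot encoding and categorical cross-entropy steps explored in the notebook (plain numpy, not the notebook's exact code; the `one_hot` helper is just for illustration):

```python
import numpy as np

def one_hot(y, num_classes):
    # Turn integer class labels into one-hot rows, e.g. 2 -> [0, 0, 1]
    return np.eye(num_classes)[y]

y = np.array([0, 2, 1])              # integer class labels for 3 samples
targets = one_hot(y, num_classes=3)  # shape (3, 3)

# Softmax outputs for the same 3 samples (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])

# Categorical cross-entropy: -log of the predicted probability of the true class,
# clipped so log(0) can never occur, averaged over the batch
clipped = np.clip(probs, 1e-7, 1 - 1e-7)
loss = -np.mean(np.sum(targets * np.log(clipped), axis=1))
print(loss)  # ~0.52
```

The clipping keeps the logarithm finite when a predicted probability is exactly 0 or 1.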
`micro_grad.ipynb` covers visualization and experimentation with micrograd:
- Computational graph construction
- Graphviz visualization of operations
- Manual gradient calculations
- Backward propagation walkthrough
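The drawing step amounts to walking the expression graph and emitting Graphviz nodes and edges. Here is a minimal sketch, assuming each `value` node follows micrograd's conventions (`data`, `grad`, `label` plus internal `_prev` / `_op` attributes; the exact attribute names in `micrograd.py` may differ):

```python
from graphviz import Digraph

def trace(root):
    # Collect every node and edge reachable from the output value
    nodes, edges = set(), set()
    def build(v):
        if v not in nodes:
            nodes.add(v)
            for child in getattr(v, '_prev', ()):
                edges.add((child, v))
                build(child)
    build(root)
    return nodes, edges

def draw_graph(root):
    dot = Digraph(format='svg', graph_attr={'rankdir': 'LR'})  # left-to-right layout
    nodes, edges = trace(root)
    for n in nodes:
        uid = str(id(n))
        # One record box per value: label | data | grad
        dot.node(uid, label=f"{n.label} | data {n.data:.4f} | grad {n.grad:.4f}",
                 shape='record')
        op = getattr(n, '_op', '')
        if op:
            # Small node for the operation ('+', '*', 'tanh', ...) that produced this value
            dot.node(uid + op, label=op)
            dot.edge(uid + op, uid)
    for a, b in edges:
        dot.edge(str(id(a)), str(id(b)) + getattr(b, '_op', ''))
    return dot
```

Calling `draw_graph(L)` on the expression from the micrograd example above returns a `Digraph` that renders inline in Jupyter or can be saved with `.render()`.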
Install the Python dependencies:

```bash
pip install numpy matplotlib graphviz nnfs
```

Then install the Graphviz system package:

- Windows: Download from graphviz.org
- Linux: `sudo apt-get install graphviz`
- macOS: `brew install graphviz`
Forward propagation (numpy):
- Matrix multiplication for efficient batch processing (sketched below)
- Activation functions (ReLU, Softmax, Tanh)
- Layer stacking and composition
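For example, a single dense layer's forward pass over an entire batch is one matrix multiply plus a broadcast bias add (illustrative numpy only, independent of the classes above):

```python
import numpy as np

X = np.random.randn(100, 2)        # batch of 100 samples, 2 features each
W = 0.01 * np.random.randn(2, 5)   # weights: 2 inputs -> 5 neurons
b = np.zeros((1, 5))               # one bias per neuron

z = X @ W + b                      # (100, 2) @ (2, 5) -> (100, 5); bias broadcasts over rows
a = np.maximum(0, z)               # ReLU applied elementwise to the whole batch
```

Stacking layers is then just feeding `a` into the next layer's multiply.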
Backpropagation:
- Chain rule for gradient computation
- Topological sorting of the computation graph
- Gradient accumulation for nodes used multiple times (see the sketch below)
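All three ideas fit in a few lines. The following stripped-down stand-in (a hypothetical `Tiny` class, not the project's `value`) shows the topological sort, the chain rule applied in reverse order, and why gradients are accumulated with `+=`:

```python
class Tiny:
    """Minimal scalar node, just enough to illustrate the backward pass."""
    def __init__(self, data, prev=(), op=''):
        self.data, self.grad = data, 0.0
        self._prev, self._op = prev, op
        self._backward = lambda: None

    def __mul__(self, other):
        out = Tiny(self.data * other.data, (self, other), '*')
        def _backward():
            # Chain rule for multiplication; += accumulates when a node is reused
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph so every node comes after its inputs
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0              # seed d(output)/d(output) = 1
        for v in reversed(topo):     # visit outputs before inputs
            v._backward()

x = Tiny(3.0)
y = x * x      # x appears twice, so its gradient accumulates
y.backward()
print(x.grad)  # 6.0 = d(x*x)/dx at x = 3
```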
Autograd engine:
- Dynamic computation graph construction
- Automatic gradient tracking through operations
- Lazy gradient computation (only when `.backward()` is called; demonstrated below)
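Tying this back to the `value` API from the earlier example (assuming, as in micrograd, that `.grad` starts at 0 and is only populated by `.backward()`):

```python
from micrograd import value

x = value(2.0, label='x')
y = value(4.0, label='y')
z = x * y + y            # the graph is built on the fly as the expression runs

print(x.grad, y.grad)    # 0.0 0.0 -- nothing computed yet
z.backward()             # gradients are filled in only here
print(x.grad, y.grad)    # dz/dx = y = 4.0, dz/dy = x + 1 = 3.0 (y is used twice)
```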
A typical network architecture built from these pieces:

```
Input Layer (features)
        ↓
Dense Layer + ReLU
        ↓
Dense Layer + ReLU
        ↓
Dense Layer + Softmax
        ↓
Output (class probabilities)
```
A complete forward pass and loss calculation on the spiral dataset:

```python
import nnfs
from nnfs.datasets import spiral_data
from network import NeuralNetwork
import layer_dense as ld

# Generate spiral dataset
nnfs.init()
X, y = spiral_data(100, 3)  # 100 samples per class, 3 classes

# Build network
nn = NeuralNetwork()
nn.add(ld.layer_dense(2, 64))
nn.add(ld.Ac_relu())
nn.add(ld.layer_dense(64, 3))
nn.add(ld.Ac_softmax())

# Forward pass
predictions = nn.forward(X)

# Calculate loss
loss_fn = ld.loss_CategoricalCrossEntropy()
loss = loss_fn.calculate_loss(predictions, y, num_of_classes=3)
print(f"Loss: {loss}")
```

Highlights:
- Built from scratch - No high-level frameworks like TensorFlow or PyTorch
- Educational focus - Clear, commented code showing how everything works
- Dual implementation:
  - NumPy for efficient forward propagation
  - Micrograd for understanding backpropagation
- Visualization - Graphviz integration to see computation graphs
- Interactive - Jupyter notebooks for experimentation
By building this project, you learn:
- How matrix operations power neural networks
- The math behind backpropagation and the chain rule
- How automatic differentiation engines work
- Building computational graphs dynamically
- Weight initialization and gradient descent
- Activation functions and their purposes
- Loss functions for classification
- One-hot encoding and categorical data
- Vectorization for performance
Possible future enhancements:
- Add more activation functions (LeakyReLU, ELU, Swish)
- Implement batch normalization
- Add optimizers (Adam, RMSprop, Momentum)
- Convolutional layers for image processing
- Dropout for regularization
- Learning rate scheduling
- Mini-batch gradient descent
- Model serialization (save/load weights)
- GPU acceleration with CuPy
- Advanced loss functions (Focal loss, etc.)
Huge thanks to:
- Andrej Karpathy for making neural networks approachable and inspiring the autograd implementation
- Sentdex (Harrison Kinsley) for clear explanations of neural network fundamentals and the numpy-based approach
Their teaching made this project possible!
This project is for educational purposes. Feel free to use and modify for learning!
Tips for getting started:
- Start with `micrograd.py` - Understand how autograd works at the scalar level
- Visualize the graphs - Use the graphviz functions to see how operations connect
- Run the notebooks - Interactive experimentation is key to understanding
- Modify and break things - Change hyperparameters, architectures, and see what happens
- Compare implementations - See how `micrograd_nn.py` differs from `layer_dense.py`
- Read the comments - The code is heavily commented to explain the "why"
Happy Learning!
"The best way to understand deep learning is to build it yourself."