53 commits
88477eb
move images to assets/
Apr 10, 2025
a1a4fac
move demo notebooks to demos/
Apr 10, 2025
2634276
add tests for unittest
Apr 8, 2025
f4c6dff
replace pytorch test with test against known results
Apr 14, 2025
ac50875
add forward() computation
Apr 14, 2025
1dfa9cf
extend autodiff to matrices
Apr 14, 2025
8f0a7fd
add log1p operator
Apr 15, 2025
ea2ca4c
initialise variables' data with nan
Apr 15, 2025
91e14da
add transpose operator
Apr 15, 2025
662f403
add arctanh operator
Apr 16, 2025
3033ee3
add sum operator
Apr 16, 2025
81bcbd7
use nan when variable not in input
Apr 18, 2025
a463847
add mean op
Apr 18, 2025
7f421f9
add tensordot and __matmul__ operators
Apr 18, 2025
fa66f3b
add log operator
Apr 21, 2025
04b0d91
add dependency to setup.py
Apr 18, 2025
125bf8d
update README
Apr 18, 2025
f39aed2
add tanh operator
Apr 21, 2025
b6f8bdd
add test for unary ops against torch
Apr 21, 2025
d79d4dd
add test for reduce ops
Apr 21, 2025
f6ba46c
move tensordot outside the Value class to become a function
Apr 21, 2025
ba79e1b
add test on tensordot
Apr 21, 2025
0614e14
add the benefits
Apr 24, 2025
8ac119c
add section "Essential Use Pattern"
Apr 27, 2025
5e8078b
rephrase "operator depedency topology" section
Apr 27, 2025
2bc356a
remove unnecessary zero_grad()
Apr 28, 2025
d0a9b8a
fix cases of broadcast shape in addition and multiplication
Apr 29, 2025
252b289
add SGD class
Apr 29, 2025
b45463a
scalar and vector-version demos produce exactly the same results
Apr 29, 2025
f3be240
add "Stochastic Gradient Descent" section in README.md
Apr 29, 2025
070330f
shape must be tuple
May 5, 2025
a35faa5
add pyproject.toml for pip forward compatibility
May 5, 2025
1c57755
fix reduce ops with negative axis
May 5, 2025
7eee679
fix for scalar input
May 8, 2025
364d349
add installation instructions
May 9, 2025
fcf0147
add arcsin op
May 14, 2025
c52ad5c
use where() for relu op grad
May 27, 2025
5b78fd0
reuse space for grad
May 27, 2025
18ad70f
add dtype
Sep 14, 2025
e6fcefd
add print-out of numerical error
Sep 14, 2025
543f1bd
use broadcast_to
Sep 16, 2025
8f574b8
add link to full examples
Sep 24, 2025
30671e6
shift position of dtype
Sep 24, 2025
9e60b60
rephrase
Sep 24, 2025
d6e8393
re-arrange sections
Sep 24, 2025
0c6f563
rephrase
Sep 24, 2025
416a8f1
rephrase
Sep 24, 2025
a9fbcd2
add link to core code
Sep 25, 2025
f8f6470
add link to SGD class
Sep 25, 2025
9096124
add blog post
Oct 1, 2025
d1b3774
rewrite import
Mar 10, 2026
4cb7aab
add exp op
Mar 10, 2026
6d900fe
change SGD.step() signature
Mar 14, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -1 +1,4 @@
.ipynb_checkpoints/
__pycache__/
build/
*.egg-info/
196 changes: 160 additions & 36 deletions README.md
@@ -1,49 +1,166 @@

# micrograd
A tiny autograd engine whose only dependency is NumPy, the linear algebra library. Micrograd implements backpropagation (automatic differentiation) over a graph of mathematical operations.

![awww](puppy.jpg)
* 20 kilobytes of [core code](micrograd/engine.py), 10,000+ times smaller than industrial frameworks
* as portable as Python and NumPy
* performance comparable to industrial contenders
* code can be timed with Python's native profiler

A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural networks library on top of it with a PyTorch-like API. Both are tiny, with about 100 and 50 lines of code respectively. The DAG only operates over scalar values, so e.g. we chop up each neuron into all of its individual tiny adds and multiplies. However, this is enough to build up entire deep neural nets doing binary classification, as the demo notebook shows. Potentially useful for educational purposes.
This version works with vectors, matrices (2-dimensional) and higher-dimensional tensors. For @karpathy's original scalar-based version, switch to the code tagged `scalar`.

### Installation
A blog post on TensorFlow, Apple's MLX and our micrograd: [https://www.brief-ds.com/2025/09/25/tensorflow-mlx.html](https://www.brief-ds.com/2025/09/25/tensorflow-mlx.html)

```bash
pip install micrograd
## Get Started
In any working directory, create a virtual environment,

```sh
python3 -m venv venv
. venv/bin/activate
cd <directory_of_micrograd> # if not already in the micrograd's directory
pip3 install .
cd <initial_working_directory> # if different from micrograd
pip3 install jupyter # for running demos in demos/
pip3 install torch # to run tests/test_vs_torch.py
```

Below is a Python snippet. `c` is the matrix-vector product of `a` and `b`, passed through `relu`. After calling `c.backward()`, the mathematical derivatives of `c` with respect to every variable it depends on are evaluated, e.g. `a.grad` is `dc/da` and `b.grad` is `dc/db`. `c.grad` is all ones, as `dc/dc = 1`.

```python
from micrograd import Value
from numpy import array

a = Value(array([[2, 3], [5, 4]]))
b = Value(array([1, -1]))
c = (a @ b).relu()
print(c) # Value(data=[0 1], grad=None)
c.backward()
print(c) # Value(data=[0 1], grad=[1. 1.])
print(a) # Value(data=..., grad=[[0. 0.], [1. -1.]])
print(b) # Value(data=..., grad=[5. 4.])
```

PyTorch can only differentiate an expression that produces a scalar value. micrograd relaxes this: if the expression produces an array, the sum of the array's elements is differentiated.
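
The sum-reduction rule can be checked by hand: for `c = (a @ b).relu()` above, the gradient of `sum(c)` with respect to `b` is `a.T @ m`, where `m` masks the positive entries of `a @ b`, and the gradient with respect to `a` is the outer product of `m` and `b`. A plain-NumPy sketch of that arithmetic (the underlying maths, not micrograd's API):

```python
import numpy as np

a = np.array([[2.0, 3.0], [5.0, 4.0]])
b = np.array([1.0, -1.0])

z = a @ b                  # matrix-vector product: [-1., 1.]
c = np.maximum(z, 0.0)     # relu: [0., 1.]
m = (z > 0).astype(float)  # relu mask: gradient flows where z > 0

grad_b = a.T @ m           # d sum(c)/db: [5., 4.]
grad_a = np.outer(m, b)    # d sum(c)/da: [[0., 0.], [1., -1.]]
```

These match the `a.grad` and `b.grad` values printed in the snippet above.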

For full examples, go to [`demos/`](demos). The scalar-version [demos/demo_scalar.ipynb](demos/demo_scalar.ipynb) takes minutes to run, but the vector-version training [demos/demo_vector.ipynb](demos/demo_vector.ipynb) is instant.

## Lazy evaluation
When defining a tensor, one may specify only its `shape` and `name`, and supply the value corresponding to the `name` later.

```python
from micrograd import Value
from numpy import array

a = Value(shape=(2, 2), name='var1')
b = Value(shape=(2,), name='var2')
c = (a @ b).relu()
c.forward(var1=array([[2, 3], [5, 4]]),
          var2=array([1, -1]))
c.backward()
```

By default, a variable awaiting value holds `nan`; if it is not fed a value in `forward()`, the final result will be `nan`, signalling a missing value somewhere. If a mathematical expression contains no variables awaiting value, the `forward()` call is unnecessary: once the expression is defined, its value is stored in `.data`.
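
The `nan` signalling relies on ordinary IEEE-754 propagation, which plain NumPy illustrates:

```python
import numpy as np

x = np.full((2,), np.nan)     # a variable initialised with nan, never fed
y = np.array([1.0, 2.0]) + x  # nan propagates through arithmetic
```

Any `nan` reaching the final result therefore flags a variable that was never given a value.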

## Data type
As one example, with `f = ab`, `df/da = b`, so `a.grad` inherits the data type of `b`. Because of this inter-dependence, we use a single uniform `DTYPE` per program, passed in from the environment. By default `DTYPE=float64`, identical to Python's `float` type. For example,

```sh
DTYPE=float32 python3 <program_using_micrograd>
```

micrograd's `__init__.py` reads `DTYPE` from the environment. In Python, _before_ importing micrograd, one may set `DTYPE` with

```python
from os import environ
environ['DTYPE'] = ...

from micrograd import Value
```

One may get the `DTYPE` that micrograd read,

```python
from micrograd import DTYPE
```
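
The dtype-inheritance problem is easy to reproduce in plain NumPy: mixed-precision arithmetic silently promotes to the wider type, which is why a single `DTYPE` per program is useful (a sketch of NumPy's promotion behaviour, not micrograd code):

```python
import numpy as np

a = np.ones(3, dtype=np.float32)
b = np.ones(3, dtype=np.float64)

grad_a = b            # for f = a * b, df/da = b
print(grad_a.dtype)   # float64: a's gradient would not match a's own dtype
print((a * b).dtype)  # float64: mixed arithmetic promotes to the wider type
```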

### Example usage
## Efficient dependency graph computation
The dependency graph of the operations in a mathematical expression is computed only once and then cached, **assuming** the expression is *static*, even though the values of its variables may change.
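
One way to realise such caching (a sketch, not micrograd's actual code) is to compute the topological order of the graph lazily and store it on first use, e.g. with `functools.cached_property`:

```python
from functools import cached_property

class Expr:
    """Sketch: cache the topological order of a static expression graph."""
    def __init__(self, parents=()):
        self.parents = tuple(parents)

    @cached_property
    def topo_order(self):
        # depth-first post-order: every node appears after its parents
        order, seen = [], set()
        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for p in node.parents:
                visit(p)
            order.append(node)
        visit(self)
        return order
```

Repeated `forward()`/`backward()` calls then reuse the cached order instead of re-walking the graph.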

## Back propagation (automatic differentiation)
If a mathematical expression `x` contains variables awaiting value, call `forward()` once to evaluate it.

Call `backward()` to differentiate `x` with respect to the variables it depends on. `backward()` manages all gradient initialisation: unlike PyTorch, no `zero_grad()` is necessary before `backward()`.

```python
x.forward(var1=value1, var2=value2, ...)
x.backward()
```

## Supported operators
* `__pow__`
* `__matmul__`
* `tensordot` for tensor contraction: unlike NumPy's `tensordot`, the last axis (index -1) of the left tensor contracts with the first axis of the right tensor, the second-to-last axis (index -2) with the second axis of the right tensor, and so on
* `relu`
* `exp`
* `log`
* `log1p`
* `tanh`
* `arctanh`
* `T` for transpose
* `sum`
* `mean`
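
Under that convention, a two-axis contraction pairs the left tensor's trailing axes with the right tensor's leading axes in reverse order. The NumPy equivalent (an assumption of how micrograd's `tensordot` maps onto `numpy.tensordot`, per the description above):

```python
import numpy as np

x = np.arange(24.0).reshape(2, 3, 4)
y = np.arange(60.0).reshape(4, 3, 5)

# micrograd-style two-axis contraction: x's axis -1 pairs with
# y's axis 0, x's axis -2 with y's axis 1
z = np.tensordot(x, y, axes=([-1, -2], [0, 1]))
print(z.shape)  # (2, 5): the uncontracted axes of x, then of y
```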

## Optimise by Stochastic Gradient Descent
We can minimise a mathematical expression by adjusting the values of the variables it depends on. For example, if `x` is defined from `a` and `b`,

```python
# call x.forward() if necessary
x.backward()
a -= learning_rate * a.grad
b -= learning_rate * b.grad
```
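
The update rule above is ordinary gradient descent. In isolation (plain Python, independent of micrograd) it behaves like:

```python
# minimise f(a) = (a - 3)**2 with the update a -= learning_rate * df/da
a, learning_rate = 0.0, 0.1

for _ in range(100):
    grad = 2 * (a - 3)       # df/da
    a -= learning_rate * grad

# a converges towards the minimiser a = 3
```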

Below is a slightly contrived example showing a number of possible supported operations:
The [`micrograd.optim.SGD`](micrograd/optim.py) class wraps up the above:

```python
from micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
c += c + 1
c += 1 + c + (-a)
d += d * 2 + (b + a).relu()
d += 3 * d + (b - a).relu()
e = c - d
f = e**2
g = f / 2.0
g += 10.0 / f
print(f'{g.data:.4f}') # prints 24.7041, the outcome of this forward pass
g.backward()
print(f'{a.grad:.4f}') # prints 138.8338, i.e. the numerical value of dg/da
print(f'{b.grad:.4f}') # prints 645.5773, i.e. the numerical value of dg/db
SGD(wrt=[],              # list of variables with respect to which
                         # to perform minimisation
    learning_rate=None,  # a non-negative number or a generator of them
    momentum=None)
```

### Training a neural net
The `learning_rate` argument can accept a generator implementing a schedule of varying learning rates. Typical usage is as below:

The notebook `demo.ipynb` provides a full demo of training an 2-layer neural network (MLP) binary classifier. This is achieved by initializing a neural net from `micrograd.nn` module, implementing a simple svm "max-margin" binary classification loss and using SGD for optimization. As shown in the notebook, using a 2-layer neural net with two 16-node hidden layers we achieve the following decision boundary on the moon dataset:
```python
optimiser = SGD(...)

for k in range(n_steps):

    # batch_iterator yields a dict
    # for the minibatch, e.g.
    #
    # batch_data = {'X': ..,
    #               'y': ..}
    #
    batch_data = next(batch_iterator)

![2d neuron](moon_mlp.png)
    loss.forward(**batch_data)
    loss.backward()

### Tracing / visualization
    optimiser.step()

    # validation
    validation_metric.forward()

```
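
Since `learning_rate` may be a generator, a decay schedule can be expressed as one. `exponential_decay` below is a hypothetical helper sketched here, not part of micrograd:

```python
def exponential_decay(lr0=0.1, gamma=0.99):
    """Yield lr0, lr0*gamma, lr0*gamma**2, ... indefinitely."""
    lr = lr0
    while True:
        yield lr
        lr *= gamma

schedule = exponential_decay()
# e.g. SGD(wrt=params, learning_rate=schedule)
```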

## The Demos
The notebooks under `demos/` provide a full demo of training a 2-layer neural network (MLP) binary classifier. This is achieved by initializing a neural net from the `micrograd.nn` module, implementing a simple SVM "max-margin" binary classification loss and using SGD for optimization. As shown in the notebooks, using a 2-layer neural net with two 16-node hidden layers we achieve the following decision boundary on the moon dataset:

![2d neuron](assets/moon_mlp.png)

## Tracing / visualization
For added convenience, the notebook `demos/trace_graph.ipynb` produces graphviz visualizations. E.g. this one below is of a simple 2D neuron, arrived at by calling `draw_dot` on the code below, and it shows both the data (left number in each node) and the gradient (right number in each node).

```python
@@ -54,16 +171,23 @@ y = n(x)
dot = draw_dot(y)
```

![2d neuron](gout.svg)

### Running tests
![2d neuron](assets/gout.svg)

To run the unit tests you will have to install [PyTorch](https://pytorch.org/), which the tests use as a reference for verifying the correctness of the calculated gradients. Then simply:
## Running tests
If PyTorch requires a NumPy version lower than 2, create a new virtual environment `torch`, and install the downgraded NumPy there for the tests.

```bash
python -m pytest
```sh
python3 -m venv torch
. torch/bin/activate
pip3 install "numpy<2"  # the quotation marks are important
```

### License
Run the unit tests:

```sh
python3 -m unittest tests/*.py
```

## License
MIT
File renamed without changes
File renamed without changes
File renamed without changes
353 changes: 0 additions & 353 deletions demo.ipynb

This file was deleted.

364 changes: 364 additions & 0 deletions demos/demo_scalar.ipynb

Large diffs are not rendered by default.

374 changes: 374 additions & 0 deletions demos/demo_vector.ipynb

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions trace_graph.ipynb → demos/trace_graph.ipynb
Original file line number Diff line number Diff line change
@@ -103,7 +103,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -117,9 +117,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.10.16"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
6 changes: 6 additions & 0 deletions micrograd/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@

from os import environ
DTYPE = environ.get('DTYPE', 'float64')
assert DTYPE in ('float16', 'float32', 'float64', 'float128')

from .engine import Value, tensordot