53 commits
88477eb
move images to assets/
Apr 10, 2025
a1a4fac
move demo notebooks to demos/
Apr 10, 2025
2634276
add tests for unittest
Apr 8, 2025
f4c6dff
replace pytorch test with test against known results
Apr 14, 2025
ac50875
add forward() computation
Apr 14, 2025
1dfa9cf
extend autodiff to matrices
Apr 14, 2025
8f0a7fd
add log1p operator
Apr 15, 2025
ea2ca4c
initialise variables' data with nan
Apr 15, 2025
91e14da
add transpose operator
Apr 15, 2025
662f403
add arctanh operator
Apr 16, 2025
3033ee3
add sum operator
Apr 16, 2025
81bcbd7
use nan when variable not in input
Apr 18, 2025
a463847
add mean op
Apr 18, 2025
7f421f9
add tensordot and __matmul__ operators
Apr 18, 2025
fa66f3b
add log operator
Apr 21, 2025
04b0d91
add dependency to setup.py
Apr 18, 2025
125bf8d
update README
Apr 18, 2025
f39aed2
add tanh operator
Apr 21, 2025
b6f8bdd
add test for unary ops against torch
Apr 21, 2025
d79d4dd
add test for reduce ops
Apr 21, 2025
f6ba46c
move tensordot outside the Value class to become a function
Apr 21, 2025
ba79e1b
add test on tensordot
Apr 21, 2025
0614e14
add the benefits
Apr 24, 2025
8ac119c
add section "Essential Use Pattern"
Apr 27, 2025
5e8078b
rephrase "operator depedency topology" section
Apr 27, 2025
2bc356a
remove unnecessary zero_grad()
Apr 28, 2025
d0a9b8a
fix cases of broadcast shape in addition and multiplication
Apr 29, 2025
252b289
add SGD class
Apr 29, 2025
b45463a
scalar and vector-version demos produce exactly the same results
Apr 29, 2025
f3be240
add "Stochastic Gradient Descent" section in README.md
Apr 29, 2025
070330f
shape must be tuple
May 5, 2025
a35faa5
add pyproject.toml for pip forward compatibility
May 5, 2025
1c57755
fix reduce ops with negative axis
May 5, 2025
7eee679
fix for scalar input
May 8, 2025
364d349
add installation instructions
May 9, 2025
fcf0147
add arcsin op
May 14, 2025
c52ad5c
use where() for relu op grad
May 27, 2025
5b78fd0
reuse space for grad
May 27, 2025
18ad70f
add dtype
Sep 14, 2025
e6fcefd
add print-out of numerical error
Sep 14, 2025
543f1bd
use broadcast_to
Sep 16, 2025
8f574b8
add link to full examples
Sep 24, 2025
30671e6
shift position of dtype
Sep 24, 2025
9e60b60
rephrase
Sep 24, 2025
d6e8393
re-arrange sections
Sep 24, 2025
0c6f563
rephrase
Sep 24, 2025
416a8f1
rephrase
Sep 24, 2025
a9fbcd2
add link to core code
Sep 25, 2025
f8f6470
add link to SGD class
Sep 25, 2025
9096124
add blog post
Oct 1, 2025
d1b3774
rewrite import
Mar 10, 2026
4cb7aab
add exp op
Mar 10, 2026
6d900fe
change SGD.step() signature
Mar 14, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -1 +1,4 @@
.ipynb_checkpoints/
__pycache__/
build/
*.egg-info/
196 changes: 160 additions & 36 deletions README.md
@@ -1,49 +1,166 @@

# micrograd
A tiny autograd engine whose only dependency is NumPy, the linear algebra library. Micrograd implements backpropagation (automatic differentiation) over a graph of mathematical operations.

![awww](puppy.jpg)
* 20 kilobytes of [core code](micrograd/engine.py), 10,000+ times smaller than industrial frameworks
* as portable as Python and NumPy
* performance comparable to industrial contenders
* code can be timed with Python's native profiler

A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural networks library on top of it with a PyTorch-like API. Both are tiny, with about 100 and 50 lines of code respectively. The DAG only operates over scalar values, so e.g. we chop up each neuron into all of its individual tiny adds and multiplies. However, this is enough to build up entire deep neural nets doing binary classification, as the demo notebook shows. Potentially useful for educational purposes.
This version works with vectors, matrices (2-dimensional) and higher-dimensional tensors. For @karpathy's original scalar-based version, switch to the code tagged `scalar`.

### Installation
A blog post on TensorFlow, Apple's MLX and our micrograd: [https://www.brief-ds.com/2025/09/25/tensorflow-mlx.html](https://www.brief-ds.com/2025/09/25/tensorflow-mlx.html)

```bash
pip install micrograd
## Get Started
In any working directory, create a virtual environment,

```sh
python3 -m venv venv
. venv/bin/activate
cd <directory_of_micrograd> # if not already in the micrograd's directory
pip3 install .
cd <initial_working_directory> # if different from micrograd
pip3 install jupyter # for running demos in demos/
pip3 install torch # to run tests/test_vs_torch.py
```

Below is a Python snippet. `c` is the matrix-vector product of `a` and `b`, passed through `relu`. After calling `c.backward()`, the mathematical derivatives of `c` with respect to every variable it depends on are evaluated, e.g. `a.grad` is `dc/da` and `b.grad` is `dc/db`. `c.grad` is all ones, as `dc/dc = 1`.

```python
from micrograd import Value
from numpy import array

a = Value(array([[2, 3], [5, 4]]))
b = Value(array([1, -1]))
c = (a @ b).relu()
print(c) # Value(data=[0 1], grad=None)
c.backward()
print(c) # Value(data=[0 1], grad=[1. 1.])
print(a) # Value(data=..., grad=[[0. 0.], [1. -1.]])
print(b) # Value(data=..., grad=[5. 4.])
```

PyTorch can only differentiate an expression that produces a scalar value. micrograd relaxes this: if the expression produces an array, the sum of the array's elements is differentiated.
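
The sum-reduction rule can be checked by hand: for `c = (a @ b).relu()` above, the gradient of `sum(c)` with respect to `b` is `a.T @ m`, where `m` masks the positive entries of `a @ b`, and the gradient with respect to `a` is the outer product of `m` and `b`. A plain-NumPy sketch of that arithmetic (the underlying maths, not micrograd's API):

```python
import numpy as np

a = np.array([[2.0, 3.0], [5.0, 4.0]])
b = np.array([1.0, -1.0])

z = a @ b                  # matrix-vector product: [-1., 1.]
c = np.maximum(z, 0.0)     # relu: [0., 1.]
m = (z > 0).astype(float)  # relu mask: gradient flows where z > 0

grad_b = a.T @ m           # d sum(c)/db: [5., 4.]
grad_a = np.outer(m, b)    # d sum(c)/da: [[0., 0.], [1., -1.]]
```

These match the `a.grad` and `b.grad` values printed in the snippet above.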

For full examples, go to [`demos/`](demos). The scalar-version [demos/demo_scalar.ipynb](demos/demo_scalar.ipynb) takes minutes to run, but the vector-version training [demos/demo_vector.ipynb](demos/demo_vector.ipynb) is instant.

## Lazy evaluation
When defining a tensor, one may specify only its `shape` and `name`, and supply the value corresponding to the `name` later.

```python
from micrograd import Value
from numpy import array

a = Value(shape=(2, 2), name='var1')
b = Value(shape=(2,), name='var2')
c = (a @ b).relu()
c.forward(var1=array([[2, 3], [5, 4]]),
          var2=array([1, -1]))
c.backward()
```

By default, a variable awaiting value holds `nan`; if it is not fed a value in `forward()`, the final result will be `nan`, signalling a missing value somewhere. If a mathematical expression contains no variables awaiting value, the `forward()` call is unnecessary: once the expression is defined, its value is stored in `.data`.
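
The `nan` signalling relies on ordinary IEEE-754 propagation, which plain NumPy illustrates:

```python
import numpy as np

x = np.full((2,), np.nan)     # a variable initialised with nan, never fed
y = np.array([1.0, 2.0]) + x  # nan propagates through arithmetic
```

Any `nan` reaching the final result therefore flags a variable that was never given a value.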

## Data type
As one example, with `f = ab`, `df/da = b`, so `a.grad` inherits the data type of `b`. Because of this inter-dependence, we use a single uniform `DTYPE` per program, passed in from the environment. By default `DTYPE=float64`, identical to Python's `float` type. For example,

```sh
DTYPE=float32 python3 <program_using_micrograd>
```

micrograd's `__init__.py` reads `DTYPE` from the environment. In Python, _before_ importing micrograd, one may set `DTYPE` with

```python
from os import environ
environ['DTYPE'] = ...

from micrograd import Value
```

One may get the `DTYPE` that micrograd read,

```python
from micrograd import DTYPE
```
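
The dtype-inheritance problem is easy to reproduce in plain NumPy: mixed-precision arithmetic silently promotes to the wider type, which is why a single `DTYPE` per program is useful (a sketch of NumPy's promotion behaviour, not micrograd code):

```python
import numpy as np

a = np.ones(3, dtype=np.float32)
b = np.ones(3, dtype=np.float64)

grad_a = b            # for f = a * b, df/da = b
print(grad_a.dtype)   # float64: a's gradient would not match a's own dtype
print((a * b).dtype)  # float64: mixed arithmetic promotes to the wider type
```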

### Example usage
## Efficient dependency graph computation
The dependency graph of the operations in a mathematical expression is computed only once and then cached, **assuming** the expression is *static*, even though the values of its variables may change.
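
One way to realise such caching (a sketch, not micrograd's actual code) is to compute the topological order of the graph lazily and store it on first use, e.g. with `functools.cached_property`:

```python
from functools import cached_property

class Expr:
    """Sketch: cache the topological order of a static expression graph."""
    def __init__(self, parents=()):
        self.parents = tuple(parents)

    @cached_property
    def topo_order(self):
        # depth-first post-order: every node appears after its parents
        order, seen = [], set()
        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for p in node.parents:
                visit(p)
            order.append(node)
        visit(self)
        return order
```

Repeated `forward()`/`backward()` calls then reuse the cached order instead of re-walking the graph.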

## Back propagation (automatic differentiation)
If a mathematical expression `x` contains variables awaiting value, call `forward()` once to evaluate it.

Call `backward()` to differentiate `x` with respect to the variables it depends on. `backward()` manages all gradient initialisation: unlike PyTorch, no `zero_grad()` is necessary before `backward()`.

```python
x.forward(var1=value1, var2=value2, ...)
x.backward()
```

## Supported operators
* `__pow__`
* `__matmul__`
* `tensordot` for tensor contraction: unlike NumPy's `tensordot`, the last axis (index -1) of the left tensor contracts with the first axis of the right tensor, the second-to-last axis (index -2) with the second axis of the right tensor, and so on
* `relu`
* `exp`
* `log`
* `log1p`
* `tanh`
* `arctanh`
* `T` for transpose
* `sum`
* `mean`
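
Under that convention, a two-axis contraction pairs the left tensor's trailing axes with the right tensor's leading axes in reverse order. The NumPy equivalent (an assumption of how micrograd's `tensordot` maps onto `numpy.tensordot`, per the description above):

```python
import numpy as np

x = np.arange(24.0).reshape(2, 3, 4)
y = np.arange(60.0).reshape(4, 3, 5)

# micrograd-style two-axis contraction: x's axis -1 pairs with
# y's axis 0, x's axis -2 with y's axis 1
z = np.tensordot(x, y, axes=([-1, -2], [0, 1]))
print(z.shape)  # (2, 5): the uncontracted axes of x, then of y
```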

## Optimise by Stochastic Gradient Descent
We can minimise a mathematical expression by adjusting the values of the variables it depends on. For example, if `x` is defined from `a` and `b`,

```python
# call x.forward() if necessary
x.backward()
a -= learning_rate * a.grad
b -= learning_rate * b.grad
```
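
The update rule above is ordinary gradient descent. In isolation (plain Python, independent of micrograd) it behaves like:

```python
# minimise f(a) = (a - 3)**2 with the update a -= learning_rate * df/da
a, learning_rate = 0.0, 0.1

for _ in range(100):
    grad = 2 * (a - 3)       # df/da
    a -= learning_rate * grad

# a converges towards the minimiser a = 3
```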

Below is a slightly contrived example showing a number of possible supported operations:
The [`micrograd.optim.SGD`](micrograd/optim.py) class wraps up the above:

```python
from micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
c += c + 1
c += 1 + c + (-a)
d += d * 2 + (b + a).relu()
d += 3 * d + (b - a).relu()
e = c - d
f = e**2
g = f / 2.0
g += 10.0 / f
print(f'{g.data:.4f}') # prints 24.7041, the outcome of this forward pass
g.backward()
print(f'{a.grad:.4f}') # prints 138.8338, i.e. the numerical value of dg/da
print(f'{b.grad:.4f}') # prints 645.5773, i.e. the numerical value of dg/db
SGD(wrt=[],              # list of variables with respect to which
                         # to perform minimisation
    learning_rate=None,  # a non-negative number or a generator of them
    momentum=None)
```

### Training a neural net
The `learning_rate` argument can accept a generator implementing a schedule of varying learning rates. Typical usage is as below:

The notebook `demo.ipynb` provides a full demo of training an 2-layer neural network (MLP) binary classifier. This is achieved by initializing a neural net from `micrograd.nn` module, implementing a simple svm "max-margin" binary classification loss and using SGD for optimization. As shown in the notebook, using a 2-layer neural net with two 16-node hidden layers we achieve the following decision boundary on the moon dataset:
```python
optimiser = SGD(...)

for k in range(n_steps):

    # batch_iterator yields a dict
    # for the minibatch, e.g.
    #
    # batch_data = {'X': ..,
    #               'y': ..}
    #
    batch_data = next(batch_iterator)

![2d neuron](moon_mlp.png)
    loss.forward(**batch_data)
    loss.backward()

### Tracing / visualization
    optimiser.step()

    # validation
    validation_metric.forward()

```
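
Since `learning_rate` may be a generator, a decay schedule can be expressed as one. `exponential_decay` below is a hypothetical helper sketched here, not part of micrograd:

```python
def exponential_decay(lr0=0.1, gamma=0.99):
    """Yield lr0, lr0*gamma, lr0*gamma**2, ... indefinitely."""
    lr = lr0
    while True:
        yield lr
        lr *= gamma

schedule = exponential_decay()
# e.g. SGD(wrt=params, learning_rate=schedule)
```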

## The Demos
The notebooks under `demos/` provide a full demo of training a 2-layer neural network (MLP) binary classifier. This is achieved by initializing a neural net from the `micrograd.nn` module, implementing a simple SVM "max-margin" binary classification loss and using SGD for optimization. As shown in the notebooks, using a 2-layer neural net with two 16-node hidden layers we achieve the following decision boundary on the moon dataset:

![2d neuron](assets/moon_mlp.png)

## Tracing / visualization
For added convenience, the notebook `demos/trace_graph.ipynb` produces graphviz visualizations. E.g. this one below is of a simple 2D neuron, arrived at by calling `draw_dot` on the code below, and it shows both the data (left number in each node) and the gradient (right number in each node).

```python
@@ -54,16 +171,23 @@ y = n(x)
dot = draw_dot(y)
```

![2d neuron](gout.svg)

### Running tests
![2d neuron](assets/gout.svg)

To run the unit tests you will have to install [PyTorch](https://pytorch.org/), which the tests use as a reference for verifying the correctness of the calculated gradients. Then simply:
## Running tests
If PyTorch requires a NumPy version lower than 2, create a new virtual environment `torch`, and install the downgraded NumPy there for the tests.

```bash
python -m pytest
```sh
python3 -m venv torch
. torch/bin/activate
pip3 install "numpy<2"  # the quotation marks are important
```

### License
Run the unit tests:

```sh
python3 -m unittest tests/*.py
```

## License
MIT
File renamed without changes
File renamed without changes
File renamed without changes
353 changes: 0 additions & 353 deletions demo.ipynb

This file was deleted.

364 changes: 364 additions & 0 deletions demos/demo_scalar.ipynb

Large diffs are not rendered by default.

374 changes: 374 additions & 0 deletions demos/demo_vector.ipynb

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions trace_graph.ipynb → demos/trace_graph.ipynb
Original file line number Diff line number Diff line change
@@ -103,7 +103,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -117,9 +117,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.10.16"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
6 changes: 6 additions & 0 deletions micrograd/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@

from os import environ
DTYPE = environ.get('DTYPE', 'float64')
assert DTYPE in ('float16', 'float32', 'float64', 'float128')

from .engine import Value, tensordot