My own little backpropagation library, built as a personal exercise. It also includes a GPT-style decoder-only Transformer, built with the library as an example.
- `computation.py`: computation graph implementation. This is where computation provenance is stored to make sure gradients are propagated during backprop.
- `derivation.py`: gradient computation code. This is where the partial derivatives for various functions are implemented.
- `scratch.py`: testing code that overfits a tiny MLP to a single example.
- `gpt.py`: trains a GPT model.
- `neural/`: implementation of higher-level neural net building blocks:
  - `attention.py`: multi-head attention implementation.
  - `base.py`: base utilities for making sure all these neural ops construct debuggable computation graphs.
  - `embedding.py`: basic embedding layer.
  - `initialization.py`: weight initialization algorithms; He and uniform random initialization supported.
  - `linear.py`: implementation of a fully connected linear layer.
  - `loss.py`: loss computations; currently only mean squared loss is implemented.
  - `nets.py`: multi-layer perceptron (MLP) implementation.
  - `nonlinearity.py`: nonlinearities for neural nets; only ReLU implemented for now.
  - `ops.py`: one-off operations: softmax etc.
  - `optimizer.py`: optimizers; only a simple optimizer is implemented.
  - `positional_encoding.py`: sin/cos-based positional encoding generation as in the original Transformer.
  - `transformer.py`: the actual Transformer implementation.
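To illustrate the idea behind `computation.py` and `derivation.py`, here is a minimal sketch of reverse-mode autodiff with provenance tracking. The names (`Value`, `backward`, `grad_fns`) are illustrative only and are not this library's actual API:

```python
# Minimal reverse-mode autodiff sketch (hypothetical names, not this repo's API).
class Value:
    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data          # the scalar result of this node
        self.parents = parents    # computation provenance: inputs to this op
        self.grad_fns = grad_fns  # local partial derivatives, one per parent
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Topologically order the provenance graph, then apply the chain rule.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, fn in zip(v.parents, v.grad_fns):
                p.grad += fn(v.grad)

x = Value(3.0)
y = Value(4.0)
z = x * y + x          # z = x*y + x = 15.0
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

Storing the parents and local-derivative closures on each node is what lets `backward` walk the graph in reverse and accumulate gradients into every input.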
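For `initialization.py`, He initialization draws weights from a zero-mean normal with variance 2/fan_in, which keeps activation variance stable through ReLU layers. A sketch (the function name is hypothetical, not necessarily this library's API):

```python
import numpy as np

# He initialization sketch: std = sqrt(2 / fan_in), suited to ReLU nets.
# Function name is illustrative, not this library's actual API.
def he_init(fan_in, fan_out, rng=None):
    rng = rng or np.random.default_rng(0)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(256, 128)
print(W.shape)  # (256, 128)
```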
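The sin/cos scheme in `positional_encoding.py` follows the original Transformer: even dimensions get `sin(pos / 10000^(2i/d_model))`, odd dimensions get the matching cosine. A NumPy sketch (function name is an assumption, not the module's actual API):

```python
import numpy as np

# Sin/cos positional encoding sketch, as in the original Transformer:
#   PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
#   PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
# Function name is illustrative, not this library's actual API.
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]               # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]           # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)   # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=16, d_model=8)
print(pe.shape)  # (16, 8)
```

Because the encodings are fixed functions of position, they can be precomputed once and added to the token embeddings.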