Deep learning library in Idris 2 with compile-time tensor shape checking and automatic differentiation.
```sh
# Type-check all library modules
idris2 --build idris-ml.ipkg

# Type-check a single module
idris2 --source-dir src -p contrib --check src/<File>.idr

# Build an example (examples aren't in the package, so manual flags are needed)
idris2 --source-dir src -p contrib -o <name> src/Example/<Name>.idr

# Run a built example
./build/exec/<name>
```

Concrete examples:

```sh
idris2 --source-dir src -p contrib -o supervised src/Example/Supervised.idr && ./build/exec/supervised
idris2 --source-dir src -p contrib -o rnn src/Example/Rnn.idr && ./build/exec/rnn
idris2 --source-dir src -p contrib -o ntm src/Example/Ntm.idr && ./build/exec/ntm
```

- `Floating` - Extended `Floating` interface adding `sqrt`
- `Util` - Helpers: `enumerate`, `permute`, `chunks`
- `Tensor` - Shape-indexed tensor: `Tensor : Vect rank Nat -> Type -> Type`
- `Math` - Loss functions, activations, linear algebra
- `Memory` - NTM read/write head operations
- `Variable` - Autograd node with computational graph
- `DataPoint` - `DataPoint` and `RecurrentDataPoint` records
- `Endofunctor` - `emap : (ty -> ty) -> e ty -> e ty` for type-preserving maps
- `Layer` - Layer/Network types (mutually recursive), forward pass, constructors
- `Backprop` - Training loop: `zeroGrad`, `backward`, `step`, `epoch`, `train`
```idris
-- Tensor.idr
data Tensor : Vect rank Nat -> Type -> Type where
  STensor : ty -> Tensor [] ty
  VTensor : Vect dim (Tensor dims ty) -> Tensor (dim :: dims) ty

Scalar = Tensor []
Vector elems = Tensor [elems]
Matrix rows columns = Tensor [rows, columns]
```
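As a quick illustration of the shape indexing, a value can be built from just the two constructors above (a sketch; only `Matrix` from the snippet is assumed):

```idris
-- A 2x3 matrix from the raw constructors; both dimensions live in the type
m : Matrix 2 3 Double
m = VTensor [ VTensor [STensor 1, STensor 2, STensor 3]
            , VTensor [STensor 4, STensor 5, STensor 6] ]
```

A row of the wrong length, or a third row, is rejected at compile time rather than at run time.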
```idris
-- Variable.idr
record Variable where
  constructor Var
  paramId  : Maybe String
  value    : Double
  grad     : Double
  back     : Double -> List Double
  children : List Variable
```
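For orientation, a constant leaf node under this record would look like the following sketch (the `constant` name is illustrative; the library's own smart constructors may differ):

```idris
-- A leaf: unnamed, zero gradient, no backward contribution, no parents
constant : Double -> Variable
constant x = Var Nothing x 0 (\_ => []) []
```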
```idris
-- Layer.idr (mutual block)
data Layer : (inputSize : Nat) -> (outputSize : Nat) -> Type -> Type where
  LinearLayer        : Matrix outputSize inputSize ty -> Vector outputSize ty ->
                       Layer inputSize outputSize ty
  RnnLayer           : Matrix outputSize inputSize ty -> Matrix outputSize outputSize ty ->
                       Vector outputSize ty -> Vector outputSize ty ->
                       Layer inputSize outputSize ty
  ActivationLayer    : String -> ActivationFunction ty -> Layer n n ty
  NormalizationLayer : String -> NormalizationFunction ty -> Layer n n ty
  NtmLayer           : Network (NtmInputWidth w) hs (NtmOutputWidth n w) ty -> Matrix n w ty ->
                       ReadHead n ty -> WriteHead n ty -> Vector w ty -> Layer w w ty

data Network : (inputDims : Nat) -> (hiddenDims : List Nat) -> (outputDims : Nat) -> Type -> Type where
  OutputLayer : Layer i o ty -> Network i [] o ty
  (~>)        : Layer i h ty -> Network h hs o ty -> Network i (h :: hs) o ty
```
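The shape indices make mis-wired networks a compile-time error. A minimal sketch of composition using only the constructors above:

```idris
-- A 4 -> 8 -> 3 network; changing either 8 so they mismatch fails to type-check
compose : Layer 4 8 ty -> Layer 8 3 ty -> Network 4 [8] 3 ty
compose hidden out = hidden ~> OutputLayer out
```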
```idris
-- Endofunctor.idr
interface Endofunctor e where
  emap : (ty -> ty) -> e ty -> e ty
```

```idris
-- Supervised: linear -> softmax
ll <- nameParams "ll" <$> linearLayer
let model = ll ~> OutputLayer softmaxLayer

-- NTM: controller network nested inside NtmLayer
controllerHidden <- linearLayer {i = NtmInputWidth W, o = H}
controllerOut <- linearLayer {i = H, o = NtmOutputWidth N W}
let controller = controllerHidden ~> sigmoidLayer ~> OutputLayer controllerOut
ntm <- ntmLayer {n = N, w = W} controller
let model = nameNetworkParams "ntm" $ ntm ~> OutputLayer softmaxLayer
```

```idris
-- applyLayer and forward both return (updated, output) pairs
applyLayer : Layer i o ty -> Vector i ty -> (Layer i o ty, Vector o ty)
forward    : Network i hs o ty -> Vector i ty -> (Network i hs o ty, Vector o ty)

-- RNN/NTM layers carry state; linear/activation layers return unchanged
let (updatedModel, output) = forward model input
```

```idris
-- Backprop.idr: zeroGrad -> forward (via calculateLoss) -> backward -> step
epoch lr dataPoints lossFn model =
  let zeroed     = zeroGrad model                         -- 1. Clear gradients
      loss       = calculateLoss lossFn zeroed dataPoints -- 2. Forward pass + loss
      propagated = backward loss zeroed                   -- 3. Backprop gradients
  in  step lr propagated                                  -- 4. Gradient descent update
```

Every learnable layer must be named before training:

```idris
ll <- nameParams "ll" <$> linearLayer   -- names: ll_weight0, ll_bias0, ...
rnn <- nameParams "rnn" <$> rnnLayer    -- names: rnn_inputWeight0, rnn_bias0, ...
nameNetworkParams "ntm" $ ntm ~> OutputLayer softmaxLayer  -- recursive naming
```

`emap` maps a `(ty -> ty)` function over a Layer/Network without changing the shape types. Used by `zeroGrad`, `step`, and `backward`:

```idris
zeroGrad : Network i hs o Variable -> Network i hs o Variable
zeroGrad = emap {grad := 0}

step lr = emap (updateParam lr)
```

Raw data is `Double`; training requires `Variable`. Convert with `map fromDouble`:

```idris
let prepared = map (map fromDouble) dataPoints  -- DataPoint i o Double -> DataPoint i o Variable
```

| Aspect | Supervised | Recurrent |
|---|---|---|
| Data type | `DataPoint i o ty` (x, y vectors) | `RecurrentDataPoint i o ty` (xs, ys lists) |
| Forward | `forward` / `forwardMany` | `forwardRecurrent` (folds over list) |
| Train | `train` / `epoch` | `trainRecurrent` / `epochRecurrent` |
| State | Not carried between examples | Accumulated across timesteps |
| Loss fn | `crossEntropy`, `meanSquaredError` | `binaryCrossEntropyWithLogits`, `crossEntropy` |
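A generic way to iterate `epoch` over many passes, with the argument order shown in the snippet above. `trainN` here is a hypothetical helper; the library's own `train` presumably plays this role:

```idris
-- Repeatedly apply one epoch step; polymorphic, so it works for any model type
trainN : Nat -> (model -> model) -> model -> model
trainN Z     _ m = m
trainN (S k) f m = trainN k f (f m)

-- Usage sketch: trainN 100 (epoch 0.01 prepared crossEntropy) model
```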
- Indentation: 2 spaces for `.idr` files (see `.editorconfig`)
- Naming: PascalCase for types/constructors, camelCase for functions/variables
- Imports: Idris stdlib first (`Data.Vect`, `System.Random`), then internal modules alphabetically
- Commits: Follow Conventional Commits (`feat:`, `fix:`, `refactor:`, `docs:`, `chore:`, etc.). Keep the subject concise (~50 chars), imperative present tense. Commit work regularly in meaningful chunks, one logical change per commit. Never include ads, branding, or promotional text in commit messages or PR descriptions
- Section dividers: `----------------------------------------------------------------------` with section titles, in Layer.idr style
- Build flags: Forgetting `--source-dir src` or `-p contrib` produces confusing import errors
- Elementwise `(*)`: `Tensor`'s `Num` instance uses elementwise multiply. For matrix-vector products, use `matrixVectorMultiply` or `vectorMatrixMultiply` from Math.idr
- `paramId` requirement: Variables without a `paramId` (i.e., `Nothing`) are invisible to `gradMap` and won't receive gradient updates. Always call `nameParams`/`nameNetworkParams` before training
- No test framework: No test suite exists. Verify changes by type-checking (`--check`) and running the examples
- `updateParam` creates fresh Variables: `updateParam` in Backprop.idr constructs a new `Var` with empty `back`/`children` to allow GC of the computation graph. This is intentional, not a bug
- Mutual recursion in Layer.idr: `Layer` and `Network` are mutually recursive (`NtmLayer` contains a `Network`). `applyLayer`, `forward`, `nameParams`, `nameNetworkParams`, and the `Endofunctor` instances all live in `mutual` blocks
- NTM dimension calculations: The controller output width is `NtmOutputWidth n w = ReadHeadInputWidth n w + WriteHeadInputWidth n w + w`. Getting these wrong causes type errors at network composition
- NTM state flow: `readHeadOutput` from the previous timestep concatenates with the current input to form the controller input (`NtmInputWidth w = w + w`). Memory, read head, and write head all update each step
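The width bookkeeping above can be written out as plain definitions (a sketch; `ReadHeadInputWidth` and `WriteHeadInputWidth` are assumed to exist in the library with the roles described):

```idris
-- Controller input: previous read-head output (width w) ++ current input (width w)
NtmInputWidth : (w : Nat) -> Nat
NtmInputWidth w = w + w

-- Controller output: read-head params ++ write-head params ++ external output
NtmOutputWidth : (n, w : Nat) -> Nat
NtmOutputWidth n w = ReadHeadInputWidth n w + WriteHeadInputWidth n w + w
```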