feat: pytorch-style gradienttape with op-level backward recording #5996
Annotations
1 error
commitlint
You have commit messages with errors
⧗ input: feat: PyTorch-style GradientTape with op-level backward recording (#1057)
Core GradientTape<T> rewrite for automatic differentiation:
- AsyncLocal<T> tape stack for async pipeline safety (replaces ThreadStatic)
- RecordOp(name, inputs, output, backward) — Engine ops register backward
functions directly, no separate ExportComputationGraph needed
- Gradient(Tensor<T> loss) → Dictionary<Tensor<T>, Tensor<T>> — reverse-mode
AD via reverse topological traversal of recorded ops
- NoGradScope<T> — disables recording for inference (like torch.no_grad())
- Gradient accumulation for tensors used in multiple ops
- Persistent tapes for multiple gradient computations
- Backward-compatible: Watch(ComputationNode), Gradient(ComputationNode),
RecordOperation(ComputationNode), StopRecording/ResumeRecording still work
for existing callers during migration
11/11 tests pass:
- Add, Multiply, Chain rule, Gradient accumulation
- Persistent/non-persistent, NoGradScope, disposed safety
- AsyncLocal flows across await, unwatched tensor exclusion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✖ subject must not be sentence-case, start-case, pascal-case, upper-case [subject-case]
✖ found 1 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
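For reference, the op-level tape mechanism the commit above describes — `RecordOp(name, inputs, output, backward)` plus reverse-topological traversal with gradient accumulation — can be sketched in a few lines. This is a hypothetical Python illustration (the real API is the C# `GradientTape<T>`; class and method names below are invented, and plain floats stand in for tensors keyed by identity):

```python
# Minimal sketch of an op-recording tape with reverse-mode gradient().
# Real tensors would be keyed by object identity; floats work here because
# every value stays alive in a variable or in a recorded op.
class Tape:
    def __init__(self):
        self.ops = []  # (inputs, output, backward), in execution order

    def record(self, inputs, output, backward):
        # backward: grad_of_output -> one gradient per input
        self.ops.append((inputs, output, backward))

    def gradient(self, loss, sources):
        # Reverse-mode AD: walk recorded ops in reverse execution order,
        # accumulating gradients for values used by multiple ops.
        grads = {id(loss): 1.0}
        for inputs, output, backward in reversed(self.ops):
            g_out = grads.get(id(output))
            if g_out is None:
                continue  # this op does not influence the loss
            for x, g in zip(inputs, backward(g_out)):
                grads[id(x)] = grads.get(id(x), 0.0) + g  # accumulation
        return [grads.get(id(s), 0.0) for s in sources]

# y = a * b; z = y + a  =>  dz/da = b + 1 = 5, dz/db = a = 3
tape = Tape()
a, b = 3.0, 4.0
y = a * b
tape.record([a, b], y, lambda g: [g * b, g * a])
z = y + a
tape.record([y, a], z, lambda g: [g, g])
print(tape.gradient(z, [a, b]))  # -> [5.0, 3.0]
```

Because `a` feeds two recorded ops, its two partials (4.0 from the multiply, 1.0 from the add) accumulate to 5.0 — the same accumulation behavior the commit lists for tensors used in multiple ops.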
⧗ input: feat: Conv2D, BatchNorm, LayerNorm, Softmax backward functions (#1057 Step 2 cont.)
New DifferentiableOps:
- Conv2D: backward via Engine.Conv2DBackwardInput/BackwardKernel
- LayerNorm (Ba et al. 2016): full backward through mean/variance/gamma/beta
- BatchNorm (Ioffe & Szegedy 2015): full backward through batch statistics
- Softmax: Jacobian-vector product dx_i = s_i*(dout_i - dot(dout,s))
All verified against finite-difference gradient checking (rel tol < 1e-4):
- Conv2D, LayerNorm (weighted), BatchNorm (weighted), Softmax (weighted)
- Chain: Conv→ReLU, LayerNorm→Tanh, tanh(x@w+b), MSE loss
35/35 tests pass (11 tape + 24 gradient checks).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✖ subject must not be sentence-case, start-case, pascal-case, upper-case [subject-case]
✖ found 1 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
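The Softmax backward formula quoted above, `dx_i = s_i*(dout_i - dot(dout, s))`, and the finite-difference verification style the commit mentions can both be illustrated directly. A hypothetical Python sketch (function names are invented; the project's checks run in C# against `Engine` ops) checking the analytic gradient of a weighted loss `L = dot(w, softmax(x))` against central differences at the commit's stated tolerance:

```python
import math

def softmax(x):
    m = max(x)                      # shift for numerical stability
    e = [math.exp(v - m) for v in x]
    t = sum(e)
    return [v / t for v in e]

def softmax_backward(s, dout):
    # dx_i = s_i * (dout_i - dot(dout, s))
    dot = sum(d * si for d, si in zip(dout, s))
    return [si * (d - dot) for si, d in zip(s, dout)]

# Weighted scalar loss L = dot(w, softmax(x)), so dL/ds = w acts as dout.
x = [0.3, -1.2, 0.8, 0.1]
w = [0.5, -0.7, 1.1, 0.2]
s = softmax(x)
analytic = softmax_backward(s, w)

eps = 1e-6
for i in range(len(x)):
    xp = list(x); xp[i] += eps
    xm = list(x); xm[i] -= eps
    lp = sum(a * b for a, b in zip(w, softmax(xp)))
    lm = sum(a * b for a, b in zip(w, softmax(xm)))
    numeric = (lp - lm) / (2 * eps)
    # rel tol < 1e-4, as in the commit's gradient checks
    assert abs(numeric - analytic[i]) < 1e-4 * max(1.0, abs(numeric))
```

A useful sanity property falls out of the formula: the softmax input gradient always sums to zero, since `sum_i s_i*(dout_i - dot) = dot - dot = 0`.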
⧗ input: feat: GPU path for named-port multi-input layers (#1058 Step 4)
LayerBase<T>:
- ForwardGpu(IReadOnlyDictionary<string, IGpuTensor<T>>) — default
delegates to ForwardGpu(input) using "input" key
- BackwardGpuMulti(IReadOnlyDictionary<string, IGpuTensor<T>>) — default
delegates to BackwardGpu(gradient)
NeuralNetworkBase<T>:
- ForwardGpu(input, auxiliaryInputs) — routes named GPU tensors to
multi-port layers during GPU-resident forward pass, mirroring the
CPU ForwardWithMemory(input, auxiliaryInputs)
58/58 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✖ subject must not be sentence-case, start-case, pascal-case, upper-case [subject-case]
✖ found 1 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
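The default-delegation pattern described in the commit above — a multi-port entry point that falls back to the single-input path via the `"input"` key, so existing single-input layers need no changes — can be sketched as follows. This is an illustrative Python analogue (all class and method names are hypothetical; the real code is the C# `LayerBase<T>.ForwardGpu` overloads on `IGpuTensor<T>`):

```python
# Sketch: named-port routing with a single-input default.
class LayerBase:
    def forward(self, x):
        raise NotImplementedError

    def forward_multi(self, inputs):
        # Default for single-input layers: route the "input" port
        # to the ordinary forward path.
        return self.forward(inputs["input"])

class Scale(LayerBase):
    # Single-input layer: only implements forward(), but still works
    # when invoked through the multi-port entry point.
    def __init__(self, k):
        self.k = k

    def forward(self, x):
        return [self.k * v for v in x]

class AddSkip(LayerBase):
    # Multi-port layer: overrides forward_multi and names its ports.
    def forward_multi(self, inputs):
        return [a + b for a, b in zip(inputs["input"], inputs["skip"])]

print(Scale(2).forward_multi({"input": [1, 2]}))                       # -> [2, 4]
print(AddSkip().forward_multi({"input": [1, 2], "skip": [10, 20]}))    # -> [11, 22]
```

The network-level `ForwardGpu(input, auxiliaryInputs)` then only has to build the port dictionary per layer; whether a layer consumes one port or several is the layer's own concern.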
⧗ input: feat: Concatenate/Split ops + perf optimizations from audit
New ops:
- Concatenate(tensors, axis): forward concat, backward splits gradient
- Split(x, splitSizes, axis): forward splits, backward concatenates
Both essential for U-Net skip connections and multi-head attention
Performance optimizations:
- ReLU mask: byte[] (1B/element) instead of Tensor<T> (4-8B/element)
- InputPorts/OutputPorts: cached with ??= to avoid allocation per access
All 66 gradient check tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✖ subject must not be sentence-case, start-case, pascal-case, upper-case [subject-case]
✖ found 1 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
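The Concatenate/Split pair above are exact duals: the backward of a concat splits the incoming gradient at the same offsets, and the backward of a split concatenates the per-piece gradients. A minimal Python sketch over 1-D lists (names are illustrative; the real ops are C# `DifferentiableOps` over `Tensor<T>` with an `axis` argument):

```python
# Concatenate: forward joins, backward splits the gradient.
def concatenate(tensors):
    out = [v for t in tensors for v in t]
    sizes = [len(t) for t in tensors]          # remembered for backward
    def backward(dout):
        grads, i = [], 0
        for n in sizes:
            grads.append(dout[i:i + n])
            i += n
        return grads
    return out, backward

# Split: forward splits, backward concatenates the gradients.
def split(x, split_sizes):
    pieces, i = [], 0
    for n in split_sizes:
        pieces.append(x[i:i + n])
        i += n
    def backward(douts):
        return [v for d in douts for v in d]
    return pieces, backward

x = [1, 2, 3, 4, 5]
pieces, split_bwd = split(x, [2, 3])
y, concat_bwd = concatenate(pieces)
assert y == x                                  # split then concat: identity
# Round-tripping a gradient through both backwards is also the identity:
assert split_bwd(concat_bwd([10, 20, 30, 40, 50])) == [10, 20, 30, 40, 50]
```

For a U-Net skip connection the forward concat carries encoder and decoder features together; its backward hands each branch exactly its own slice of the upstream gradient.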
⧗ input: refactor: migrate DeepBeliefNetwork, FastText, GloVe, Word2Vec, NeuralNetwork,
⚠ body must have leading blank line [body-leading-blank]