40 changes: 38 additions & 2 deletions source/standard-modules/neural/bindless-storage.slang
@@ -77,6 +77,8 @@ public struct BindlessAddress<T> : IPointerLikeAddress<T>
}
}

[Differentiable]
[ForwardDerivative(fwd_getOffset)]
public This getOffset(int elements)
Comment on lines +80 to 82

Contributor
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In current Slang autodiff documentation, can a method marked [Differentiable] legally return a type that implements IDifferentiablePtrType, or are such types restricted to input parameters only?

💡 Result:

In Slang’s current autodiff docs, a function/method marked [Differentiable] may not return a type that implements IDifferentiablePtrType. Types implementing IDifferentiablePtrType are restricted to input (in) parameters only (i.e., they cannot be used as out, inout, or return types), and attempting to do so is a compile-time error. [1]


getOffset() methods cannot be marked [Differentiable] because they return IDifferentiablePtrType.

In Slang's autodiff system, types implementing IDifferentiablePtrType are restricted to input parameters only and cannot be returned from [Differentiable] methods—this is a compile-time error per the documented rules. The current annotations on these methods violate that restriction.

Also applies to: 141–143, 250–252, 330–332
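
The rule the review cites can be illustrated with a minimal Slang sketch (function names are hypothetical; this assumes, as the comment asserts, that the address types here count as IDifferentiablePtrType-conforming):

```slang
// Sketch of the documented restriction: a type conforming to
// IDifferentiablePtrType may only appear as an `in` parameter of a
// [Differentiable] function, never as an out/inout parameter or return type.
[Differentiable]
float readAt(BindlessAddress<float> addr, int i)    // OK: `in` parameter
{
    return addr[i];
}

[Differentiable]
BindlessAddress<float> shifted(BindlessAddress<float> addr, int n)
{
    // Compile-time error under the documented rule: an
    // IDifferentiablePtrType cannot be a [Differentiable] return type.
    return addr.getOffset(n);
}
```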

{
uint newBaseIndex = baseIndex + elements;
@@ -86,6 +88,13 @@ public struct BindlessAddress<T> : IPointerLikeAddress<T>
return address;
}

static DifferentialPtrPair<This> fwd_getOffset(DifferentialPtrPair<This> self, int elements)
{
return DifferentialPtrPair<This>(
self.p.getOffset(elements),
self.d.getOffset(elements));
Contributor
Minor style inconsistency: the Ptr extension's fwd_getOffset (line 339) uses self.p + elements / self.d + elements (raw pointer arithmetic), while the other three implementations use self.p.getOffset(elements) / self.d.getOffset(elements). Both are functionally correct for Ptr since getOffset is just This(this + elements), but using self.p.getOffset(elements) everywhere would be more consistent and would remain correct if the internal implementation of getOffset ever changes.

Nit, not blocking.

}

[ForceInline]
internal uint4 readUint4<DstType, bool IsAligned, uint ActualBoundary>(int offsetIndex)
where DstType : __BuiltinFloatingPointType
@@ -129,9 +138,18 @@ public struct PointerAddress<T> : IPointerLikeAddress<T>
set { ptr[index] = newValue; }
}

[Differentiable]
[ForwardDerivative(fwd_getOffset)]
public This getOffset(int elements)
{
return This(ptr + elements);
return no_diff(This(ptr + elements));
}

static DifferentialPtrPair<This> fwd_getOffset(DifferentialPtrPair<This> self, int elements)
{
return DifferentialPtrPair<This>(
self.p.getOffset(elements),
self.d.getOffset(elements));
}

[ForceInline]
@@ -229,13 +247,22 @@ public struct TorchTensorViewAddress<T> : IPointerLikeAddress<T>
}

[require(cuda)]
[Differentiable]
[ForwardDerivative(fwd_getOffset)]
public This getOffset(int elements)
{
This result;
result.inner = inner.getOffset(elements);
return result;
}

static DifferentialPtrPair<This> fwd_getOffset(DifferentialPtrPair<This> self, int elements)
{
return DifferentialPtrPair<This>(
self.p.getOffset(elements),
self.d.getOffset(elements));
}
Comment on lines +259 to +264
Copilot AI Mar 11, 2026
TorchTensorViewAddress.getOffset is marked [require(cuda)], but its forward derivative helper fwd_getOffset is not. On non-CUDA targets this helper may still be type-checked/compiled and it calls getOffset, which is CUDA-only, potentially causing compilation errors. Mark fwd_getOffset with the same [require(cuda)] (or otherwise ensure it is excluded on non-CUDA targets).


[ForceInline]
[require(cuda_glsl_hlsl_metal_spirv, sm_6_6)]
public void atomicAdd(uint index, T value)
Contributor
nit: Missing [require(cuda)] on fwd_getOffset.

The primal getOffset on TorchTensorViewAddress (and the __init, __subscript) are all marked [require(cuda)]. The new fwd_getOffset should be consistent:

Suggested change
public void atomicAdd(uint index, T value)
[require(cuda)]
static DifferentialPtrPair<This> fwd_getOffset(DifferentialPtrPair<This> self, int elements)

Without it, a non-CUDA target could theoretically resolve this derivative function even though the primal is CUDA-only, producing a confusing error instead of a clean capability mismatch.

@@ -300,9 +327,18 @@ internal extension<T> Ptr<T, Access.ReadWrite, AddressSpace.Device> : IPointerLi
set { this[index] = newValue; }
}

[Differentiable]
[ForwardDerivative(fwd_getOffset)]
internal This getOffset(int elements)
{
return This(this + elements);
return no_diff(This(this + elements));
}

static DifferentialPtrPair<This> fwd_getOffset(DifferentialPtrPair<This> self, int elements)
{
return DifferentialPtrPair<This>(
self.p + elements,
self.d + elements);
}

[require(hlsl, sm_6_6)]
7 changes: 3 additions & 4 deletions source/standard-modules/neural/ilayer.slang
@@ -16,13 +16,12 @@ public interface ILayer<T, InputVector, OutputVector, Layout, Activation>
where Activation : IActivation<T>
{
/// Forward evaluation: y = f(x).
/// Address is passed as parameter to enable autodiff gradient routing.
/// @param input Input vector.
/// @param weightAddress Weight address (pointer-like).
/// @param biasAddress Bias address (pointer-like). Pass `none` if no bias.
/// @param parameterAddr Base address of this layer's contiguous parameter block.
/// Weights start at offset 0; bias (if any) follows immediately.
/// @return Output vector.
[Differentiable]
public OutputVector eval<A>(InputVector input, A weightAddress, Optional<A> biasAddress = none)
public OutputVector eval<A>(InputVector input, A parameterAddr)
where A : IPointerLikeAddress<T>
where A.Differential : IPointerLikeAddress<T.Differential>;
}
16 changes: 8 additions & 8 deletions source/standard-modules/neural/layers.slang
@@ -13,14 +13,14 @@ suitable for building multi-layer perceptrons (MLPs) and similar architectures.
let layer = FFLayer<float, Vec4, Vec2, LinearLayout, ReLU<float>>();
```

2. **Forward pass:** Call `eval()` with address and input:
2. **Forward pass:** Call `eval()` with base parameter address and input:
```
let output = layer.eval<Address>(input, weightAddr, biasAddr);
let output = layer.eval<Address>(input, paramAddr);
```

3. **Training (backward pass):** Use autodiff with `DifferentialPtrPair`:
```
var addrPair = DifferentialPtrPair<Address>(addr, gradAddr);
var addrPair = DifferentialPtrPair<Address>(paramAddr, gradAddr);
bwd_diff(computeOutput)(addrPair, inputPair, layer, dOutput);
```

@@ -67,23 +67,23 @@ public struct FFLayer<

/// Forward evaluation: y = Activation(W*x + b).
/// @param input Input vector.
/// @param weightAddr Weight address (pointer-like).
/// @param biasAddr Bias address (pointer-like). Pass `none` if no bias.
/// @param parameterAddr Base address of contiguous parameter block (weights then bias).
/// @return Output vector after linear transform and activation.
[Differentiable]
[ForceInline]
public OutputVector eval<A>(InputVector input, A weightAddr, Optional<A> biasAddr = none)
public OutputVector eval<A>(InputVector input, A parameterAddr)
where A : IPointerLikeAddress<T>
where A.Differential : IPointerLikeAddress<T.Differential>
{
OutputVector y;
if(HasBias)
{
y = input.linearTransform<A, Layout, OutputVector>(weightAddr, biasAddr.value);
let biasAddr = parameterAddr.getOffset(OutputVector.Size * InputVector.Size);
y = input.linearTransform<A, Layout, OutputVector>(parameterAddr, biasAddr);
}
else
{
y = input.linearTransform<A, Layout, OutputVector>(weightAddr);
y = input.linearTransform<A, Layout, OutputVector>(parameterAddr);
}
return activation.eval<OutputVector>(y);
}
35 changes: 7 additions & 28 deletions tests/neural/activation-with-fflayer-test.slang
@@ -45,13 +45,10 @@ bool testFFLayerReLU()
typealias Layer = FFLayer<float, Vec2, Vec2, LinearLayout, ReLU<float>, true>;
let layer = Layer();

let weightAddr = baseAddr.getOffset(0);
let biasAddr = baseAddr.getOffset(4);

float[2] arr = {-1.0, 2.0};
let x = Vec2(arr);

let y = layer.eval<Address>(x, weightAddr, biasAddr);
let y = layer.eval<Address>(x, baseAddr);

// ReLU([-1, 2]) = [0, 2]
return approxEqual(y[0], 0.0) && approxEqual(y[1], 2.0);
@@ -68,13 +65,10 @@ bool testFFLayerLeakyReLU()
let leakyRelu = LeakyReLU<float>(0.1); // alpha = 0.1
let layer = Layer(leakyRelu);

let weightAddr = baseAddr.getOffset(0);
let biasAddr = baseAddr.getOffset(4);

float[2] arr = {-1.0, 2.0};
let x = Vec2(arr);

let y = layer.eval<Address>(x, weightAddr, biasAddr);
let y = layer.eval<Address>(x, baseAddr);

// LeakyReLU([-1, 2], 0.1) = [-0.1, 2]
return approxEqual(y[0], -0.1) && approxEqual(y[1], 2.0);
@@ -90,13 +84,10 @@ bool testFFLayerSigmoid()
typealias Layer = FFLayer<float, Vec2, Vec2, LinearLayout, Sigmoid<float>, true>;
let layer = Layer();

let weightAddr = baseAddr.getOffset(0);
let biasAddr = baseAddr.getOffset(4);

float[2] arr = {0.0, 0.0};
let x = Vec2(arr);

let y = layer.eval<Address>(x, weightAddr, biasAddr);
let y = layer.eval<Address>(x, baseAddr);

// Sigmoid([0, 0]) = [0.5, 0.5]
return approxEqual(y[0], 0.5) && approxEqual(y[1], 0.5);
@@ -112,13 +103,10 @@ bool testFFLayerTanh()
typealias Layer = FFLayer<float, Vec2, Vec2, LinearLayout, TanhActivation<float>, true>;
let layer = Layer();

let weightAddr = baseAddr.getOffset(0);
let biasAddr = baseAddr.getOffset(4);

float[2] arr = {0.0, 0.0};
let x = Vec2(arr);

let y = layer.eval<Address>(x, weightAddr, biasAddr);
let y = layer.eval<Address>(x, baseAddr);

// Tanh([0, 0]) = [0, 0]
return approxEqual(y[0], 0.0) && approxEqual(y[1], 0.0);
@@ -134,13 +122,10 @@ bool testFFLayerExp()
typealias Layer = FFLayer<float, Vec2, Vec2, LinearLayout, ExpActivation<float>, true>;
let layer = Layer();

let weightAddr = baseAddr.getOffset(0);
let biasAddr = baseAddr.getOffset(4);

float[2] arr = {0.0, 0.0};
let x = Vec2(arr);

let y = layer.eval<Address>(x, weightAddr, biasAddr);
let y = layer.eval<Address>(x, baseAddr);

// Exp([0, 0]) = [1, 1]
return approxEqual(y[0], 1.0) && approxEqual(y[1], 1.0);
@@ -156,13 +141,10 @@ bool testFFLayerIdentity()
typealias Layer = FFLayer<float, Vec2, Vec2, LinearLayout, IdentityActivation<float>, true>;
let layer = Layer();

let weightAddr = baseAddr.getOffset(0);
let biasAddr = baseAddr.getOffset(4);

float[2] arr = {-1.0, 2.0};
let x = Vec2(arr);

let y = layer.eval<Address>(x, weightAddr, biasAddr);
let y = layer.eval<Address>(x, baseAddr);

// Identity([-1, 2]) = [-1, 2]
return approxEqual(y[0], -1.0) && approxEqual(y[1], 2.0);
@@ -178,14 +160,11 @@ bool testFFLayerSine()
typealias Layer = FFLayer<float, Vec2, Vec2, LinearLayout, SineActivation<float>, true>;
let layer = Layer();

let weightAddr = baseAddr.getOffset(0);
let biasAddr = baseAddr.getOffset(4);

float pi = 3.14159265;
float[2] arr = {0.0, pi / 2.0};
let x = Vec2(arr);

let y = layer.eval<Address>(x, weightAddr, biasAddr);
let y = layer.eval<Address>(x, baseAddr);

// Sin([0, pi/2]) = [0, 1]
return approxEqual(y[0], 0.0, 0.001) && approxEqual(y[1], 1.0, 0.001);
104 changes: 104 additions & 0 deletions tests/neural/basic-ilayer-ffn-backward-test.slang
@@ -0,0 +1,104 @@
// Unit test for ILayer with multi-layer FFN (backward pass via autodiff).
// Exercises getOffset inside the differentiable function so the custom
// derivative of getOffset is used to propagate DifferentialPtrPair.
//
//TEST(compute, vulkan):COMPARE_COMPUTE_EX(filecheck-buffer=BUFFER):-vk -compute -shaderobj -xslang -experimental-feature -output-using-type -emit-spirv-directly
//TEST(compute, vulkan):COMPARE_COMPUTE_EX(filecheck-buffer=BUFFER):-mtl -compute -shaderobj -output-using-type -xslang -experimental-feature
//TEST(compute, vulkan):COMPARE_COMPUTE_EX(filecheck-buffer=BUFFER):-cuda -compute -shaderobj -output-using-type -capability cuda_sm_7_0 -xslang -experimental-feature
Contributor
Good test — the math checks out and it exercises the key scenario (getOffset inside a differentiable function with a multi-layer FFN).

One observation: this test only exercises BindlessAddress. The existing fflayer-autodiff-backward-test covers BindlessAddress and PointerAddress. It would be good to have coverage of the custom fwd_getOffset on at least PointerAddress too, since each implementation has its own fwd_getOffset. The Ptr extension's version in particular is implemented differently (uses raw + instead of getOffset).

Also, this test is missing the DX12 backend line that the corresponding forward test (basic-ilayer-ffn-forward-test.slang) has. Is that intentional?

Contributor

suggestion: Consider adding a PointerAddress test variant as well.

This backward test only exercises BindlessAddress. The existing fflayer-autodiff-backward-test.slang does cover PointerAddress (via TEST_POINTER=1), and that test is updated in this PR to call getOffset inside the differentiable function. However, since this new test is the primary one that exercises multi-layer getOffset inside a differentiable context (with non-trivial offset=6), a TEST_POINTER=1 CUDA variant here would strengthen coverage of PointerAddress.fwd_getOffset in the multi-layer scenario.

Not blocking — the existing backward test suite provides reasonable coverage.


import slang.neural;

// Same 2-layer FFN as the forward test:
// Layer1 (2->2): W1 = [[2,-1],[0.5,3]], b1 = [1,-2] (6 params at offset 0)
// Layer2 (2->1): W2 = [[-2,4]], b2 = [0.5] (3 params at offset 6)
// Total: 9 parameters

//TEST_INPUT: set parametersFloat = ubuffer(data=[2.0 -1.0 0.5 3.0 1.0 -2.0 -2.0 4.0 0.5], stride=4)
RWStructuredBuffer<float> parametersFloat;

//TEST_INPUT: set params = ubuffer(data=[0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0], stride=4)
uniform RWStructuredBuffer<float>.Handle params;

//TEST_INPUT: set gradParams = ubuffer(data=[0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0], stride=4)
uniform RWStructuredBuffer<float>.Handle gradParams;

//TEST_INPUT: ubuffer(data=[0 0 0 0 0 0 0 0 0], stride=4):out,name=resultBuffer
RWStructuredBuffer<uint> resultBuffer;

typealias Address = BindlessAddress<float>;
typealias V2 = InlineVector<float, 2>;
typealias V1 = InlineVector<float, 1>;
typealias Act = IdentityActivation<float>;
typealias Layer1 = FFLayer<float, V2, V2, LinearLayout, Act, true>;
typealias Layer2 = FFLayer<float, V2, V1, LinearLayout, Act, true>;

bool approxEqual(float a, float b, float eps = 0.001)
{
return abs(a - b) < eps;
}

// All address arithmetic happens inside the differentiable function,
// so getOffset's custom derivative propagates the DifferentialPtrPair.
[Differentiable]
V1 computeFFN(Address baseAddr, V2 input, Layer1 layer1, Layer2 layer2)
{
let layer1Addr = baseAddr.getOffset(0);
let layer2Addr = baseAddr.getOffset(6);

let h = layer1.eval<Address>(input, layer1Addr);
return layer2.eval<Address>(h, layer2Addr);
}

[shader("compute")]
[numthreads(1, 1, 1)]
void computeMain()
{
for (int i = 0; i < 9; i++)
{
params[i] = parametersFloat[i];
gradParams[i] = 0.0;
}

let baseAddr = Address(params);
let gradBaseAddr = Address(gradParams);

float[2] xArr = { 1.5, -2.0 };
let x = V2(xArr);
let layer1 = Layer1();
let layer2 = Layer2();

var baseAddrPair = DifferentialPtrPair<Address>(baseAddr, gradBaseAddr);
var inputPair = diffPair(x);
let dOutput = V1(1.0);

bwd_diff(computeFFN)(baseAddrPair, inputPair, layer1, layer2, dOutput);

// Expected gradients (dL/dy = [1]):
// Forward: h = W1*x + b1 = [6, -7.25], y = W2*h + b2 = -40.5
//
// dL/dW1 = outer(W2^T * dL/dy, x) = outer([-2, 4], [1.5, -2])
// = [-3, 4, 6, -8]
// dL/db1 = W2^T * dL/dy = [-2, 4]
// dL/dW2 = outer(dL/dy, h) = [6, -7.25]
// dL/db2 = dL/dy = [1]
uint idx = 0;
resultBuffer[idx++] = approxEqual(gradParams[0], -3.0); // dW1[0,0]
resultBuffer[idx++] = approxEqual(gradParams[1], 4.0); // dW1[0,1]
resultBuffer[idx++] = approxEqual(gradParams[2], 6.0); // dW1[1,0]
resultBuffer[idx++] = approxEqual(gradParams[3], -8.0); // dW1[1,1]
resultBuffer[idx++] = approxEqual(gradParams[4], -2.0); // db1[0]
resultBuffer[idx++] = approxEqual(gradParams[5], 4.0); // db1[1]
resultBuffer[idx++] = approxEqual(gradParams[6], 6.0); // dW2[0,0]
resultBuffer[idx++] = approxEqual(gradParams[7], -7.25); // dW2[0,1]
resultBuffer[idx++] = approxEqual(gradParams[8], 1.0); // db2

// BUFFER: 1
// BUFFER-NEXT: 1
// BUFFER-NEXT: 1
// BUFFER-NEXT: 1
// BUFFER-NEXT: 1
// BUFFER-NEXT: 1
// BUFFER-NEXT: 1
// BUFFER-NEXT: 1
// BUFFER-NEXT: 1
}
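
The hand-computed gradients in the comment above can be verified numerically; a small sketch in plain Python (identity activations, so the FFN is purely linear):

```python
# Verify the expected gradients for the 2-layer linear FFN with dL/dy = 1.
W1 = [[2.0, -1.0], [0.5, 3.0]]; b1 = [1.0, -2.0]
W2 = [[-2.0, 4.0]];             b2 = [0.5]
x  = [1.5, -2.0]

# Forward pass: h = W1*x + b1, y = W2*h + b2.
h = [sum(W1[i][j] * x[j] for j in range(2)) + b1[i] for i in range(2)]
y = sum(W2[0][j] * h[j] for j in range(2)) + b2[0]

# Backward pass with upstream gradient dL/dy = 1.
dLdy = 1.0
db2 = dLdy                                   # dL/db2
dW2 = [dLdy * h[j] for j in range(2)]        # dL/dW2 = outer(dL/dy, h)
db1 = [W2[0][i] * dLdy for i in range(2)]    # dL/db1 = W2^T * dL/dy
dW1 = [[db1[i] * x[j] for j in range(2)] for i in range(2)]
# h = [6, -7.25], y = -40.5, dW1 = [[-3, 4], [6, -8]],
# matching the gradParams checks above.
```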
@@ -46,14 +46,12 @@ bool forwardCheck()
let layer1 = Layer1();
let layer2 = Layer2();

// Compute weight/bias addresses for each layer
let w1Addr = baseAddr.getOffset(0);
let b1Addr = baseAddr.getOffset(4);
let w2Addr = baseAddr.getOffset(6);
let b2Addr = baseAddr.getOffset(8);

let h = layer1.eval<Address>(x, w1Addr, b1Addr);
let y = layer2.eval<Address>(h, w2Addr, b2Addr);
// Compute base addresses for each layer's parameter block
let layer1Addr = baseAddr.getOffset(0);
let layer2Addr = baseAddr.getOffset(6);

let h = layer1.eval<Address>(x, layer1Addr);
let y = layer2.eval<Address>(h, layer2Addr);

// Expected:
// h = W1*x + b1