
Commit daf68ab

chore: enhance documentation, add new encoding utilities, and improve LIF functionality with surrogate gradients

1 parent 2ba9444 · commit daf68ab
File tree

11 files changed: +573 −34 lines changed

README.md

Lines changed: 46 additions & 17 deletions

@@ -2,16 +2,20 @@
 <img width="1000" height="600" alt="Gemini_Generated_Image_xofloxxofloxxofl" src="https://github.com/user-attachments/assets/420a486f-1a09-4d72-a98b-22678abd0e75" />

-**Tether** is a Triton-powered framework for training and deploying **Spiking Transformers**.
+**Tether** is a Triton-powered framework for training and deploying **Spiking Transformers** and deep Spiking Neural Networks (SNNs).

-We’ve solved the non-differentiability of discrete spikes by implementing a custom **Arctan Surrogate Gradient** in the autograd backward pass.
+We’ve solved the non-differentiability of discrete spikes by implementing high-performance Triton kernels with modular **Surrogate Gradients**.

 ## Key Features

-- **Fused LIF Kernel**: Manages membrane potential statefulness across temporal windows without global memory stalls, utilizing Triton for high-performance GPU execution.
-- **Linear Spike-Driven Attention**: Eliminates the $O(N^2)$ Softmax bottleneck, allowing for massive context windows with significantly lower energy per inference (Joules/op).
-- **Bit-Packing** (In Progress): Optimization for memory-efficient spike storage.
-- **Triton-Powered**: Leverages OpenAI's Triton language for custom CUDA kernels.
+- **High-Performance Neurons**:
+  - **LIF (Leaky Integrate-and-Fire)**: Standard spiking neuron with fused Triton kernels.
+  - **ALIF (Adaptive LIF)**: Neurons with adaptive thresholds for better temporal dynamics.
+  - **PLIF (Parametric LIF)**: Neurons with learnable, per-channel decay and threshold parameters.
+- **Modular Surrogate Gradients**: Choose from `Arctan`, `Sigmoid`, or `FastSigmoid` to train your SNNs effectively.
+- **Linear Spike-Driven Attention**: Eliminates the $O(N^2)$ Softmax bottleneck, allowing for massive context windows with significantly lower energy per inference.
+- **Data Utilities**: `SpikingDatasetWrapper` and encoding functions (`rate_encoding`, `latency_encoding`) to convert static datasets to spike trains.
+- **Triton-Powered**: Leverages OpenAI's Triton language for custom CUDA kernels, enabling massive speedups (60x+) over vanilla PyTorch.

 ## Installation

@@ -29,6 +33,25 @@ pip install torch triton numpy

 ## Usage

+### Using PLIF with Sigmoid Surrogate
+
+```python
+import torch
+from tether import PLIF, Sigmoid
+
+# Create a Parametric LIF layer with Sigmoid surrogate
+# Decay and threshold are learnable vectors per neuron
+layer = PLIF(
+    n_neurons=128,
+    init_decay=0.9,
+    surrogate=Sigmoid(alpha=4.0)
+).cuda()
+
+# Input sequence: (Time, Batch, Neurons)
+x = torch.randn(32, 16, 128).cuda()
+spikes = layer(x)
+```
+
 ### Training a Spiking Language Model

 The `train_stories.py` script demonstrates training a **Spiking-LLM** on the TinyShakespeare dataset.

@@ -37,20 +60,26 @@ The `train_stories.py` script demonstrates training a **Spiking-LLM** on the TinyShakespeare dataset.
 python train_stories.py
 ```

-This will:
-1. Download the `input.txt` dataset.
-2. Initialize a Tether Spiking Transformer (4 layers, 8 heads).
-3. Train using the custom Arctan Surrogate Gradient.
-4. Generate sample text from the Spiking SNN.
+### Data Encoding
+
+```python
+from tether.data import SpikingDatasetWrapper, rate_encoding
+from torchvision.datasets import MNIST
+
+# Wrap MNIST to output spike trains
+spiking_mnist = SpikingDatasetWrapper(
+    MNIST(root="./data", download=True, train=True),
+    encode_fn=lambda x: rate_encoding(x, n_steps=10)
+)
+```

 ## Architecture

-- **`tether.kernels.lif`**: Custom Triton kernels for Leaky Integrate-and-Fire (LIF) forward and backward passes.
-- **`tether.functional.lif`**: PyTorch autograd function wrapping the Triton kernels.
-- **`tether.nn.attention`**: Linear Spike-Driven Attention mechanism.
-- **`tether.nn.block`**: Spiking Transformer Block implementation.
+- **`tether.kernels`**: Custom Triton kernels for LIF, ALIF, and PLIF.
+- **`tether.functional`**: PyTorch autograd functions wrapping the Triton kernels.
+- **`tether.nn`**: Neural network modules including `LIF`, `ALIF`, `PLIF`, `SpikingSelfAttention`.
+- **`tether.data`**: Utilities for spike encoding and dataset wrapping.

 ## License

-[Apache-2.0](https://github.com/Khushiyant/tether/blob/main/LICENSE)
+[Apache-2.0](https://github.com/Khushiyant/tether/blob/main/LICENSE)
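The `Arctan`, `Sigmoid`, and `FastSigmoid` options named in the README correspond to standard surrogate-gradient shapes from the SNN literature. The exact parameterizations tether uses are not shown in this diff, so the formulas below are assumptions based on common conventions — a framework-agnostic sketch, not the library's API:

```python
import numpy as np

def arctan_grad(u, alpha=2.0):
    # Common ATan surrogate derivative: alpha/2 / (1 + (pi/2 * alpha * u)^2)
    return alpha / 2.0 / (1.0 + (np.pi / 2.0 * alpha * u) ** 2)

def sigmoid_grad(u, alpha=4.0):
    # Derivative of sigmoid(alpha * u): alpha * s * (1 - s)
    s = 1.0 / (1.0 + np.exp(-alpha * u))
    return alpha * s * (1.0 - s)

def fast_sigmoid_grad(u, alpha=1.0):
    # SuperSpike-style fast sigmoid derivative: 1 / (1 + alpha*|u|)^2
    return 1.0 / (1.0 + alpha * np.abs(u)) ** 2

# All three peak at the threshold crossing (u = v - threshold = 0) and
# decay smoothly away from it, replacing the undefined Heaviside derivative.
u = np.linspace(-2.0, 2.0, 5)
print(arctan_grad(u))
```

Each function maps the distance from threshold to a smooth pseudo-derivative, which is what makes backpropagation through the spike nonlinearity possible.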

benchmarks/benchmark_lif.py

Lines changed: 2 additions & 2 deletions

@@ -80,7 +80,7 @@ def benchmark():
     for _ in range(10):
         with torch.no_grad():
             _ = lif_pytorch(x_seq, v_init, decay, threshold)
-        LIFSubFunction.apply(x_seq, v_init, decay, threshold, alpha)
+        LIFSubFunction.apply(x_seq, v_init, decay, threshold, alpha, 0)

     # Benchmark PyTorch
     torch.cuda.synchronize()

@@ -99,7 +99,7 @@ def benchmark():
     with torch.no_grad():
         for _ in range(iterations):
             # Note: We use apply but inside no_grad, so it just runs forward
-            LIFSubFunction.apply(x_seq, v_init, decay, threshold, alpha)
+            LIFSubFunction.apply(x_seq, v_init, decay, threshold, alpha, 0)
     torch.cuda.synchronize()
     triton_time = (time.time() - start_time) / iterations
     print(f"Triton Time: {triton_time * 1000:.3f} ms")
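The benchmark follows the usual GPU timing discipline: warm up first (to absorb kernel compilation and autotuning), synchronize, then average over many iterations. A device-agnostic sketch of that pattern, timing a toy CPU workload in place of the Triton kernel (the helper name is illustrative, not part of tether):

```python
import time

def benchmark_fn(fn, *args, warmup=10, iterations=100):
    """Time fn(*args): discard warmup runs, then average over iterations.
    On GPU you would call torch.cuda.synchronize() before reading the
    clock, as benchmark_lif.py does; this sketch times a CPU callable."""
    for _ in range(warmup):  # absorb one-time costs (compilation, caches)
        fn(*args)
    start = time.perf_counter()
    for _ in range(iterations):
        fn(*args)
    return (time.perf_counter() - start) / iterations

# Example: average time of a toy workload
avg = benchmark_fn(lambda: sum(i * i for i in range(1000)))
print(f"avg: {avg * 1000:.3f} ms")
```

Without the synchronize calls, GPU timings would measure only kernel launch latency, since CUDA kernels run asynchronously.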

pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ requires = ["hatchling"]
 build-backend = "hatchling.build"

 [project.optional-dependencies]
-dev = ["pytest>=9.0.2"]
+dev = ["pytest>=9.0.2", "pytest-cov>=7.0.0"]
 docs = [
     "sphinx>=8.1.3",
     "sphinx-autodoc-typehints>=3.0.1",

src/tether/data/encoding.py

Lines changed: 86 additions & 0 deletions (new file)

```python
import torch
from torch.utils.data import Dataset


class SpikingDatasetWrapper(Dataset):
    """
    Wraps a standard dataset and applies an encoding function to the input.
    """

    def __init__(self, dataset: Dataset, encode_fn):
        self.dataset = dataset
        self.encode_fn = encode_fn

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        x, y = self.dataset[idx]
        return self.encode_fn(x), y


def rate_encoding(x: torch.Tensor, n_steps: int, gain: float = 1.0) -> torch.Tensor:
    """
    Convert continuous values to spike trains using rate encoding (Bernoulli).

    Parameters
    ----------
    x : torch.Tensor
        Input tensor with continuous values (usually in [0, 1]).
    n_steps : int
        Number of time steps to simulate.
    gain : float
        Scaling factor for firing probability.

    Returns
    -------
    torch.Tensor
        Spike tensor with shape (n_steps, *x.shape).
    """
    shape = (n_steps,) + x.shape
    prob = torch.clamp(x * gain, 0.0, 1.0)
    # Expand the firing probability along the time dimension
    prob = prob.unsqueeze(0).expand(shape)

    # Sample Bernoulli spikes
    spikes = torch.rand(shape, device=x.device) < prob
    return spikes.float()


def latency_encoding(x: torch.Tensor, n_steps: int, tau: float = 1.0, threshold: float = 0.01) -> torch.Tensor:
    """
    Convert continuous values to spike trains using latency encoding.
    Higher values fire earlier.

    Parameters
    ----------
    x : torch.Tensor
        Input tensor.
    n_steps : int
        Number of time steps.
    tau : float
        Time constant (currently unused; the mapping below is linear).
    threshold : float
        Threshold below which no spike is generated.

    Returns
    -------
    torch.Tensor
        Spike tensor with shape (n_steps, *x.shape).
    """
    # Linear latency mapping: 1.0 fires at step 0, 0.0 at step n_steps - 1
    x = torch.clamp(x, 0.0, 1.0)
    fire_step = ((1.0 - x) * (n_steps - 1)).long()

    # Grid of time steps, broadcastable against fire_step
    time_grid = torch.arange(n_steps, device=x.device).reshape((n_steps,) + (1,) * x.ndim)

    # Spike where the time step matches fire_step and x exceeds threshold
    active = x > threshold
    spikes = (time_grid == fire_step) & active.unsqueeze(0)

    return spikes.float()
```
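The two encoders have simple invariants worth sanity-checking: with the probability clamped at 1, rate encoding fires a value of 1.0 on every step, while linear latency encoding fires exactly once per active input, earlier for larger values, and never for inputs at or below the threshold. A numpy re-sketch of the same logic (not the tether API, just the arithmetic) makes this concrete:

```python
import numpy as np

def rate_encoding_np(x, n_steps, gain=1.0, rng=None):
    # Bernoulli spikes: firing probability = clamp(x * gain, 0, 1) per step
    rng = rng or np.random.default_rng(0)
    prob = np.clip(x * gain, 0.0, 1.0)
    return (rng.random((n_steps,) + x.shape) < prob).astype(np.float32)

def latency_encoding_np(x, n_steps, threshold=0.01):
    # One spike per active input, at step (1 - x) * (n_steps - 1)
    x = np.clip(x, 0.0, 1.0)
    fire_step = ((1.0 - x) * (n_steps - 1)).astype(np.int64)
    time_grid = np.arange(n_steps).reshape((n_steps,) + (1,) * x.ndim)
    return ((time_grid == fire_step) & (x > threshold)).astype(np.float32)

x = np.array([1.0, 0.5, 0.0])
print(rate_encoding_np(x, 4))     # first column all ones, last column all zeros
print(latency_encoding_np(x, 5))  # spikes at steps 0 and 2; 0.0 never fires
```

Rate encoding preserves intensity as average firing frequency at the cost of stochasticity; latency encoding is deterministic and sparser (one spike per input), but discards information below the threshold.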

src/tether/functional/lif.py

Lines changed: 13 additions & 5 deletions

@@ -4,7 +4,7 @@

 class LIFSubFunction(torch.autograd.Function):
     @staticmethod
-    def forward(ctx, x_seq, v_init, decay, threshold, alpha):
+    def forward(ctx, x_seq, v_init, decay, threshold, alpha, surrogate_type):
         """
         Forward pass of the LIF function.

@@ -22,6 +22,8 @@ def forward(ctx, x_seq, v_init, decay, threshold, alpha):
             Spiking threshold.
         alpha : torch.Tensor
             Surrogate gradient parameter.
+        surrogate_type : int
+            Type of surrogate gradient.

         Returns
         -------

@@ -49,10 +51,12 @@ def forward(ctx, x_seq, v_init, decay, threshold, alpha):

         # Save packed spikes for backward to save memory
         ctx.save_for_backward(out_spikes_packed, v_seq, v_init, decay, threshold, alpha)
-        return out_spikes, v_final
+        ctx.surrogate_type = surrogate_type
+        ctx.mark_non_differentiable(v_seq)
+        return out_spikes, v_final, v_seq

     @staticmethod
-    def backward(ctx, grad_spikes, grad_v_final):
+    def backward(ctx, grad_spikes, grad_v_final, grad_v_seq):
         """
         Backward pass of the LIF function.

@@ -64,13 +68,16 @@ def backward(ctx, grad_spikes, grad_v_final):
             Gradients w.r.t. spikes.
         grad_v_final : torch.Tensor
             Gradients w.r.t. final membrane potentials.
+        grad_v_seq : torch.Tensor
+            Gradients w.r.t. voltage sequence.

         Returns
         -------
         tuple
             Gradients w.r.t. inputs and parameters.
         """
         out_spikes_packed, v_seq, v_init, decay, threshold, alpha = ctx.saved_tensors
+        surrogate_type = ctx.surrogate_type
         n_steps, n_neurons = v_seq.shape

         grad_x = torch.empty_like(v_seq)

@@ -91,8 +98,9 @@ def backward(ctx, grad_spikes, grad_v_final):
             grad_v_final.contiguous(), v_init.contiguous(),
             n_neurons, n_steps, decay, threshold, alpha,
             grad_decay, grad_threshold, grad_alpha,
+            surrogate_type,
             BLOCK_SIZE=1024
         )

-        # Returns grads for (x_seq, v_init, decay, threshold, alpha)
-        return grad_x, grad_v_final, grad_decay, grad_threshold, grad_alpha
+        # Returns grads for (x_seq, v_init, decay, threshold, alpha, surrogate_type)
+        return grad_x, grad_v_final, grad_decay, grad_threshold, grad_alpha, None
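The signature change follows the standard `torch.autograd.Function` pattern: non-tensor arguments like `surrogate_type` are stashed as `ctx` attributes rather than via `save_for_backward`, and `backward` returns a trailing `None` for them. A minimal single-step sketch of that pattern — a plain Heaviside spike with assumed surrogate formulas, not tether's fused Triton kernel:

```python
import math
import torch

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v, threshold, alpha, surrogate_type):
        ctx.save_for_backward(v)             # tensors go through save_for_backward
        ctx.threshold = threshold            # plain Python values live as
        ctx.alpha = alpha                    # ctx attributes, like
        ctx.surrogate_type = surrogate_type  # ctx.surrogate_type above
        return (v >= threshold).float()      # hard spike in the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        u = v - ctx.threshold
        if ctx.surrogate_type == 0:
            # arctan-style surrogate derivative (one common parameterization)
            sg = ctx.alpha / 2 / (1 + (math.pi / 2 * ctx.alpha * u) ** 2)
        else:
            # sigmoid surrogate derivative
            s = torch.sigmoid(ctx.alpha * u)
            sg = ctx.alpha * s * (1 - s)
        # One gradient slot per forward input; None for the non-tensor args,
        # mirroring the trailing None returned by LIFSubFunction.backward.
        return grad_out * sg, None, None, None

v = torch.tensor([0.5, 1.5], requires_grad=True)
spikes = SpikeFn.apply(v, 1.0, 2.0, 0)
spikes.sum().backward()
print(spikes)  # tensor([0., 1.])
print(v.grad)  # smooth surrogate gradient, peaked near the threshold
```

The `None` slots are required: PyTorch checks that `backward` returns exactly one value per `forward` argument, which is why the benchmark's extra `0` argument forced the matching `None` in this commit.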
