95 changes: 95 additions & 0 deletions neural_network/sliding_window_attention.py
@@ -0,0 +1,95 @@
"""
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name - - sliding_window_attention.py
Goal - - Implement a neural network architecture using sliding window
         attention for sequence modeling tasks.

Detail: A four-layer neural network
* Input layer
* Sliding Window Attention Layer
* Feedforward Layer
* Output Layer
Author: Stephen Lee
Github: [email protected]
Date: 2024.10.20
References:
1. Sutskever, I., et al. (2013). "On the Importance of Initialization and
   Momentum in Deep Learning." *Proceedings of the 30th International
   Conference on Machine Learning*.

2. Katharopoulos, A., et al. (2020). "Transformers are RNNs: Fast Autoregressive
   Transformers with Linear Attention." *arXiv preprint arXiv:2006.16236*.

3. [Attention Mechanisms in Neural Networks](https://en.wikipedia.org/wiki/Attention_(machine_learning))
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"""

import numpy as np


class SlidingWindowAttention:
"""Sliding Window Attention Module.

This class implements a sliding window attention mechanism where the model
attends to a fixed-size window of context around each token.

Attributes:
window_size (int): The size of the attention window.
embed_dim (int): The dimensionality of the input embeddings.
"""

    def __init__(self, embed_dim: int, window_size: int) -> None:
"""
Initialize the SlidingWindowAttention module.

Args:
embed_dim (int): The dimensionality of the input embeddings.
window_size (int): The size of the attention window.
"""
self.window_size = window_size
self.embed_dim = embed_dim
        # Random projection matrix applied to each token in the attention window
        rng = np.random.default_rng()
        self.attention_weights = rng.standard_normal((embed_dim, embed_dim))

    def forward(self, input_tensor: np.ndarray) -> np.ndarray:
"""
Forward pass for the sliding window attention.

Args:
            input_tensor (np.ndarray): Input tensor of shape
                (batch_size, seq_length, embed_dim).

Returns:
np.ndarray: Output tensor of shape (batch_size, seq_length, embed_dim).

        >>> rng = np.random.default_rng(0)
        >>> x = rng.standard_normal((2, 10, 4))  # batch 2, seq length 10, dim 4
>>> attention = SlidingWindowAttention(embed_dim=4, window_size=3)
>>> output = attention.forward(x)
>>> output.shape
(2, 10, 4)
>>> (output.sum() != 0).item() # Check if output is non-zero
True
"""
        batch_size, seq_length, _ = input_tensor.shape
        output = np.zeros_like(input_tensor)

for i in range(seq_length):
# Define the window range
start = max(0, i - self.window_size // 2)
end = min(seq_length, i + self.window_size // 2 + 1)
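            # e.g. with window_size=3, token i=5 attends to positions 4..6,
            # while token i=0 only sees positions 0..1 (clipped at the edge)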

# Extract the local window
            local_window = input_tensor[:, start:end, :]

# Compute attention scores
attention_scores = np.matmul(local_window, self.attention_weights)

# Average the attention scores
output[:, i, :] = np.mean(attention_scores, axis=1)

return output


if __name__ == "__main__":
import doctest

doctest.testmod()

# Example usage
    rng = np.random.default_rng()
    # Batch size 2, sequence length 10, embedding dimension 4
    x = rng.standard_normal((2, 10, 4))
attention = SlidingWindowAttention(embed_dim=4, window_size=3)
output = attention.forward(x)
print(output)
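

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the file added by this PR): the forward
# pass above projects each local window and simply averages it, without any
# softmax weighting. A common refinement is scaled dot-product attention
# restricted to the same sliding window. The function below is a minimal,
# hedged illustration of that idea; its name and signature are assumptions
# made for this example only, not the PR author's implementation.
def sliding_window_softmax_attention(
    input_tensor: np.ndarray, weights: np.ndarray, window_size: int
) -> np.ndarray:
    """Softmax-weighted aggregation over each token's local window."""
    _, seq_length, embed_dim = input_tensor.shape
    output = np.zeros_like(input_tensor)
    for i in range(seq_length):
        start = max(0, i - window_size // 2)
        end = min(seq_length, i + window_size // 2 + 1)
        window = input_tensor[:, start:end, :]  # (batch, window, dim)
        query = input_tensor[:, i, :] @ weights  # (batch, dim)
        # Scaled dot-product scores between the query token and its window
        scores = np.einsum("bd,bwd->bw", query, window) / np.sqrt(embed_dim)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        # Convex combination of the window tokens, weighted by attention
        output[:, i, :] = np.einsum("bw,bwd->bd", probs, window)
    return output


if __name__ == "__main__":
    demo_rng = np.random.default_rng(0)
    demo_x = demo_rng.standard_normal((2, 10, 4))
    demo_w = demo_rng.standard_normal((4, 4))
    print(sliding_window_softmax_attention(demo_x, demo_w, window_size=3).shape)
    # Expected: (2, 10, 4), matching the input shape just like forward() above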