This repository records my own journey of learning AI. It includes the following categories:
- transformer/self_attention_example.py: Basic self-attention implementation with fixed 64-dimension key/value
- transformer/self_attn.py: BERT attention visualization
- transformer/self_attn.ipynb: Jupyter notebook for attention analysis
- transformer/flex_attention.py: Custom multihead attention with performance benchmarking
- transformer/flash_attention_comparison.py: FlashAttention vs FlexAttention comparison
- transformer/attention_benchmark_comparison.py: NEW - Comprehensive comparison of Simple vs Flash vs Flex Attention (inference focus)
- transformer/test_flash_vs_flex.py: Test script for attention comparison
- pytorch/flex_attn_test.py: PyTorch flex attention test with 64-dimension key/value
- pytorch/simple_flex_test.py: NEW - Simple FlexAttention test for inference (< 50 lines)
Run the simple FlexAttention inference test:

```bash
cd pytorch
python simple_flex_test.py
```

Run the comprehensive Simple vs Flash vs Flex benchmark:

```bash
cd transformer
python attention_benchmark_comparison.py
```

Run the FlashAttention vs FlexAttention test script:

```bash
cd transformer
python test_flash_vs_flex.py
```

Run the PyTorch flex attention test:

```bash
cd pytorch
python flex_attn_test.py
```

Run the custom multihead attention benchmark:

```bash
cd transformer
python flex_attention.py
```

All attention implementations use a fixed 64-dimension for key and value tensors, providing:
- Consistent performance characteristics
- Easier comparison between implementations
- Optimized memory usage patterns
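
For illustration, here is a minimal sketch of single-head scaled dot-product self-attention with a fixed 64-dimensional key/value size, in the spirit of the scripts above. The module name, model dimension, and batch/sequence sizes are placeholders, not the exact code used in this repository.

```python
import math
import torch
import torch.nn as nn

KV_DIM = 64  # fixed key/value dimension shared by the implementations above

class SimpleSelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention with 64-dim keys/values."""

    def __init__(self, model_dim: int):
        super().__init__()
        self.q_proj = nn.Linear(model_dim, KV_DIM)
        self.k_proj = nn.Linear(model_dim, KV_DIM)
        self.v_proj = nn.Linear(model_dim, KV_DIM)
        self.out_proj = nn.Linear(KV_DIM, model_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(KV_DIM)  # (batch, seq, seq)
        attn = scores.softmax(dim=-1)
        return self.out_proj(attn @ v)                        # (batch, seq, model_dim)

x = torch.randn(2, 128, 512)              # placeholder batch=2, seq=128, model_dim=512
print(SimpleSelfAttention(512)(x).shape)  # torch.Size([2, 128, 512])
```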
The comparison covers:

- Simple Attention: Baseline implementation for comparison
- FlashAttention: Optimized attention using PyTorch's SDPA with the FlashAttention backend
- FlexAttention: PyTorch's flexible attention implementation with score modification (sketched below)
- Inference Performance: Optimized for inference workloads (no gradient computation)
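
Below is a rough sketch of how the Flash and Flex paths might be invoked for inference. It assumes a recent PyTorch build that ships torch.nn.attention.flex_attention and a device/dtype combination the fused kernels support; the tensor shapes and the identity score_mod are placeholders rather than the benchmark's actual setup.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 8, 1024, 64            # batch, heads, seq_len, fixed 64-dim head
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

q = torch.randn(B, H, S, D, device=device, dtype=dtype)
k = torch.randn(B, H, S, D, device=device, dtype=dtype)
v = torch.randn(B, H, S, D, device=device, dtype=dtype)

with torch.no_grad():                   # inference focus: no gradient computation
    # SDPA path: PyTorch typically dispatches to the fused FlashAttention
    # kernel on CUDA with half-precision inputs and a supported head dim.
    flash_out = F.scaled_dot_product_attention(q, k, v)

    # FlexAttention path: score_mod edits each attention score before softmax.
    # The identity hook keeps the math equivalent to plain SDPA.
    def identity_score_mod(score, b, h, q_idx, kv_idx):
        return score

    flex_out = flex_attention(q, k, v, score_mod=identity_score_mod)

# Output check (differences are expected to be small but nonzero in fp16).
print((flash_out - flex_out).abs().max())
```

A non-trivial score_mod (a bias or masking function, for example) is where FlexAttention's score-modification flexibility comes in.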
The benchmark reports:

- Performance comparison across different sequence lengths
- Memory usage analysis
- Output correctness validation
- Speedup calculations relative to the fastest implementation
- Tokens per second metrics for throughput analysis
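
As a rough illustration of this kind of inference benchmarking, the sketch below times a naive baseline against PyTorch's SDPA and derives speedup and tokens-per-second figures. The shapes, iteration count, and the two candidate functions are assumptions for the example, not the repository's actual benchmark harness.

```python
import time
import torch
import torch.nn.functional as F

def simple_attention(q, k, v):
    """Naive baseline: materializes the full (seq x seq) score matrix."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v

def benchmark(attn_fn, q, k, v, iters=50):
    """Average seconds per forward pass in inference mode (no gradients)."""
    with torch.no_grad():
        attn_fn(q, k, v)                          # warm-up
        if q.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            attn_fn(q, k, v)
        if q.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

B, H, S, D = 1, 8, 1024, 64                       # fixed 64-dim key/value heads
device = "cuda" if torch.cuda.is_available() else "cpu"
q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))

results = {
    "simple": benchmark(simple_attention, q, k, v),
    "sdpa":   benchmark(F.scaled_dot_product_attention, q, k, v),
}
fastest = min(results.values())
for name, sec in results.items():
    print(f"{name}: {sec * 1e3:.2f} ms/iter, "
          f"{fastest / sec:.2f}x vs fastest, "
          f"{B * S / sec:,.0f} tokens/s")
```

On CUDA, memory usage could additionally be tracked with torch.cuda.max_memory_allocated(); whether the repository's scripts do so is not shown here.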