
AI-primer

This repository records my own journey of learning AI. It includes the following categories.

Transformer

Self-Attention

  • transformer/self_attention_example.py: Basic self-attention implementation with fixed 64-dimension key/value (a minimal sketch follows this list)
  • transformer/self_attn.py: BERT attention visualization
  • transformer/self_attn.ipynb: Jupyter notebook for attention analysis
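
As context for these files, the core computation is a scaled dot-product over learned query/key/value projections. Below is a minimal sketch using the fixed 64-dimension key/value; shapes and layer names are illustrative, not the exact code in self_attention_example.py:

import torch
import torch.nn.functional as F

d_model, d_kv, seq_len = 512, 64, 16          # fixed 64-dim key/value
x = torch.randn(1, seq_len, d_model)          # (batch, seq, d_model)

W_q = torch.nn.Linear(d_model, d_kv, bias=False)
W_k = torch.nn.Linear(d_model, d_kv, bias=False)
W_v = torch.nn.Linear(d_model, d_kv, bias=False)

q, k, v = W_q(x), W_k(x), W_v(x)              # each (1, seq, 64)
scores = q @ k.transpose(-2, -1) / d_kv**0.5  # scaled dot-product
attn = F.softmax(scores, dim=-1)              # attention weights
out = attn @ v                                # (1, seq, 64)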

Attention Implementations

  • transformer/flex_attention.py: Custom multihead attention with performance benchmarking
  • transformer/flash_attention_comparison.py: FlashAttention vs FlexAttention comparison
  • transformer/attention_benchmark_comparison.py: NEW - Comprehensive inference-focused comparison of Simple vs Flash vs Flex Attention
  • transformer/test_flash_vs_flex.py: Test script for attention comparison
  • pytorch/flex_attn_test.py: PyTorch flex attention test with 64-dimension key/value
  • pytorch/simple_flex_test.py: NEW - Simple FlexAttention test for inference (< 50 lines); a minimal call is sketched below
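
A minimal FlexAttention call, in the spirit of simple_flex_test.py, looks roughly like this (a sketch assuming PyTorch 2.5+; the shapes and the identity score_mod are illustrative):

import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 8, 128, 64                    # batch, heads, seq, fixed 64-dim heads
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

def score_mod(score, b, h, q_idx, kv_idx):
    return score                              # identity; swap in e.g. a positional bias

out = flex_attention(q, k, v, score_mod=score_mod)  # (B, H, S, D)

For real performance, FlexAttention is meant to be wrapped with torch.compile on a CUDA device; the eager call above is just for checking shapes and correctness.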

Encoder

Decoder

KV Cache

Usage Examples

Quick FlexAttention Test (Simple)

cd pytorch
python simple_flex_test.py

Comprehensive Attention Comparison

cd transformer
python attention_benchmark_comparison.py

Compare FlashAttention vs FlexAttention

cd transformer
python test_flash_vs_flex.py

Run Flex Attention Test

cd pytorch
python flex_attn_test.py

Run Attention Benchmarking

cd transformer
python flex_attention.py

Key Features

Fixed 64-Dimension Key/Value

All attention implementations use a fixed dimension of 64 for the key and value tensors (see the sketch after this list), providing:

  • Consistent performance characteristics
  • Easier comparison between implementations
  • Optimized memory usage patterns
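
Concretely, fixing the per-head key/value dimension at 64 means only the number of heads varies with model width. A sketch with illustrative names:

import torch

d_model, head_dim = 512, 64                   # head_dim fixed at 64
num_heads = d_model // head_dim               # 8 heads when d_model = 512
x = torch.randn(2, 32, d_model)               # (batch, seq, d_model)

proj = torch.nn.Linear(d_model, 3 * num_heads * head_dim)
q, k, v = proj(x).chunk(3, dim=-1)
# reshape each to (batch, heads, seq, 64), the layout attention kernels expect
q = q.view(2, 32, num_heads, head_dim).transpose(1, 2)
k = k.view(2, 32, num_heads, head_dim).transpose(1, 2)
v = v.view(2, 32, num_heads, head_dim).transpose(1, 2)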

Three Attention Implementations

  1. Simple Attention: Baseline implementation for comparison
  2. FlashAttention: Optimized attention using PyTorch's SDPA with the FlashAttention backend
  3. FlexAttention: PyTorch's flexible attention implementation with score modification (both invocations are sketched below)
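
As a rough illustration of how the latter two are invoked (a sketch assuming PyTorch 2.5+ and a CUDA device with fp16, which the flash backend requires):

import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend
from torch.nn.attention.flex_attention import flex_attention

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# FlashAttention path: force SDPA to use the flash backend
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out_flash = F.scaled_dot_product_attention(q, k, v)

# FlexAttention path: the score_mod hook is where custom logic goes
out_flex = flex_attention(q, k, v, score_mod=lambda s, b, h, qi, ki: s)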

Comprehensive Benchmarking

  • Inference Performance: Optimized for inference workloads (no gradient computation; timing pattern sketched after this list)
  • Performance comparison across different sequence lengths
  • Memory usage analysis
  • Output correctness validation
  • Speedup calculations relative to fastest implementation
  • Tokens per second metrics for throughput analysis
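
The timing pattern behind these numbers is roughly the following sketch; the actual scripts add multiple sequence lengths, memory tracking, and correctness checks:

import time
import torch

@torch.no_grad()                              # inference focus: no gradients
def benchmark(attn_fn, q, k, v, iters=50):
    for _ in range(5):                        # warmup iterations
        attn_fn(q, k, v)
    if q.is_cuda:
        torch.cuda.synchronize()              # ensure queued kernels finish
    start = time.perf_counter()
    for _ in range(iters):
        attn_fn(q, k, v)
    if q.is_cuda:
        torch.cuda.synchronize()
    latency = (time.perf_counter() - start) / iters
    tokens = q.shape[0] * q.shape[-2]         # batch * seq_len
    return latency, tokens / latency          # seconds/iter, tokens per second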
