# Efficient 2-Simplicial Transformer with Low-Rank KV Cache Compression

A research implementation for memory-efficient autoregressive generation.
This repository implements a 2-Simplicial Transformer with optimized KV cache management, following the architecture from the Fast & Simplex paper. The project focuses on memory-efficient autoregressive generation through innovative cache compression techniques.
## Features

- 2-Simplicial Attention: Implements the novel attention mechanism with a (K₁, K₂, V₁, V₂) cache structure
- Low-Rank Compression: SVD-based compression of KV cache matrices for significant memory reduction
- Hybrid Selection: Combines L2-norm selection with low-rank compression for optimal quality-memory tradeoff
- Incremental Optimization: vanilla PyTorch → Triton kernels → compression techniques
## 2-Simplicial Attention with KV Cache
```
Attention(K₁, K₂, V₁, V₂) = σ(Q·K₁) ⊙ σ(Q·K₂) · V₁ · V₂
```

## Project Structure

```
simplicial-transformer/
├── simplicial/
│   ├── attention/    # 2-simplicial attention mechanisms
│   ├── cache/        # KV cache with compression (K₁, K₂, V₁, V₂)
│   ├── layers/       # Feedforward and simplicial blocks
│   ├── models/       # Transformer implementations
│   ├── utils/        # Utility functions (RoPE, sliding window)
│   └── validation/   # Correctness validation tools
├── training/         # Training scripts and configs
├── scripts/          # Inference and data preparation scripts
├── tests/            # Comprehensive test suite
└── debug_tools/      # Debug and validation scripts
```
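Under one plausible reading of the attention formula above — σ as a row-wise softmax, ⊙ as an elementwise product, and the two value streams combined elementwise — a dense NumPy sketch looks like the following. The row renormalization after gating is an assumption on my part, and the repository's Triton kernels may factor the computation quite differently:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def two_simplicial_attention(Q, K1, K2, V1, V2):
    """Sketch of σ(Q·K₁) ⊙ σ(Q·K₂) · V₁ · V₂: gate two softmax score maps
    elementwise, renormalize rows (an assumption), and apply the result
    to the elementwise product of the two value streams."""
    w = softmax(Q @ K1.T) * softmax(Q @ K2.T)   # (n, n) gated weights
    w = w / w.sum(axis=-1, keepdims=True)       # renormalize each row
    return w @ (V1 * V2)                        # (n, d)

rng = np.random.default_rng(1)
Q, K1, K2, V1, V2 = (rng.standard_normal((5, 8)) for _ in range(5))
out = two_simplicial_attention(Q, K1, K2, V1, V2)  # (5, 8)
```

Note this ignores causal masking and multi-head batching, which an autoregressive implementation with the (K₁, K₂, V₁, V₂) cache would need.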
## Installation

```bash
# Clone the repository
git clone https://github.com/and-per-i/too-simplex.git
cd too-simplex

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

## Training

```bash
# Train with default config
python training/train.py --config training/configs/logic_finetuning_4090.yaml

# Or use the launcher script
./start.sh train

# With custom config and paths
./start.sh train --config training/configs/logic_finetuning_4090.yaml --log-dir logs --checkpoint-dir checkpoints
```

## Inference

```bash
# Generate text
python scripts/generate_text.py

# Or use the CLI entry point
simplicial-generate
```

## License

MIT License