Research repository for the Hazan Lab @ Princeton for experimenting with tensorized filtering.
The tensorized filtering algorithm was introduced in Marsden et al. (2024): Provable Length Generalization in Sequence Prediction via Spectral Filtering. arXiv:2411.01035.
Some brief remarks about the repository:
We used the o200k_base tokenizer from tiktoken to pre-tokenize roughly 10 billion tokens from the FineWeb-Edu dataset for experimentation. Download it using this link.
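For readers who want to tokenize their own text the same way, here is a minimal sketch of a pre-tokenization step. It is not the script we used to build the dataset: the helper names `load_o200k_encoding` and `pretokenize`, the choice to prepend an end-of-text token to each document, and the flat `uint32` output are all assumptions made for illustration. The one firm constraint is the dtype: the o200k_base vocabulary has roughly 200k entries, so token ids do not fit in `uint16`.

```python
from typing import Callable, Iterable, List

import numpy as np


def load_o200k_encoding():
    """Return the o200k_base encoding (hypothetical helper).

    tiktoken is imported lazily so that pretokenize() below can be used
    and tested with any encode function, without tiktoken installed.
    """
    import tiktoken

    return tiktoken.get_encoding("o200k_base")


def pretokenize(
    docs: Iterable[str],
    encode: Callable[[str], List[int]],
    eot_id: int,
) -> np.ndarray:
    """Concatenate documents into one flat token array.

    Each document is preceded by `eot_id` as a delimiter (an assumed
    convention, not necessarily the one used for our dataset).
    """
    tokens: List[int] = []
    for doc in docs:
        tokens.append(eot_id)
        tokens.extend(encode(doc))
    # uint32 because ids near 200k overflow uint16 (max 65535).
    return np.asarray(tokens, dtype=np.uint32)
```

With tiktoken installed, this could be called as `enc = load_o200k_encoding()` followed by `pretokenize(docs, enc.encode_ordinary, enc.eot_token)`.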
This is a research repository, not a polished library. Expect to see magic numbers. Expect to see hard-coded paths.
In the current implementation, we generate the tensorized filters only by taking the Kronecker product of the original filters proposed in Agarwal et al. (2024): Spectral State Space Models. https://arxiv.org/abs/2312.06837.
Notably, we omit the autoregressive components and projections from the original Tensorized Spectral Filtering algorithm (Algorithm 3) from Marsden et al. (2024).
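The Kronecker-product construction described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code in this repository: the function names are hypothetical, the base filters are taken to be the top-k eigenvectors of the Hankel matrix with entries 2 / ((i + j)^3 - (i + j)) as we understand it from the spectral filtering literature, and the eigenvalue scaling that some implementations apply is omitted. Two base filters of length L combine into one tensorized filter of length L^2.

```python
import numpy as np


def hankel_matrix(seq_len: int) -> np.ndarray:
    """Hankel matrix Z with Z[i, j] = 2 / ((i + j)^3 - (i + j)), 1-based i, j."""
    idx = np.arange(1, seq_len + 1, dtype=np.float64)
    s = idx[:, None] + idx[None, :]
    return 2.0 / (s**3 - s)


def spectral_filters(seq_len: int, k: int) -> np.ndarray:
    """Top-k eigenvectors of the Hankel matrix, shape (seq_len, k)."""
    z = hankel_matrix(seq_len)
    # eigh returns eigenvalues in ascending order, so the last k
    # columns correspond to the k largest eigenvalues.
    _, eigvecs = np.linalg.eigh(z)
    return eigvecs[:, -k:]


def tensorized_filters(base_len: int, k: int) -> np.ndarray:
    """Kronecker products of all pairs of base filters.

    Returns an array of shape (base_len**2, k**2); column i * k + j
    holds kron(phi_i, phi_j).
    """
    phi = spectral_filters(base_len, k)
    pairs = [
        np.kron(phi[:, i], phi[:, j])
        for i in range(k)
        for j in range(k)
    ]
    return np.stack(pairs, axis=1)


filters = tensorized_filters(base_len=8, k=3)
print(filters.shape)
```

Because the base filters are orthonormal eigenvectors, their pairwise Kronecker products are orthonormal as well, which gives a quick sanity check on the output.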
Note: We recommend using the uv package manager, made by Charlie Marsh '15.
Create a virtual environment with one of the following options:
uv:
uv venv --prompt tensorized-filters .venv
Python/pip:
python3 -m venv --prompt tensorized-filters .venv
Conda:
conda create -n tensorized-filters pytorch pytorch-cuda=12.4 -c pytorch -c nvidia
Note: If you want to use Flash FFT and/or Flash Attention, you will need to have a CUDA-enabled device. Please see their repositories for further instructions on installation.
Install the required packages with:
uv:
uv sync
Python/pip:
pip install -e .
We include SLURM scripts for distributed training environments. For local testing, you can run the train.py scripts directly with torchrun.
We welcome contributors to:
- Submit pull requests
- Report issues
- Help improve the project overall
Apache 2.0 License
You can freely use, modify, and distribute the software, even in proprietary products, as long as you:
- Include proper attribution
- Include a copy of the license
- Mention any changes made
It also provides an express grant of patent rights from contributors.
See the LICENSE file for more details.
If you use this repository or find our work valuable, please consider citing it:
@article{tensorizedfiltering,
title={Provable Length Generalization in Sequence Prediction via Spectral Filtering},
author={Annie Marsden and Evan Dogariu and Naman Agarwal and Xinyi Chen and Daniel Suo and Elad Hazan},
journal={arXiv preprint arXiv:2411.01035},
year={2024},
url={https://arxiv.org/abs/2411.01035}
}
@article{flashstu,
title={Flash STU: Fast Spectral Transform Units},
author={Y. Isabel Liu and Windsor Nguyen and Yagiz Devre and Evan Dogariu and Anirudha Majumdar and Elad Hazan},
journal={arXiv preprint arXiv:2409.10489},
year={2024},
url={https://arxiv.org/abs/2409.10489}
}