# FPGA-accelerated Vector Search

This is the repository for our SC'23 paper, *Co-design Hardware and Algorithm for Vector Search*. We built FPGA accelerators for product-quantization-based vector search.
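
To make "product-quantization-based vector search" concrete, below is a minimal NumPy sketch of the query-time core, asymmetric distance computation (ADC): database vectors are compressed offline into per-subspace centroid codes, and each query is then scored with table lookups and adds. This is an illustrative software model only, not the accelerator code in this repository; the sizes (`m`, `ks`) are toy parameters, and the random "codebooks" stand in for k-means-trained ones.

```python
import numpy as np

# Toy setup: 1,000 database vectors of dimension 16, split into m subspaces.
rng = np.random.default_rng(0)
d, m, ks = 16, 4, 256          # dim, sub-quantizers, centroids per sub-quantizer
dsub = d // m
db = rng.standard_normal((1000, d)).astype(np.float32)

# Stand-in codebooks: random centroids; real systems train these with k-means.
codebooks = rng.standard_normal((m, ks, dsub)).astype(np.float32)

# Offline encoding: each vector becomes m one-byte codes
# (the index of the nearest centroid in each subspace).
codes = np.empty((len(db), m), dtype=np.uint8)
for j in range(m):
    sub = db[:, j * dsub:(j + 1) * dsub]
    dists = ((sub[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
    codes[:, j] = dists.argmin(1)

# Query-time ADC: build an m x ks lookup table of squared distances from the
# query to every centroid, then score each database vector with m lookups.
query = rng.standard_normal(d).astype(np.float32)
lut = np.stack([
    ((query[j * dsub:(j + 1) * dsub][None, :] - codebooks[j]) ** 2).sum(-1)
    for j in range(m)
])
scores = lut[np.arange(m), codes].sum(1)   # approximate squared L2 distances
print("top-10 approximate neighbors:", np.argsort(scores)[:10])
```

Per query, the work reduces to m table lookups and adds per database vector, a regular, memory-bound pattern that suits deeply pipelined hardware.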

## Abstract

Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore’s Law era. We introduce FANNS, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, FANNS automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. FANNS attains up to 23.0× and 37.2× speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5× and 7.6× speedup in median and 95th percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of FANNS lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.
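
The co-design step described in the abstract can be pictured as a design-space search: enumerate combinations of algorithm parameters and hardware configurations, drop candidates that miss the recall target or exceed the resource budget, and keep the fastest survivor. The sketch below is a conceptual illustration with made-up recall, resource, and performance models; it is not FANNS' actual interface or cost model.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Candidate:
    nlist: int     # IVF partitions (algorithm parameter)
    nprobe: int    # partitions scanned per query (algorithm parameter)
    pe_count: int  # processing elements instantiated (hardware parameter)

def meets_recall(c: Candidate) -> bool:
    # Placeholder: in practice recall is measured on the target dataset.
    return c.nprobe / c.nlist >= 0.01

def fits_budget(c: Candidate, lut_budget: int) -> bool:
    # Placeholder resource model: assume a fixed LUT cost per PE.
    return c.pe_count * 20_000 <= lut_budget

def predicted_qps(c: Candidate) -> float:
    # Placeholder performance model: throughput grows with PEs and shrinks
    # with the fraction of the index scanned per query.
    return c.pe_count * c.nlist / c.nprobe

candidates = [Candidate(nl, npr, pe)
              for nl, npr, pe in product([1024, 4096, 16384],
                                         [8, 16, 32, 64],
                                         [4, 8, 16])]
feasible = [c for c in candidates
            if meets_recall(c) and fits_budget(c, lut_budget=400_000)]
print("chosen design point:", max(feasible, key=predicted_qps))
```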

## Citation

```bibtex
@inproceedings{jiang2023co,
  title={Co-design hardware and algorithm for vector search},
  author={Jiang, Wenqi and Li, Shigang and Zhu, Yu and de Fine Licht, Johannes and He, Zhenhao and Shi, Runbin and Renggli, Cedric and Zhang, Shuai and Rekatsinas, Theodoros and Hoefler, Torsten and others},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  pages={1--15},
  year={2023}
}
```



## Related Projects

Chameleon is a heterogeneous and disaggregated accelerator system for retrieval-augmented generation (RAG) serving. It prototypes FPGA-based accelerators for retrieval together with GPU-based LLM inference.

```bibtex
@article{jiang2023chameleon,
  title={Chameleon: a heterogeneous and disaggregated accelerator system for retrieval-augmented language models},
  author={Jiang, Wenqi and Zeller, Marco and Waleffe, Roger and Hoefler, Torsten and Alonso, Gustavo},
  journal={Proceedings of the VLDB Endowment},
  year={2025}
}
```

RAGO is a framework for systematic performance optimization of RAG serving.

Code: https://github.com/google/rago

```bibtex
@inproceedings{rago:isca:2025,
  title={RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving},
  author={Jiang, Wenqi and Subramanian, Suvinay and Graves, Cat and Alonso, Gustavo and Yazdanbakhsh, Amir and Dadu, Vidushi},
  booktitle={Proceedings of the 52nd Annual International Symposium on Computer Architecture},
  year={2025}
}
```

PipeRAG is an efficient algorithm for iterative RAG, speeding up retrieval-augmented generation via adaptive pipeline parallelism.

Code: https://github.com/amazon-science/piperag

```bibtex
@inproceedings{jiang2025piperag,
  title={PipeRAG: Fast retrieval-augmented generation via adaptive pipeline parallelism},
  author={Jiang, Wenqi and Zhang, Shuai and Han, Boran and Wang, Jie and Wang, Yuyang Bernie and Kraska, Tim},
  booktitle={Proceedings of the 31st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2025}
}
```

Falcon accelerates graph-based vector search via delayed-synchronization traversal.

Code: https://github.com/fpgasystems/Falcon-accelerate-graph-vector-search

```bibtex
@article{jiang2024accelerating,
  title={Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal},
  author={Jiang, Wenqi and Hu, Hang and Hoefler, Torsten and Alonso, Gustavo},
  journal={arXiv preprint arXiv:2406.12385},
  year={2024}
}
```
