Contact emails
@[email protected], Yihua Cheng
@[email protected], Jiayi Yao
@[email protected], Kuntai Du
@[email protected], Nick Barcet
Project summary
LLM serving engine extension that reduces TTFT and increases throughput
Project description
LMCache is an LLM serving engine extension that integrates with vLLM or SGLang to reduce TTFT and increase throughput, especially in long-context scenarios. By storing the KV caches of reusable text across multiple locations (GPU, CPU DRAM, local disk, and databases), LMCache reuses the KV cache of any repeated text, not necessarily a prefix, in any serving engine instance. Thus, LMCache saves precious GPU cycles and reduces user response delay.
By combining LMCache with vLLM, developers achieve 3-10x savings in response delay and GPU cycle consumption in many LLM use cases, including multi-round QA and RAG.
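For illustration only, here is a minimal sketch of how LMCache can be enabled in vLLM through the KV-connector interface. The connector name (`LMCacheConnectorV1`), the `LMCACHE_*` environment variables, and the model name are assumptions drawn from the public documentation and may differ between versions:

```python
# Illustrative sketch only -- assumes the vLLM v1 KV-connector integration
# described in the LMCache docs; connector name and env vars may vary by version.
import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Assumed LMCache settings: cache 256-token chunks and spill them to
# up to 5 GB of CPU DRAM when GPU memory runs out.
os.environ["LMCACHE_CHUNK_SIZE"] = "256"
os.environ["LMCACHE_LOCAL_CPU"] = "True"
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5"

# Route vLLM's KV-cache loads and stores through the LMCache connector.
ktc = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    kv_transfer_config=ktc,
    gpu_memory_utilization=0.8,
)

# Requests that share long context (multi-round QA, RAG chunks) can now hit
# the cache hierarchy instead of recomputing the prefill on the GPU.
prompts = [
    "<long shared context> ... question 1",
    "<long shared context> ... question 2",
]
outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=64))
for out in outputs:
    print(out.outputs[0].text)
```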
Are there any other projects in the PyTorch Ecosystem similar to yours? If, yes, what are they?
No
Project repo URL
Additional repos in scope of the application
None
Project license
Apache 2.0
GitHub handles of the project maintainer(s)
YaoJiayi, ApostaC, KuntaiDu, Shaoting-Feng, maobaolong, sammshen, hickeyma, HuaizhengZhang
Is there a corporate or academic entity backing this project? If so, please provide the name and URL of the entity.
TensorMesh.ai (https://tensormesh.ai/) and the University of Chicago (https://cs.uchicago.edu)
Website URL
Documentation
How do you build and test the project today (continuous integration)? Please describe.
We are using the following to run our CI:
- 2 L4 GPU servers on GCP
- Default GitHub runners
Our CI pipeline ensures:
- Code quality checks, triggered on each commit to a PR, hosted on GitHub runners
- Unit tests via Buildkite, triggered on each commit to a PR, running on the L4 GPU servers
- End-to-end correctness and performance tests, triggered once per PR, running on the L4 GPU servers
- Docker images and pip packages, built after each PR is merged, using GitHub Actions on default GitHub runners
We have a two-week release cadence.
Version of PyTorch
- We maintain compatibility with upstream vLLM through a connector API that we contribute. Current vLLM version: 0.10.0
- We support PyTorch 2.2.0 through 2.8.0 (latest). The current release is built with PyTorch 2.7.1, which is also the version required by vLLM.
Components of PyTorch
- PyTorch CUDA/C++/ROCm extensions
- Standard PyTorch Python APIs (tensor-related APIs); see the illustrative sketch below
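As a purely illustrative example of the standard tensor APIs involved (not LMCache's actual implementation), the hypothetical sketch below offloads a KV-cache block from GPU memory to pinned CPU DRAM and later restores it:

```python
# Conceptual sketch using only standard PyTorch APIs; this is NOT LMCache's
# actual code, just an illustration of GPU <-> CPU KV-cache block movement.
# Requires a CUDA-capable GPU.
import torch

num_layers, num_heads, block_tokens, head_dim = 32, 8, 256, 128

# A hypothetical KV-cache block on the GPU: [K/V, layers, heads, tokens, head_dim].
kv_block_gpu = torch.randn(
    2, num_layers, num_heads, block_tokens, head_dim,
    dtype=torch.float16, device="cuda",
)

# Pinned CPU buffer enables fast, asynchronous DMA transfers.
kv_block_cpu = torch.empty(
    kv_block_gpu.shape, dtype=kv_block_gpu.dtype, pin_memory=True,
)

copy_stream = torch.cuda.Stream()
copy_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(copy_stream):
    # Offload: GPU -> CPU DRAM without blocking the compute stream.
    kv_block_cpu.copy_(kv_block_gpu, non_blocking=True)
copy_stream.synchronize()

# Later, when the same text chunk is requested again, restore it to the GPU.
restored = kv_block_cpu.to("cuda", non_blocking=True)
torch.cuda.current_stream().synchronize()
assert torch.equal(restored, kv_block_gpu)
```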
How long do you expect to maintain the project?
We are fully committed to maintaining this project for the foreseeable future.
Additional information
LMCache is the result of research conducted at the University of Chicago by a team of researchers under the supervision of Assistant Professor Junchen Jiang.
Reference papers:
- CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
- CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion
- Do Large Language Models Need a Content Delivery Network?
The project currently has 99 contributors and 8 maintainers from diverse organizations, including IBM, Tencent, and ByteDance.
If the project is accepted as an ecosystem project, we will then propose it as a hosted project.