Releases · ModelEngine-Group/unified-cache-management

30 Jan 08:47

flesher0813

v0.3.0

8dd98d1

v0.3.0 Latest

Highlights

Refinement of PipelineStore Architecture and Enhancement of Core Capabilities #653 #711
Now supports 3FS for scalable and efficient storage backends #622
Features the new GSAOnDevice sparse attention algorithm, enabling high-performance HBM utilization across both CUDA and Ascend platforms. #647 #638
Aligned CacheBlend with the new UCM storage and sparse engine updates to support vLLM 0.9.2. #664

Known Issues

Layerwise is not supported with vLLM 0.11.0.
- Installing with pip install uc-manager does not currently support vLLM 0.11.0.
- To use UCM layerwise with vLLM 0.11.0 or later, refer to vllm-project/vllm#26675 for the required modifications.
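
A deployment script can check for this combination before enabling layerwise transfer. The sketch below is illustrative only: `release_tuple` and `layerwise_needs_upstream_patch` are hypothetical helper names that are not part of UCM, and the code assumes version strings start with a dotted numeric release such as `0.11.0`.

```python
import re

# First vLLM release that the known issue above flags for layerwise use.
FLAGGED_VLLM_RELEASE = (0, 11, 0)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release of a version string as a 3+ int tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad "0.11" to (0, 11, 0)
    return tuple(parts)


def layerwise_needs_upstream_patch(vllm_version: str) -> bool:
    """True when the vLLM version is 0.11.0 or later (hypothetical helper)."""
    return release_tuple(vllm_version) >= FLAGGED_VLLM_RELEASE


if __name__ == "__main__":
    for candidate in ("0.9.2", "0.11.0", "0.11.1rc1"):
        print(candidate, layerwise_needs_upstream_patch(candidate))
```

The installed version string can be obtained from the standard library with `importlib.metadata.version("vllm")` and passed to the helper.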

What's Changed

[bugfix]cherry-pick from 0.2.0release Fix KeyError by @qyh111 in #573
[bugfix] cherry-pick from 0.2.0release patch update by @wangwenxin0312 in #574
[fix]cherry pick from 0.2.0-release fix monitor issue (#572) by @qyh111 in #575
[bugfix] build hamming dist by @wangwenxin0312 in #577
[feat]Update data file layout to adapt to garbage collection by @qyh111 in #579
[bugfix]cherry pick from 0.2.0-release sparse patch & cmake by @wangwenxin0312 in #581
[bugfix] kvcomp config by @wangwenxin0312 in #584
[feat] KvCompOnDevice: per-KV-head Top-K for Qwen by @wangwenxin0312 in #589
feature for triton rerope by @xinSky00 in #497
[bugfix] kvcomp for qwen by @wangwenxin0312 in #594
[bugfix] share buffer used out (cherry-picked from #592) by @mag1c-h in #598
[fix]cherry-pick clean code and set local_rank_size to tp_size (#596) by @qyh111 in #600
[misc] split dependency preparation logic into individual dependency files for enhanced configuration flexibility by @mag1c-h in #597
[fix]fix clean code (#601) by @qyh111 in #602
Modify blend and rerope docs by @xinSky00 in #593
[docs] Modify blend introduction by @wuhuxiao in #605
add qiongwu as codeownner by @Infinite666 in #610
KVComp in NPU -- HBM version by @leideng in #599
[bugfix] bugfix in PCStore, cherry-pick from release by @mag1c-h in #609
[docs]Add doc for pipeline store by @qyh111 in #607
[fix] remove request_succeed_dumped_blocks() in monkey patch by @xinSky00 in #613
[fix]Sync changes from the release branch to develop. including docs、version and dockerfile by @qyh111 in #621
[feat] Cherry-pick updates from 0.2.0-release to develop (patches and docs) by @wangwenxin0312 in #623
[bugfix] ] Cherry-pick updates from 0.2.0-release (hamming compile) by @wangwenxin0312 in #625
[doc]rename pipline_store to pipeline_store by @qyh111 in #626
[bugfix] fix register_kv_caches patch by @Clarence-1103 in #629
Unify xSA name as GSA by @leideng in #631
[Feature] 3FS Store by @UESTC-AHao in #622
[optimize]Optimized LLMPerf Test Cases by @Potterluo in #634
[Doc] 3FS Document by @UESTC-AHao in #637
[Feat] Basic scripts for deployment best practices by @sumingZero in #556
[feature]Add LLM connection base components and OpenAI connector by @Potterluo in #636
[Bugfix] Fix 3FS by @UESTC-AHao in #650
[feat] PipelineStore Architecture Refresh and Capability Enhancement by @mag1c-h in #653
[doc] Add contributing guide by @yuanzhg078 in #648
[doc]Implement the function of a kv cache calculator html in User Guide by @Potterluo in #652
[Opt] New gsa config by @leideng in #646
[Feat] Support C++/Python to use same metrics singleton within a process by @flesher0813 in #654
[feat]Add Layerwise Connector by @qyh111 in #656
[Fix] Modify ucm_connector to adapt metrics by @flesher0813 in #658
[doc] Update quickstart section in README_zh by @yuanzhg078 in #663
[Feat] Update sparse method patches for vllm 0.11.0 by @AooooooA-C in #638
[CI] add pr gate workflow by @dante159753 in #662
[Opt] Gsa npu performance optimize by @leideng in #647
[misc] Reduce gpu utilization to 6GB in test for 1.5B model by @dante159753 in #665
[feat] add monkey patch for gsa on device v0.9.2 by @Clarence-1103 in #618
[Fix] coredump if add new c++ metrics by @flesher0813 in #666
[opt] adapt cache blend for store and sparse's new version by @wuhuxiao in #664
[Doc] Update documents related to sparse. by @AooooooA-C in #672
[CI] use requirements file to prepare test env by @dante159753 in #673
[test]Evaluate model performance and accuracy with UCM by @ayaka836 in #642
[Fix] Failed to start vLLM service using multi-node launch scripts under CUDA data parallelism by @sumingZero in #670
[CI] remove logger, check branch up-to-date, fast fail e2e test by @dante159753 in #674
release 0.3.0 by @flesher0813 in #677
[bugfix] Fix compilation error due to missing atomic include by @harrisonyhq in #693
[Bugfix] Modify worker_id set to separate different worker by @flesher0813 in #691
[bugfix] rm unavailable lib and fix doc and update patch by @wuhuxiao in #700
[perf] Reduce directory lock conflicts during batch dumps in PosixStore by @mag1c-h in #711

New Contributors

@Infinite666 made their first contribution in #610
@dante159753 made their first contribution in #662
@ayaka836 made their first contribution in #642

Full Changelog: v0.2.0...v0.3.0

Contributors

leideng, dante159753, and 15 other contributors

05 Jan 12:28

qyh111

v0.2.0

39d46c7

v0.2.0

Highlights

Support model window extrapolation: Rectified Rotary Position Embeddings (ReRoPE) (#497).
Support sparse attention algorithms in HBM on both CUDA GPUs and Ascend NPUs. Attention is sparsified by hashing KV states and selecting the Top-K entries by Hamming distance (#559).
Add Pipeline Store, composed of Cache Store and POSIX Store (#553).
Improve KV cache transfer performance for NfsStore (#393).
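
The hash-then-Hamming idea behind the sparse attention highlight can be sketched in a few lines of pure Python. This is not the UCM kernel: the release note does not say which hash is used, so the sign-of-projection hash below is an assumption, and all function names are hypothetical.

```python
def sign_hash(vec, planes):
    """Pack the signs of the projections of vec onto each plane into an int code.

    Assumed hashing scheme for illustration; the real hash may differ.
    """
    code = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def topk_by_hamming(query_code, key_codes, k):
    """Indices of the k keys whose codes are closest to the query code.

    Ties are broken by the lower index so the result is deterministic.
    """
    order = sorted(
        range(len(key_codes)),
        key=lambda i: (hamming(query_code, key_codes[i]), i),
    )
    return order[:k]


if __name__ == "__main__":
    planes = [[1.0, 0.0], [0.0, 1.0]]
    keys = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
    key_codes = [sign_hash(key, planes) for key in keys]
    query_code = sign_hash([0.5, 2.0], planes)
    print(topk_by_hamming(query_code, key_codes, k=2))
```

Only the selected indices would then take part in the full attention computation, which is where the saving comes from.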

Known Issues

Sparse is not supported when installing via pip
- Currently, installing with pip install uc-manager does not support Sparse.
- Before installing via pip, please make sure to set the platform explicitly:
```
export PLATFORM=xxx
```
- To use Sparse, please install via the Docker image or build from source.

What's Changed

[Feature] Add performance and evaluation testing tools using the pytest framework by @zzycode1005 in #462
[Feature] Added environment pre-check by @Menglths in #498
[docs] fix links in docs and add clarifications (#499) by @Lijiachen1018 in #502
[build] rewrite setup.py by @ygwpz in #501
[bugfix] Adapt the patch to support YAML sections. by @wangwenxin0312 in #480
[bugfix] fix pip install -e no so by @ygwpz in #508
[Feature] Cache Blend by @wuhuxiao in #467
merge Feature_store_next to develop by @qyh111 in #518
[bugfix]fix setup.py by @qyh111 in #520
[bugfix]fix setup.py (#520) by @qyh111 in #521
feat(test): Add PostgreSQL support and optimize database write logic by @Potterluo in #507
[fix] move init to intergration/vllm directory by @Lijiachen1018 in #535
[Fix]Add PLATFORM reminder by @zhou-haitao in #526
cherry-pick from 0.1.0-release by @Lijiachen1018 in #552
[Feat] New Store Impl: CacheStore - PosixStore - PipelineStore by @mag1c-h in #553
[Perf] parallel block-existence checks + timeout exception by @mag1c-h in #550
[feat] Shard block files into subdirs by hash prefix, with opt-out switch by @mag1c-h in #561
[feat]use numpy to calculate addrs by @qyh111 in #564
[Bugfix] use-after-free in LookupBatch by @mag1c-h in #565
[Bugfix] skip fresh shm files to avoid race between multiple instances by @mag1c-h in #566
[Bugfix] Fix incorrect fallback in GetHostBuffer: use MakeHostBuffer instead of MakeDeviceBuffer by @mag1c-h in #568
[feat] kvcomp on device by @wangwenxin0312 in #559
[fix]Add exception handling by @qyh111 in #569
[bugfix]Fix KeyError when VLLM_HASH_ATTENTION environment variable is not set by @qyh111 in #570
[bugfix] patch update by @wangwenxin0312 in #571
[fix]fix monitor issue by @qyh111 in #572
[bugfix] build hamming dist by @wangwenxin0312 in #578
[feat] Update data file layout to adapt to garbage collection by @qyh111 in #576
[bugfix] sparse patch & cmake by @wangwenxin0312 in #580
[build]fix spdlog use ext fmt by @Lijiachen1018 in #585
[bugfix] kvcomp fix by @wangwenxin0312 in #586
[feat] KvCompOnDevice: per-KV-head Top-K for Qwen by @wangwenxin0312 in #588
[bugfix] share buffer used out by @mag1c-h in #592
[bugfix] kvcomp for qwen by @wangwenxin0312 in #595
[fix]clean code and set local_rank_size to tp_size by @qyh111 in #596
[fix]fix clean code by @qyh111 in #601
[Bugfix] update block dir permission & double-free fix by @mag1c-h in #603
[bugfix] double-release shared-block while make reader failed by @mag1c-h in #604
[docs]add doc for pipeline store by @qyh111 in #612
[feat] cherry-pick to 0.2.0-release to add rerope by @xinSky00 in #614
fix ascend patch and change version by @qyh111 in #615
add patch in dokerfile-npu by @qyh111 in #617
[feat] cherry-pick KVComp in NPU -- HBM version into the 0.2.0-release branch by @wangwenxin0312 in #619
[feat] update all patch and docs by @wangwenxin0312 in #620
[bugfix] hamming compile by @wangwenxin0312 in #624
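
One of the entries above (#561) shards block files into subdirectories by hash prefix, with an opt-out switch. A minimal sketch of that layout idea follows. The two-character prefix, the assumption that a block ID is already a hex hash string, and the names `block_path`, `shard`, and `prefix_len` are all hypothetical and not taken from the UCM code.

```python
from pathlib import PurePosixPath


def block_path(root, block_id, shard=True, prefix_len=2):
    """Return the file path for a block under root.

    With shard=True the block is placed in a subdirectory named after the
    first prefix_len characters of its ID, which keeps any single directory
    from accumulating every block file. shard=False is the opt-out and
    yields a flat layout.
    """
    base = PurePosixPath(root)
    if not shard:
        return base / block_id
    return base / block_id[:prefix_len] / block_id


if __name__ == "__main__":
    print(block_path("/data/kv", "a1b2c3d4"))
    print(block_path("/data/kv", "a1b2c3d4", shard=False))
```

Spreading files across prefix directories also tends to reduce contention on a single directory when many blocks are written at once.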

New Contributors

@zzycode1005 made their first contribution in #462

Full Changelog: v0.1.2...v0.2.0

Contributors

mag1c-h, Lijiachen1018, and 9 other contributors

13 Dec 13:43

qyh111

v0.2.0rc1

bad9354

v0.2.0rc1 Pre-release

Highlights

Improved Prefix Cache offload/load performance.
Support Cache Blend.

Core:

Support Cache Blend (#467)
Add V1 Store Interface (#510, #518)

Known Issues

When using the Ascend platform:
- Broadcasting is not supported.
- load_only_first_rank must be set to false in the configuration.
When compiling from source, make sure to set the PLATFORM environment variable.

What's Changed

[Feature] Add performance and evaluation testing tools using the pytest framework by @zzycode1005 in #462
[Feature] Added environment pre-check by @Menglths in #498
[docs] fix links in docs and add clarifications (#499) by @Lijiachen1018 in #502
[build] rewrite setup.py by @ygwpz in #501
[bugfix] Adapt the patch to support YAML sections. by @wangwenxin0312 in #480
[bugfix] fix pip install -e no so by @ygwpz in #508
[Feature] Cache Blend by @wuhuxiao in #467
merge Feature_store_next to develop by @qyh111 in #518
[bugfix]fix setup.py by @qyh111 in #520

New Contributors

@zzycode1005 made their first contribution in #462
@wuhuxiao made their first contribution in #467

Full Changelog: v0.1.2...v0.2.0rc1

Contributors

Lijiachen1018, ygwpz, and 5 other contributors

10 Dec 07:56

Lijiachen1018

v0.1.2

aa31619

v0.1.2

Some small fixes in this release.

[Docs] Documents are now easier to read.
[Docs] PD disaggregation documentation update: removed the --enforce-eager argument from the vLLM service startup command, so that graph mode is enabled by default at startup.
[Feat] UCconnector has been completely removed; please use UCMConnector from now on.
[Feat] UCM supports recovery from load failure: implemented the get_block_ids_with_load_errors interface of the KVConnectorBase_V1 class, enabling vLLM to re-execute inference for requests whose KV cache failed to load from UCM.
[Build] Use pip install uc-manager==0.1.2; the install builds from source for both vllm and vllm-ascend.
[Build] The Sparse module is now built and used only if the environment variable is set with export ENABLE_SPARSE=TRUE.
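
The load-failure recovery item can be pictured with a small stand-in class. This is a sketch, not the UCM connector: `LoadTrackingConnector` and `record_load_result` are hypothetical names, the class does not inherit from vLLM's KVConnectorBase_V1, and returning a set of integer block IDs that is cleared after each call is an assumption about the interface.

```python
class LoadTrackingConnector:
    """Hypothetical stand-in that remembers which KV blocks failed to load."""

    def __init__(self):
        self._failed_block_ids = set()

    def record_load_result(self, block_id, ok):
        """Called by the (hypothetical) load path once per block."""
        if not ok:
            self._failed_block_ids.add(block_id)

    def get_block_ids_with_load_errors(self):
        """Return the block IDs that failed since the last call, then reset.

        The caller can use the result to recompute the affected requests
        instead of reading invalid KV cache contents.
        """
        failed, self._failed_block_ids = self._failed_block_ids, set()
        return failed


if __name__ == "__main__":
    connector = LoadTrackingConnector()
    connector.record_load_result(7, ok=True)
    connector.record_load_result(8, ok=False)
    print(connector.get_block_ids_with_load_errors())
```

Resetting on read keeps a failure from being reported twice, so a request is rescheduled once per failed load.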

What's Changed

[cleancode]rm video by @Lijiachen1018 in #459
[fix] pick fixes from Release to develop by @Lijiachen1018 in #465
[cleancode]remove uc connector by @Lijiachen1018 in #460
[build] project docs for pypi by @Lijiachen1018 in #466
[build]build sparse only if enabled by @Lijiachen1018 in #470
[Misc] fetch dependence from gitcode as backup by @mag1c-h in #469
[docs] renew docs by @Lijiachen1018 in #476
release v0.1.1 by @Lijiachen1018 in #478
feat: add MetaX MACA device support for PC by @simshi in #387
[Docs] PD disaggregation documentation update by @sumingZero in #479
[Feat] UCM supports recovery form load failure by @sumingZero in #477
[feat]Add configurable scattergatter by @qyh111 in #483
[bugfix]add synchronize on ascend platform by @qyh111 in #485
[build] fix build by source distribution by @Lijiachen1018 in #484
release v0.1.2 by @Lijiachen1018 in #491
develop merge into main by @ygwpz in #492
[docs] fix links in docs and add clarifications by @Lijiachen1018 in #499

New Contributors

@simshi made their first contribution in #387

Full Changelog: v0.1.0...v0.1.2

Contributors

simshi, mag1c-h, and 4 other contributors

02 Dec 08:42

Lijiachen1018

v0.1.0

5ba2684

v0.1.0

We are excited to announce the first official release of Unified Cache Manager.

Highlights

Offload Prefix Cache to storage.
Homogeneous/heterogeneous PD disaggregation.
Training-free sparsity for accelerating inference (vllm==0.9.2, vllm-ascend==0.9.2rc1) in #199, #335, #190, #451

Core:

Garbage collection for store in #315 and #312
Adapt to vllm and vllm-ascend in #13, #292, #415 and #362
UCM supports online metrics display via Grafana and Prometheus in #414 and docs in #416

Known Issues

When using the Ascend platform, please note:

Broadcasting is not supported.
load_only_first_rank must be set to false in the configuration.

Others

Update documents
Tools for performance tuning, hyperparameter optimization in #418

What's Changed

[opt] Share Infra implementation and unify status codes by @mag1c-h in #399
[bugfix] Fix ESA to be compatible with the latest NFSStore. by @wangwenxin0312 in #401
release v0.1.0rc4 by @Lijiachen1018 in #402
[opt] Remove unused cc impl of dramstore by @mag1c-h in #406
[Fix]remove dram docs and modify quick-start doc by @hero0307 in #411
[Feature] Added performance testing tool based on the PyTest testing framework by @Menglths in #295
[Misc] Add cpp-linter.yml by @mag1c-h in #422
[docs]add metrics doc by @hero0307 in #416
[perf] Modify CUDA SIMD and add Triton hash encoder by @Clarence-1103 in #408
[bugfix] batch trans on cuda with SM return 700 error by @mag1c-h in #434
[Misc] set default logger backend to spdlog by @mag1c-h in #440
[rebase]Dev-ucm-v1 rebase to develop by @Lijiachen1018 in #453
[cleancode] remove dramstore by @Lijiachen1018 in #455
Fix metrics by @Lijiachen1018 in #456

New Contributors

@Menglths made their first contribution in #295

Full Changelog: v0.1.0rc4...v0.1.0

Contributors

mag1c-h, Lijiachen1018, and 4 other contributors

22 Nov 10:16

Lijiachen1018

v0.1.0rc4

5779ce9

v0.1.0rc4 Pre-release

What's Changed

[feat] ucmtrans: Unify API for Device-Host Memory Transfers by @mag1c-h in #379
[feat] Add support for Ascend device memory transfers by @mag1c-h in #382
[Fix] fix build, fix no save kv layer by @Lijiachen1018 in #390
[feat] Add pcstore for enhanced PrefixCache performance by @FangRun2 in #393
[fix] fix ascend attention by @Lijiachen1018 in #394
release v0.1.0rc3 by @Lijiachen1018 in #395
[fix] fix sparse attention by @Lijiachen1018 in #397

New Contributors

@FangRun2 made their first contribution in #393

Full Changelog: v0.1.0rc2...v0.1.0rc4

Contributors

mag1c-h, Lijiachen1018, and FangRun2

19 Nov 08:01

Lijiachen1018

v0.1.0rc2

16ed5da

v0.1.0rc2 Pre-release

What's Changed

[docs] update docs for v0.1.0rc1 by @Lijiachen1018 in #365
[bug fix] Dev patch fix for sparse by @Lijiachen1018 in #371
[build] auto patch for ascend by @Lijiachen1018 in #372
feat: add Mthreads MUSA device support -stage 1 by @superleo in #370
release v0.1.0rc2 by @Lijiachen1018 in #373
prefetch bug by @zbb200819 in #360
[Feat]Adapt to vllm-ascend0.9.1 and vllm-ascend0.11.0 by @hero0307 in #362
[bugfix] add cmake option to bypass NUMA binding by @Clarence-1103 in #368
[Feat] Update the data items saved by trace replay by @sumingZero in #366

New Contributors

@superleo made their first contribution in #370

Full Changelog: v0.1.0rc1...v0.1.0rc2

Contributors

superleo, zbb200819, and 4 other contributors

17 Nov 12:21

Lijiachen1018

v0.1.0rc1

754f7ba

v0.1.0rc1 Pre-release

Support Features

Prefix Cache
Sparse Attention
Sparse Attention Offload
PD Disaggregation

What's Changed

remove impl by @flesher0813 in #11
adapt vllm v0.9.2 by @flesher0813 in #13
[Doc] Outline of the document by @ygwpz in #15
remove impl test and add uc connector test by @flesher0813 in #14
[Doc] Installation of ucm by @flesher0813 in #17
[Feature] Add DRAM Connector for uc_connector by @harrisonyhq in #18
[doc] add readme and license by @ygwpz in #24
[Feature] Add Dockerfiles by @flesher0813 in #20
[Feature]Nfsstore by @propanone1006 in #23
[doc] change docs outline by @ygwpz in #32
[Feature] Add Cmake build command in setup.py by @harrisonyhq in #34
[fixbug] fix issue#25 issue#31 and issue#33 by @flesher0813 in #30
[Fix][Docs] Make example runnable and add performance data (closes #37 #29 #42) by @harrisonyhq in #41
[Feat] Move kv_block_size to config by @harrisonyhq in #43
[feature][docs]finish nfs store and add docs by @qyh111 in #44
[doc] Add export of device type in installation;[Fix] fix version invalid#45 #46 by @harrisonyhq in #47
add perf data in readme by @ygwpz in #49
[Feat] Merge 0.0.1 back into develop by @flesher0813 in #50
[bugfix] fix issue#26 and issue#36 by @ygwpz in #55
[Doc] Add vllm institution by @flesher0813 in #61
[CI][Fix] update issue and pr template, fix issue #57, cherry-pick main by @flesher0813 in #65
[Doc] update install doc using patch to build from source code by @flesher0813 in #68
[Feat] Merge 0.0.1 back into develop by @ygwpz in #72
[Style] Fix codestyle problems and typo in develop by @harrisonyhq in #75
[Feature] add ucm_sparse v1.0: unified sparse attention algorithm framework by @hek14 in #79
[Fix] Fix cant find cmake error when using pip install -e . by @harrisonyhq in #80
Revert "[Feature] add ucm_sparse v1.0: unified sparse attention algorithm framework " by @ygwpz in #82
[Feature] add Mooncake Store by @propanone1006 in #86
[Fix bug] Simplify docker build and installation.md by @flesher0813 in #87
[BUG]adapt deepseek by @qyh111 in #89
[Feature][P/D] add example for disaggregated prefill by @flesher0813 in #90
[Perf] Pipelined ucmnfsstore by @mag1c-h in #97
Revert "[Feature] add Mooncake Store" by @ygwpz in #98
[Fix bug] fix uc_connector ut and change hash generation method by @hero0307 in #101
[Fix] Fix .so build error by @harrisonyhq in #104
[Fix] Fix ascend compile error by @mag1c-h in #106
[Perf]Modify start_load_kv by @qyh111 in #103
[Fix] Fix duplicate create/commit errors upon preemption by @flesher0813 in #109
[Feat] Adapt for vllm 0.9.1 by @sumingZero in #113
[Feature] [Doc] UCMSparse framework by @hek14 in #112
[fix] remove redundant code and files/rename file names by @NaganooMei in #118
[Fix] Fix spelling issues with PR templates by @propanone1006 in #119
remove load_tasks by @NaganooMei in #121
[bugfix] bugfix in ucmnfsstore by @mag1c-h in #123
[doc]Add config parameter by @UESTC-AHao in #130
[bugfix]fix rank handing in multi-node pp setup by @qyh111 in #129
[Feat]Support UCM Sparse on cuda by @harrisonyhq in #126
[Feature] Add mooncake store by @hufumans in #117
[bugfix]modify mla dump by @zhou-haitao in #128
[feature] non-blocking interfaces are provided to check whether the transmission task is completed by @mag1c-h in #139
[feature] return error if block exists while batch creation. by @mag1c-h in #138
[feature]modify create interface by @hufumans in #145
[Doc] change logo and rearange docs by @flesher0813 in #156
0.0.2 release merge develop by @ygwpz in #158
[doc][feature] change code directory by @ygwpz in #161
[fix] modify patch and workflow by @NaganooMei in #163
[Feat] Support load async by @flesher0813 in #166
[Feat]Support load async and load failure by @flesher0813 in #165
[Feature]refactor ucconnector by @qyh111 in #167
[feature] upload retake codes by @truthstriver in #172
[bugfix]Resolve the issue of the first-round commit failure under dsv2 by @zhou-haitao in #186
[Feat] Add KVComp sparse attention implementation in UCM by @leideng in #182
[perf]prepare offset in advance by @qyh111 in #188
[feature] GSA by @HaoLi980405 in #190
[bugfix]fix pp problem and remove err logs when duplicate create by @qyh111 in #191
[Fix] Fix bug: check task returns -50005 during async load by @sumingZero in #192
[bugfix]gsa fix reslotmapping bug by @HaoLi980405 in #194
[bugfix]gsa fix running reqs exceed 30 bug by @HaoLi980405 in #195
[doc] design doc directory by @ygwpz in #197
[Perf]kv_block_size as well as transferIoSize are calculated rather than configured by @UESTC-AHao in #196
[Feat] add cuda topk and gsa descriptions by @HaoLi980405 in #198
[Fix] Fix workflow image space error in action by @harrisonyhq in #203
[bugfix]roll back dataoffset by @qyh111 in #201
[bugfix] fix whl install gsa error and gsa kpre reslotmapping out of range by @HaoLi980405 in #204
[Fix][Doc] Modify sparse docs by @flesher0813 in https://gi...