Release v0.3.0 · ModelEngine-Group/unified-cache-management

Highlights

Refined the PipelineStore architecture and enhanced its core capabilities. #653 #711
Added support for 3FS as a scalable and efficient storage backend. #622
Features the new GSAOnDevice sparse attention algorithm, enabling high-performance HBM utilization across both CUDA and Ascend platforms. #647 #638
Aligned CacheBlend with the new UCM storage and sparse engine updates to support vLLM 0.9.2. #664

Known Issues

Layerwise is not supported with vLLM 0.11.0
- Currently, the package installed with pip install uc-manager does not support vLLM 0.11.0.
- If you need to use vLLM 0.11.0+ with UCM layerwise, please refer to vllm-project/vllm#26675 for the required modifications.
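
A deployment script could check for this known issue before enabling layerwise. The following is a minimal sketch, not part of UCM: the helper names (parse_release, layerwise_needs_patch, installed_vllm_version) are hypothetical, and the 0.11.0 threshold is taken only from the note above.

```python
import re
from importlib import metadata

# Assumption based on the Known Issues note: vLLM 0.11.0 and later need the
# modifications referenced there before UCM layerwise can be used.
PATCH_REQUIRED_FROM = (0, 11, 0)


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.11.1+cu124' -> (0, 11, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions such as '0.11' to three components.
    return parts + (0,) * (3 - len(parts))


def layerwise_needs_patch(vllm_version: str) -> bool:
    """True if this vLLM version falls in the range the note flags (hypothetical helper)."""
    return parse_release(vllm_version) >= PATCH_REQUIRED_FROM


def installed_vllm_version():
    """Return the installed vLLM version string, or None if vLLM is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_vllm_version()
    if version is None:
        print("vLLM is not installed")
    elif layerwise_needs_patch(version):
        print(f"vLLM {version}: layerwise needs the modifications from the Known Issues note")
    else:
        print(f"vLLM {version}: not affected by this known issue")
```

The check compares only the numeric release components, so pre-release or local suffixes (for example a CUDA build tag) are ignored.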

What's Changed

[bugfix]cherry-pick from 0.2.0release Fix KeyError by @qyh111 in #573
[bugfix] cherry-pick from 0.2.0release patch update by @wangwenxin0312 in #574
[fix]cherry pick from 0.2.0-release fix monitor issue (#572) by @qyh111 in #575
[bugfix] build hamming dist by @wangwenxin0312 in #577
[feat]Update data file layout to adapt to garbage collection by @qyh111 in #579
[bugfix]cherry pick from 0.2.0-release sparse patch & cmake by @wangwenxin0312 in #581
[bugfix] kvcomp config by @wangwenxin0312 in #584
[feat] KvCompOnDevice: per-KV-head Top-K for Qwen by @wangwenxin0312 in #589
feature for triton rerope by @xinSky00 in #497
[bugfix] kvcomp for qwen by @wangwenxin0312 in #594
[bugfix] share buffer used out (cherry-picked from #592) by @mag1c-h in #598
[fix]cherry-pick clean code and set local_rank_size to tp_size (#596) by @qyh111 in #600
[misc] split dependency preparation logic into individual dependency files for enhanced configuration flexibility by @mag1c-h in #597
[fix]fix clean code (#601) by @qyh111 in #602
Modify blend and rerope docs by @xinSky00 in #593
[docs] Modify blend introduction by @wuhuxiao in #605
add qiongwu as codeowner by @Infinite666 in #610
KVComp in NPU -- HBM version by @leideng in #599
[bugfix] bugfix in PCStore, cherry-pick from release by @mag1c-h in #609
[docs]Add doc for pipeline store by @qyh111 in #607
[fix] remove request_succeed_dumped_blocks() in monkey patch by @xinSky00 in #613
[fix]Sync changes from the release branch to develop, including docs, version and dockerfile by @qyh111 in #621
[feat] Cherry-pick updates from 0.2.0-release to develop (patches and docs) by @wangwenxin0312 in #623
[bugfix] Cherry-pick updates from 0.2.0-release (hamming compile) by @wangwenxin0312 in #625
[doc]rename pipline_store to pipeline_store by @qyh111 in #626
[bugfix] fix register_kv_caches patch by @Clarence-1103 in #629
Unify xSA name as GSA by @leideng in #631
[Feature] 3FS Store by @UESTC-AHao in #622
[optimize]Optimized LLMPerf Test Cases by @Potterluo in #634
[Doc] 3FS Document by @UESTC-AHao in #637
[Feat] Basic scripts for deployment best practices by @sumingZero in #556
[feature]Add LLM connection base components and OpenAI connector by @Potterluo in #636
[Bugfix] Fix 3FS by @UESTC-AHao in #650
[feat] PipelineStore Architecture Refresh and Capability Enhancement by @mag1c-h in #653
[doc] Add contributing guide by @yuanzhg078 in #648
[doc]Implement the function of a kv cache calculator html in User Guide by @Potterluo in #652
[Opt] New gsa config by @leideng in #646
[Feat] Support C++/Python to use same metrics singleton within a process by @flesher0813 in #654
[feat]Add Layerwise Connector by @qyh111 in #656
[Fix] Modify ucm_connector to adapt metrics by @flesher0813 in #658
[doc] Update quickstart section in README_zh by @yuanzhg078 in #663
[Feat] Update sparse method patches for vllm 0.11.0 by @AooooooA-C in #638
[CI] add pr gate workflow by @dante159753 in #662
[Opt] Gsa npu performance optimize by @leideng in #647
[misc] Reduce gpu utilization to 6GB in test for 1.5B model by @dante159753 in #665
[feat] add monkey patch for gsa on device v0.9.2 by @Clarence-1103 in #618
[Fix] coredump if add new c++ metrics by @flesher0813 in #666
[opt] adapt cache blend for store and sparse's new version by @wuhuxiao in #664
[Doc] Update documents related to sparse. by @AooooooA-C in #672
[CI] use requirements file to prepare test env by @dante159753 in #673
[test]Evaluate model performance and accuracy with UCM by @ayaka836 in #642
[Fix] Failed to start vLLM service using multi-node launch scripts under CUDA data parallelism by @sumingZero in #670
[CI] remove logger, check branch up-to-date, fast fail e2e test by @dante159753 in #674
release 0.3.0 by @flesher0813 in #677
[bugfix] Fix compilation error due to missing atomic include by @harrisonyhq in #693
[Bugfix] Modify worker_id set to separate different worker by @flesher0813 in #691
[bugfix] rm unavailable lib and fix doc and update patch by @wuhuxiao in #700
[perf] Reduce directory lock conflicts during batch dumps in PosixStore by @mag1c-h in #711

New Contributors

@Infinite666 made their first contribution in #610
@dante159753 made their first contribution in #662
@ayaka836 made their first contribution in #642

Full Changelog: v0.2.0...v0.3.0
