dev-ucm-v1 merge main #452

ygwpz · 2025-12-01T14:41:39Z

Purpose

What this PR does / why we need it?

Modifications

Does this PR introduce any user-facing change?

Test

How was this patch tested?

* refactor: reusable transport abstraction & optimized NSFStore pipeline
* add memory pool
* rewrite dramstore (1st version)
* fix (repeated)
* add comment for later development
* fix (repeated)
* complete dramstore_connector.py
* fix (repeated)
* fix CMakeLists.txt
* fix (repeated)
* rewrite dram test:dump (1/2)
* format
* fix
* rewrite dram test:fetch (2/2)
* modify dram test script
* fix (repeated)
* naive try: add device intf into memory pool
* fix (repeated)
* update memPool intf
* fix (repeated)
* try reconstruct threadPool in dramstore (naive version)
* fix (repeated)
* remove redundant comments

Co-authored-by: Mag1c.H <[email protected]>

* [Feat] Support launch from config file
* [Docs] Update documents for launch with yaml
* [Fix] Change load only on first rank into configuration
* [Feat] Add support for hit ratio in yaml
* [Fix] Fix load only first rank in non mla scene

…#414)

* [Feat] Build metrics frame
* [Feat]add metrics(ucm_obser.py + metrics_configs.yaml)
* [Feat] Implementation of metrics logger on the C++ side for storing and retrieving stats
* [Fix] Provide simple grafana and fix bugs
* [feat] change the log position of UCM metrics
* [fix]modify grafana.json
* [Feat] UCM supports metrics display online via Grafana and Promethues
* [Fix] Remove configs to examples and add liscense

Co-authored-by: flesher0813 <[email protected]>
Co-authored-by: hero<[email protected]>

* [fix] fix sparse attention (#397): fix ascend attention. Co-authored-by: lijiachen19 <[email protected]>
* [opt] Share Infra implementation and unify status codes (#399): share infra module. Co-authored-by: Fang Run <[email protected]>
* [bugfix] Fix ESA to be compatible with the latest NFSStore. (#401): fix esa to adapt latest NFSStore
* release v0.1.0rc4 (#402). Co-authored-by: lijiachen19 <[email protected]>
* [opt] Remove unused cc impl of dramstore (#406): remove unused cc impl of dramstore
* [Fix]remove dram docs and modify quick-start doc (#411). Co-authored-by: t00939662 <[email protected]>
  * [Fix]remove dram docs and modify quick-start doc
  * modify index.md
* [Feature] Added performance testing tool based on the PyTest testing framework (#295): Performance testing tool based on the PyTest testing framework.
* [Misc] Add cpp-linter.yml (#422)
* [docs]add metrics doc (#416). Co-authored-by: t00939662 <[email protected]>
  * [docs]add metrics doc
  * modify metrics.md
  * modify metrics.md
* [perf] Modify CUDA SIMD and add Triton hash encoder (#408)
* fix cpp code style

Co-authored-by: Lijiachen1018 <[email protected]>
Co-authored-by: lijiachen19 <[email protected]>
Co-authored-by: Mag1c.H <[email protected]>
Co-authored-by: Fang Run <[email protected]>
Co-authored-by: MaxWang <[email protected]>
Co-authored-by: hero0307 <[email protected]>
Co-authored-by: t00939662 <[email protected]>
Co-authored-by: ML <[email protected]>
Co-authored-by: ShiXiaolei <[email protected]>

sumingZero and others added 30 commits October 9, 2025 11:01

[Doc] update document link (#270)

b05d8e7

* [Doc] update document link
* [Doc] add the README.md of TraceReplay

[Misc] add code owners (#274)

b271703

add code owners

[Docs]Improve the quick_start.md (#275)

477dc28

* Improve the quick_start.md
* Add a Note

[bugfix][#280] MLA layer size calculated wrong (#281)

99b09be

* Fix the layer_size calculated wrong for mla
* Fix the style

[Fix] Each request in the decode instance encounters a load failure

1f0f228

[Misc] add store intf with tensor addr ptr (#288)

d9b68aa

* add store intf with tensor addr ptr
* fix interface doxy

refactor: reusable transport abstraction & optimized NSFStore pipeline (#296)

c578037

[Docs] Modify Readme Contact Us (#298)

8362969

[Fix] Fix gpu_model_runner req_state update error for issue 283

d2f3d9a

[Feature]v091_patch add commit (#302)

cb0a0f5

v091_patch add commit

[Feat] Adapt Trace Replay to vLLM >= 0.10.2 (#303)

6ab7167

clean code log print

2a82141

new space shard layout with temp dir

f7c3569

fix: only delete activated dir when it differs from archived dir

b53b23a

[Feat] add batch interface for device ops and implement `ScatterGather` with CUDA (#305)

7c8c9a3

add batch interface for device ops and implement ScatterGather with CUDA

[feat] hotness management for gc (#312)

c4eb386

hotness management for gc Co-authored-by: lijiachen <[email protected]>

Fix Cuda compilation (#317)

b1b7be5

[BugFix]fix mtp in ucm (#321)

828bbbd

* fix mtp in ucm

[bugfix] preserve DRAM buffer lifetime to restore inference accuracy (#322)

9708eee

* linear buffer for device
* check data consistency after embedding

[feat] capacity check for nfsstore (#315)

7004ab7

* capacity check
* recycle

Co-authored-by: lijiachen <[email protected]>

[Feat] Toy proxy now supports PD-mixed round-robin scheduling (#316)

b01501a

* [Feat] Toy proxy now supports PD-mixed round-robin scheduling
* [Docs] modify the path of toy_proxy_server.py

[Fix]Add import checking to trace_replay and fix the issue of unclose… (#309)

022e187

[Feat]Add import checking to trace replay and fix the issue of unclosed network resources

Co-authored-by: t00939662 <[email protected]>

[bug fix] fix recycleNum when less than 1 (#327)

6512503

fix gc Co-authored-by: lijiachen <[email protected]>

[Feat]Add nfsstore bandwidth testing script (#323)

8622635

Add bandwidth testing script

Fix preemption for sparse attention module and add attention sink. (#333)

2ec56df

* handle preempt in ESA and add init_window

[enhance]optimize kvstar core bind method & delta kvcache swap (#330)

29be755

* delta kvcache block swap
* clean code
* add core bind method
* clean code

[feat] Re-use active block (#334)

b0412d8

reuse act Co-authored-by: lijiachen <[email protected]>

add heke as CODEOWNERS of /docs and /integration (#336)

51ba639

Co-authored-by: hek14 <[email protected]>

Adapt ESA to support DeepSeek. (#335)

10a2eec

adapt to deepseek

harrisonyhq and others added 28 commits November 22, 2025 17:30

[fix] refuse monkey patch (#383)

92bacb8

refuse monkey patch

[bugfix] fix gqa bug (#384)

a3f049d

fix gqa bug

[bugfix] end == 0 bug (#385)

66e3e18

fix end == 0 bug

[feature] optimize generate_tensor (#396)

63c916b

optimize generate_tensor

[Fix] fix mla bug when no broadcast in wait for save (#398)

6358406

[feat]adapt GQA & modify config.yaml (#407)

0986b89

* adapt GQA & modify config.yaml
* move process to UCMDirectConnector
* fix comment
* modify hash function
* fix style
* code style and modify hash
* init parent_block_hash_value

[feat]Adapt vllm_ascend_0110 and Add configurable options (#415)

5403998

* Adapt vllm_ascend_0110 and Add configurable options
* avoid type conversion in init kvcache

[patch]seprate sparse patch (#417)

4cb08ad

seprate spase patch Co-authored-by: lijiachen19 <[email protected]>

[bugfix]Support tensor parallelism across servers (#420)

978a01b

Support tensor parallelism across servers

add env variable ENABLE_SPARSE (#430)

8441e91

Co-authored-by: lijiachen19 <[email protected]>

Fix(patch): fix patch for vllm-ascend (#433)

2daba37

Fix(patch): fix patch for vllm-ascend volcengine/verl#2564 Co-authored-by: lijiachen19 <[email protected]>

[bugfix] batch trans on cuda with SM return 700 error (#434)

9e6a315

cuda trans batch api bug fix (cherry picked from commit 77f5090)

[bugfix] fix accuracy problem when chunked prefill (#438)

6db8f23

* fix accuracy problem when chunked prefill

[Misc] set default logger backend to spdlog (#440)

42a5ab5

set default logger backend to spdlog

[bugfix]fix num_schedule-tokens=1 (#442)

cfa0ae0

* fix num_schedule-tokens=1
* Simplify the code

[fix]: Fix sparse patch (#444)

b6a21fd

Fix sparse patch Co-authored-by: lijiachen <[email protected]>

[bugfix] The Metrics module uses a non-existent variable self.rank (#445)

86c7ca0

[Feature]Add an access bandwidth test script for ucm_connector (#418)

2663929

* Add an access bandwidth test script for 'ucm_connector'

[bugfix]adapt vllm0.9.1 (#446)

d613e22

adapt vllm0.9.1

[Fix]Set the multiprocessing start method of the test tool to 'spawn'. (#447)

b36dfdb

Set the multiprocessing start method of the test tool to 'spawn' and add NPU cleanup

[fix] Adapt all sparse-attention methods to the new connector. (#441)

aff412a

* sparse to adapt new connector
* Adapt the YAML configuration

[docs] renew docs for v1 (#448)

4d784a3

renew docs for v1 Co-authored-by: lijiachen19 <[email protected]>

set version to 0.1.0 (#450)

2bdba86

[Feature] GSA adapt nfsStore (#451)

aa759d6

* adapt nfsstore

fix codestyle

477a742

ygwpz closed this Dec 2, 2025

ygwpz deleted the dev-ucm-v1 branch December 4, 2025 03:08

20 participants
