[Performance]: Online service test: sending the same request twice in a row generates no BLOCK files, but the prefix cache hit rate changes

### Proposal to improve performance

_No response_

### Report of performance regression

I am testing the vLLM online service on an Ascend 910B3 with vllm-ascend 0.9.2rc1 and UCM v0.1.0. In offline tests, BLOCK files are generated. In online tests, however, no BLOCK files are generated, even though the prefix cache hit rate changes between the first and second request.

**_The following is my startup command:_**
```bash
ASCEND_RT_VISIBLE_DEVICES=6 vllm serve /app/model/Qwen2.5-14B-Instruct \
	--served-model-name vllm_nfs_offload \
	--max-model-len 4096 \
	--tensor-parallel-size 1 \
	--gpu-memory-utilization 0.8 \
	--trust-remote-code \
	--port 5006 \
	--kv-transfer-config '{
    "kv_connector": "UCMConnector",
    "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
      "ucm_connectors": [
        {
          "ucm_connector_name": "UcmNfsStore",
          "ucm_connector_config": {
            "storage_backends": "/app/storage",
            "use_direct": false
          }
        }
      ],
      "ucm_sparse_config": {
        "KVStarMultiStep": {
          "init_window_sz": 1,
          "local_window_sz": 2,
          "sparse_ratio": 0.25,
          "retrieval_stride": 8,
          "blk_repre_dim_prune_ratio": 0.25,
          "blk_repre_inner_token_merge": 2
        }
      }
    }
  }'
```
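To make the online case easier to reproduce, here is a minimal sketch of the test: it sends the same completion request twice in a row to the server started above and counts the files under `/app/storage` before and after. It assumes the server listens on `localhost:5006` and exposes vLLM's OpenAI-compatible `/v1/completions` endpoint with the served model name `vllm_nfs_offload`. The prompt text, its repeat count, and the `count_files` helper are illustrative assumptions, not part of vLLM or UCM.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction script (not part of vLLM or UCM).
# Assumes the server from the startup command is listening on localhost:5006
# and exposes vLLM's OpenAI-compatible /v1/completions API.
BASE_URL="http://localhost:5006"
MODEL="vllm_nfs_offload"      # --served-model-name
STORAGE_DIR="/app/storage"    # storage_backends in kv-transfer-config

# Print the number of regular files under $1, or 0 if the directory is missing.
count_files() {
  if [ -d "$1" ]; then
    find "$1" -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# Build a long, fixed prompt so that it spans several KV-cache blocks
# (the repeat count of 200 is an arbitrary assumption; keep it under max-model-len).
PROMPT=$(printf 'Summarize the history of distributed storage. %.0s' $(seq 1 200))
PAYLOAD=$(printf '{"model": "%s", "prompt": "%s", "max_tokens": 16, "temperature": 0}' \
  "$MODEL" "$PROMPT")

before=$(count_files "$STORAGE_DIR")
for i in 1 2; do
  # Send the identical request twice in a row and print the raw JSON reply.
  if ! curl -sS "$BASE_URL/v1/completions" \
      -H "Content-Type: application/json" \
      -d "$PAYLOAD"; then
    echo "request $i failed: server not reachable at $BASE_URL" >&2
  fi
  echo
done
after=$(count_files "$STORAGE_DIR")
echo "files under $STORAGE_DIR: before=$before after=$after"
```

Running it against the server and comparing `before`/`after` with the prefix cache hit rate printed in the server log for each request gives a compact, repeatable version of the observation in this report.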

**_The following is part of the log output of the first request:_**

<img width="822" height="56" alt="Image" src="https://github.com/user-attachments/assets/29058099-7af2-4a0e-907b-4974ec1d39e0" />

**_The following is part of the log output of the second request:_**

<img width="824" height="59" alt="Image" src="https://github.com/user-attachments/assets/7761cd41-a3a1-4fb8-914c-d08819474b98" />

**_Contents of the `storage_backends` directory (`/app/storage`):_**
<img width="254" height="37" alt="Image" src="https://github.com/user-attachments/assets/5f9fd1a1-7a73-4516-9f77-8804e25d0c80" />

### Misc discussion on performance

_No response_

### Your current environment (if you think it is necessary)

<img width="461" height="473" alt="Image" src="https://github.com/user-attachments/assets/6e86b935-81ce-4dc7-8179-29a4b136c993" />
