[Performance]: Online service test: Sending the same request twice consecutively does not generate BLOCK files, but the prefix cache hit rate will change #475

@soilderone

Proposal to improve performance

No response

Report of performance regression

I am testing the vLLM online service on a 910b3 with vllm-ascend 0.9.2 rc1 and ucm v0.1.0. In offline tests, BLOCK files are generated. In online tests, no BLOCK files are generated when I send the same request twice consecutively, yet the prefix cache hit rate still changes.

The following is my startup command:

ASCEND_RT_VISIBLE_DEVICES=6 vllm serve /app/model/Qwen2.5-14B-Instruct \
	--served-model-name vllm_nfs_offload \
	--max-model-len 4096 \
	--tensor-parallel-size 1 \
	--gpu-memory-utilization 0.8 \
	--trust-remote-code \
	--port 5006 \
	--kv-transfer-config '{
    "kv_connector": "UCMConnector",
    "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
      "ucm_connectors": [
        {
          "ucm_connector_name": "UcmNfsStore",
          "ucm_connector_config": {
            "storage_backends": "/app/storage",
            "use_direct": false
          }
        }
      ],
      "ucm_sparse_config": {
        "KVStarMultiStep": {
          "init_window_sz": 1,
          "local_window_sz": 2,
          "sparse_ratio": 0.25,
          "retrieval_stride": 8,
          "blk_repre_dim_prune_ratio": 0.25,
          "blk_repre_inner_token_merge": 2
        }
      }
    }
  }'
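
To rule out a shell-quoting problem with the multi-line JSON argument, the same configuration can be generated programmatically. This is only a minimal sketch: the values are copied from the command above, and the helper name `kv_transfer_arg` is hypothetical, not part of vLLM or UCM.

```python
import json
import shlex

# Values copied verbatim from the startup command; nothing here is new configuration.
kv_transfer_config = {
    "kv_connector": "UCMConnector",
    "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
        "ucm_connectors": [
            {
                "ucm_connector_name": "UcmNfsStore",
                "ucm_connector_config": {
                    "storage_backends": "/app/storage",
                    "use_direct": False,
                },
            }
        ],
        "ucm_sparse_config": {
            "KVStarMultiStep": {
                "init_window_sz": 1,
                "local_window_sz": 2,
                "sparse_ratio": 0.25,
                "retrieval_stride": 8,
                "blk_repre_dim_prune_ratio": 0.25,
                "blk_repre_inner_token_merge": 2,
            }
        },
    },
}


def kv_transfer_arg(config: dict) -> str:
    """Return a shell-safe `--kv-transfer-config '<json>'` argument (hypothetical helper)."""
    return "--kv-transfer-config " + shlex.quote(json.dumps(config))


# Paste the printed argument into the `vllm serve` command line.
print(kv_transfer_arg(kv_transfer_config))
```

If the server behaves the same with the generated argument, quoting is not the cause.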

The following is part of the log output of the first request:

Image

The following is part of the log output of the second request:

Image
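
For anyone trying to reproduce this, the two consecutive requests can be sent roughly as follows. This is a sketch under assumptions: the host `localhost`, the prompt, `max_tokens`, and the helper names are illustrative and not taken from my actual test; only the port and the served model name come from the startup command. It uses the OpenAI-compatible `/v1/completions` endpoint that `vllm serve` exposes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5006"  # port from the startup command; host is an assumption
MODEL = "vllm_nfs_offload"          # --served-model-name from the startup command


def build_completion_request(prompt: str, max_tokens: int = 16) -> urllib.request.Request:
    """Build one POST request for the OpenAI-compatible completions endpoint."""
    body = json.dumps(
        {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens, "temperature": 0}
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_twice(prompt: str) -> list:
    """Send the identical request twice in a row and return both completions."""
    texts = []
    for _ in range(2):
        with urllib.request.urlopen(build_completion_request(prompt), timeout=120) as resp:
            payload = json.load(resp)
        texts.append(payload["choices"][0]["text"])
    return texts
```

Calling `send_twice(...)` with a prompt long enough to span several KV blocks (but within `--max-model-len 4096`) should make the second request eligible for a prefix cache hit.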

The content under storage_backends:
Image
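
To check the storage directory in a way that does not depend on screenshots, it can be snapshotted before and after the requests. A minimal sketch: `snapshot` and `new_files` are hypothetical helper names, and since I do not assume anything about how UCM names its BLOCK files, the code simply reports every regular file that appears.

```python
from pathlib import Path


def snapshot(root) -> dict:
    """Map each regular file under root (relative POSIX path) to its size in bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def new_files(before: dict, after: dict) -> list:
    """Return the sorted paths that exist in `after` but not in `before`."""
    return sorted(set(after) - set(before))
```

Usage would be `before = snapshot("/app/storage")`, then send the requests, then `new_files(before, snapshot("/app/storage"))`. An empty list after the online test, compared with a non-empty list after the offline test, confirms the behavior described above.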

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

Image
