
Commit e19f659

22dimensions and claude committed:
upgrade vllm to 0323 commit id: 35141a7eeda941a60ad5a4956670c60fd5a77029
fix: add missing num_prompt_tokens_cpu_tensor to NPUInputBatch

Adapt to upstream vLLM changes in InputBatch. vLLM v1 refactored InputBatch to use torch tensors for its CPU data structures, with numpy views over them, matching the pattern used for the other batch statistics.

- Added num_tokens_no_spec_cpu_tensor and num_tokens_no_spec
- Added num_prompt_tokens_cpu_tensor and updated num_prompt_tokens to be a numpy view
- Fixes AttributeError: 'NPUInputBatch' object has no attribute 'num_prompt_tokens_cpu_tensor'

Affects: all pooling model tests that access input batch metadata.

Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: 22dimensions <waitingwind@foxmail.com>
1 parent 114ec75 commit e19f659
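The pattern the commit message describes, a CPU torch tensor with a zero-copy numpy view over the same storage, can be sketched as follows. This is a minimal illustration, not code from the diff; `max_num_reqs` is an arbitrary value and `pin_memory` is hardcoded to False here for portability (the real class passes the runner's pin-memory flag).

```python
import numpy as np
import torch

max_num_reqs = 8  # illustrative; the real value comes from scheduler config

# Batch statistic backed by a CPU torch tensor, as in the refactored InputBatch.
num_prompt_tokens_cpu_tensor = torch.zeros(
    (max_num_reqs,),
    device="cpu",
    dtype=torch.int32,
    pin_memory=False,  # the real code forwards the runner's pin_memory flag
)
# torch.Tensor.numpy() shares memory with the tensor, so cheap per-request
# updates through the numpy array are immediately visible on the tensor side.
num_prompt_tokens = num_prompt_tokens_cpu_tensor.numpy()

num_prompt_tokens[3] = 17
print(int(num_prompt_tokens_cpu_tensor[3]))  # 17
```

This is why code that only created the numpy array (without the backing tensor) raised the AttributeError: callers expecting the upstream layout look up the `*_cpu_tensor` attribute directly.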

File tree: 7 files changed (+22, -10 lines)


.github/workflows/bot_pr_create.yaml
Lines changed: 1 addition & 1 deletion

@@ -37,7 +37,7 @@ jobs:
     steps:
       - name: Get vLLM version
         run: |
-          VLLM_COMMIT=ed359c497a728f08b5b41456c07a688ccd510fbc
+          VLLM_COMMIT=35141a7eeda941a60ad5a4956670c60fd5a77029
           echo "VLLM_COMMIT=https://github.com/vllm-project/vllm/commit/$VLLM_COMMIT" >> "$GITHUB_ENV"

       - name: Checkout repository

.github/workflows/dockerfiles/Dockerfile.lint
Lines changed: 1 addition & 1 deletion

@@ -27,7 +27,7 @@ RUN apt-get update -y && \

 ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
 # For lint purpose, actually we need make a main2main matching.
-ARG VLLM_COMMIT=ed359c497a728f08b5b41456c07a688ccd510fbc
+ARG VLLM_COMMIT=35141a7eeda941a60ad5a4956670c60fd5a77029
 RUN git clone $VLLM_REPO /vllm-workspace/vllm && \
     cd /vllm-workspace/vllm && \
     git checkout $VLLM_COMMIT

.github/workflows/pr_test_full.yaml
Lines changed: 1 addition & 1 deletion

@@ -75,7 +75,7 @@ jobs:
     name: e2e-full
     strategy:
       matrix:
-        vllm_version: [ed359c497a728f08b5b41456c07a688ccd510fbc, v0.18.0]
+        vllm_version: [35141a7eeda941a60ad5a4956670c60fd5a77029, v0.18.0]
     needs: [changes]
     if: ${{ needs.changes.outputs.e2e_tracker == 'true' || needs.changes.outputs.e2e_tracker == true }}
     uses: ./.github/workflows/_e2e_test.yaml

.github/workflows/pr_test_light.yaml
Lines changed: 3 additions & 3 deletions

@@ -41,7 +41,7 @@ jobs:
   lint:
     uses: ./.github/workflows/_pre_commit.yml
     with:
-      vllm: ed359c497a728f08b5b41456c07a688ccd510fbc
+      vllm: 35141a7eeda941a60ad5a4956670c60fd5a77029
   changes:
     runs-on: linux-aarch64-a2b3-0
     outputs:
@@ -90,7 +90,7 @@ jobs:
     if: ${{ needs.lint.result == 'success' && (needs.changes.outputs.e2e_tracker == 'true' || needs.changes.outputs.ut_tracker == 'true') }}
     strategy:
       matrix:
-        vllm_version: [ed359c497a728f08b5b41456c07a688ccd510fbc, v0.18.0]
+        vllm_version: [35141a7eeda941a60ad5a4956670c60fd5a77029, v0.18.0]
     uses: ./.github/workflows/_unit_test.yaml
     with:
       vllm: ${{ matrix.vllm_version }}
@@ -102,7 +102,7 @@ jobs:
     name: e2e-light
     strategy:
       matrix:
-        vllm_version: [ed359c497a728f08b5b41456c07a688ccd510fbc, v0.18.0]
+        vllm_version: [35141a7eeda941a60ad5a4956670c60fd5a77029, v0.18.0]
     # Note (yikun): If CI resource are limited we can split job into two chain jobs
     needs: [lint, changes]
     # only trigger e2e test after lint passed and the change is e2e related with pull request.

.github/workflows/schedule_codecov_refresh.yaml
Lines changed: 1 addition & 1 deletion

@@ -33,7 +33,7 @@ jobs:
     name: refresh codecov
     strategy:
      matrix:
-        vllm_version: [ed359c497a728f08b5b41456c07a688ccd510fbc]
+        vllm_version: [35141a7eeda941a60ad5a4956670c60fd5a77029]
     uses: ./.github/workflows/_unit_test.yaml
     with:
       vllm: ${{ matrix.vllm_version }}

docs/source/community/versioning_policy.md
Lines changed: 1 addition & 1 deletion

@@ -59,7 +59,7 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL

 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |-------------|--------------|------------------|-------------|--------------------|
-| main | ed359c497a728f08b5b41456c07a688ccd510fbc, v0.18.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 |
+| main | 35141a7eeda941a60ad5a4956670c60fd5a77029, v0.18.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 |

 ## Release cadence

vllm_ascend/worker/npu_input_batch.py
Lines changed: 14 additions & 2 deletions

@@ -80,8 +80,20 @@ def __init__(
         # Maps req_index -> tensor of shape (num_prompt_tokens, hidden_size)
         self.req_prompt_embeds: dict[int, torch.Tensor] = {}
         self.num_tokens = np.zeros(max_num_reqs, dtype=np.int32)
-        self.num_tokens_no_spec = np.zeros(max_num_reqs, dtype=np.int32)
-        self.num_prompt_tokens = np.zeros(max_num_reqs, dtype=np.int32)
+        self.num_tokens_no_spec_cpu_tensor = torch.zeros(
+            (max_num_reqs,),
+            device="cpu",
+            dtype=torch.int32,
+            pin_memory=pin_memory,
+        )
+        self.num_tokens_no_spec = self.num_tokens_no_spec_cpu_tensor.numpy()
+        self.num_prompt_tokens_cpu_tensor = torch.zeros(
+            (max_num_reqs,),
+            device="cpu",
+            dtype=torch.int32,
+            pin_memory=pin_memory,
+        )
+        self.num_prompt_tokens = self.num_prompt_tokens_cpu_tensor.numpy()
         self.num_computed_tokens_cpu_tensor = torch.zeros(
             (max_num_reqs,),
             device="cpu",

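One practical payoff of backing these statistics with torch tensors rather than bare numpy arrays is that a (possibly pinned) CPU tensor can be shipped to the accelerator in a single call. A minimal sketch of that idea, using a hypothetical `copy_stats_to_device` helper that is not part of this diff, and `"cpu"` standing in for the real `"npu"` target:

```python
import torch

def copy_stats_to_device(cpu_tensor: torch.Tensor,
                         device: torch.device) -> torch.Tensor:
    # non_blocking=True can overlap the copy with compute when the source
    # tensor is pinned; with an unpinned source it simply behaves as a
    # synchronous copy, so the call is safe either way.
    return cpu_tensor.to(device, non_blocking=True)

stats = torch.zeros((4,), dtype=torch.int32, device="cpu")
stats[0] = 5
# A real model runner would pass torch.device("npu") here.
out = copy_stats_to_device(stats, torch.device("cpu"))
print(out.tolist())  # [5, 0, 0, 0]
```

The numpy views added in the diff keep per-request bookkeeping cheap on the host, while the `*_cpu_tensor` attributes keep the data in a form the device-transfer path expects.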