Commit d4d9ee2

refact model runner
Signed-off-by: weiguihua2 <[email protected]>
1 parent 3cca936 commit d4d9ee2

1 file changed: +0 -26 lines changed

vllm_ascend/attention/utils.py

Lines changed: 0 additions & 26 deletions
@@ -53,32 +53,6 @@ class AscendCommonAttentionMetadata:
     graph_pad_size: int = -1
 
 
-@dataclass
-class TorchairCommonAttentionMetadata:
-    """
-    Per-batch attention metadata, shared across layers and backends.
-    AttentionMetadataBuilder instances use it to construct per-layer metadata.
-
-    For many of the tensors we keep both GPU and CPU versions.
-    """
-
-    num_reqs: int
-    """Number of requests"""
-
-    num_actual_tokens: int
-    """Total number of tokens in batch"""
-
-    decode_token_per_req: int
-
-    actual_seq_lengths_q: list[int]
-
-    attn_mask: torch.Tensor = None
-
-    spec_attn_mask: torch.Tensor = None
-
-    graph_pad_size: int = -1
-
-
 def split_decodes_and_prefills(
     common_attn_metadata: AscendCommonAttentionMetadata,
     decode_threshold: int = 1,
