Commit 1a7cbdb
A5 support reshape and cache in CP situation (#7636)
### What this PR does / why we need it?
Context parallel (CP) is supported in the A5 scenario. On A5, the reshape
and cache operators need to go through the aclnn operator, so routing
through DeviceAdaptor is added.
In addition, the inputs of the A5 aclnn operator must be contiguous, but
some operations produce non-contiguous tensors, such as slicing with a
step. For example, after `slot_mapping = attn_metadata.slot_mapping[:
num_decode_tokens * self.pcp_size : self.pcp_size]`,
`slot_mapping` is non-contiguous and needs to be made contiguous. Therefore,
the contiguity of key, value, and slot_mapping is fixed.
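
The contiguity issue described above can be reproduced in a minimal sketch. It assumes plain PyTorch on CPU; `full_slot_mapping`, `pcp_size`, and `num_decode_tokens` are hypothetical stand-ins for `attn_metadata.slot_mapping`, `self.pcp_size`, and the real decode token count, and the sketch does not call the A5 aclnn operator itself.

```python
import torch

# Hypothetical stand-ins for the values used in the PR description.
pcp_size = 2
num_decode_tokens = 4
full_slot_mapping = torch.arange(16, dtype=torch.int32)

# Slicing with a step returns a strided view of the original storage.
slot_mapping = full_slot_mapping[: num_decode_tokens * pcp_size : pcp_size]
view_is_contiguous = slot_mapping.is_contiguous()  # stride is (2,), so not contiguous

# .contiguous() copies the selected elements into densely packed memory.
slot_mapping_packed = slot_mapping.contiguous()
packed_is_contiguous = slot_mapping_packed.is_contiguous()

print(slot_mapping.stride(), view_is_contiguous)
print(slot_mapping_packed.stride(), packed_is_contiguous)
```

The values are unchanged by `.contiguous()`; only the memory layout differs, which is what an operator requiring contiguous inputs depends on.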
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.18.0
- vLLM main: vllm-project/vllm@ed359c4
---------
Signed-off-by: lenghuixing0330 <2531948770@qq.com>
1 parent dbf1348 commit 1a7cbdb
File tree
2 files changed, +10 -5 lines changed
- vllm_ascend
  - attention/context_parallel
  - device
Diff contents were not captured; only the hunk positions survive:
- First file: one line added at new line 52; original lines 755, 760, 787, and 792 each replaced by one line (new lines 756, 761, 788, and 793).
- Second file: original line 207 replaced by five lines (new lines 207-211).