[4/N][Refactor] Refactor AscendAttentionMetadataBuilder for better extensibility and make the builder class of torchair extend from it #2375
Conversation
Code Review

This pull request refactors the `build` method in `AscendAttentionMetadataBuilder` by extracting `_prepare_build_info` and `_assemble_build_info`. This is a good change that improves modularity and extensibility, as demonstrated by the new `AscendAttentionTorchairMetadataBuilder`. The implementation is solid, but I've found one area in the new `torchair_attention.py` file with some confusing code that could be clarified.
```python
pad_value = 0
num_token_pad_size = graph_pad_size - num_actual_tokens
num_reqs_pad_size = (
    graph_pad_size // self.runner.decode_token_per_req -
    num_reqs)
```
The variable `pad_value` is assigned the value `0` on line 247, but this value is never used because it is unconditionally reassigned to `1` on line 252 before its first use. This makes the assignment on line 247 dead code, which is confusing and should be removed to improve clarity:

```python
num_token_pad_size = graph_pad_size - num_actual_tokens
num_reqs_pad_size = (
    graph_pad_size // self.runner.decode_token_per_req -
    num_reqs)
```
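The dead-store pattern flagged above can be reduced to a minimal illustration. The function below is hypothetical scaffolding, not the actual vllm-ascend code; only the variable names and arithmetic come from the diff:

```python
# Hypothetical reduction of the flagged pattern: the initial assignment
# to pad_value is dead because it is unconditionally overwritten before
# any read, so it can simply be deleted.
def pad_sizes(graph_pad_size, num_actual_tokens, num_reqs, decode_token_per_req):
    pad_value = 0  # dead store: never read before the reassignment below
    pad_value = 1  # the first (and only) value that is actually used
    num_token_pad_size = graph_pad_size - num_actual_tokens
    num_reqs_pad_size = graph_pad_size // decode_token_per_req - num_reqs
    return pad_value, num_token_pad_size, num_reqs_pad_size
```

Static analyzers such as linters typically report this as an "unused variable assignment"; removing the `= 0` line leaves behavior unchanged.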
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Title changed from "Extract `_prepare_build_info()` and `_assemble_build_info()` from `build()` in `AscendAttentionMetadataBuilder`" to "Refactor `AscendAttentionMetadataBuilder` for better extensibility and make the builder class of torchair extend from it".
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main    #2375      +/-   ##
==========================================
+ Coverage   77.99%   78.01%   +0.02%
==========================================
  Files         134      134
  Lines       18498    18515      +17
==========================================
+ Hits        14427    14444      +17
  Misses       4071     4071
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Force-pushed from cafae88 to 5ae0c6f.
…make the builder class of torchair extend from it

Signed-off-by: shen-shanshan <[email protected]>
What this PR does / why we need it?

Refactor `AscendAttentionMetadataBuilder` for better extensibility and make the builder class of torchair extend from it.

Extract the `_assemble_build_info()` and `_assemble_attn_metadata()` methods from `build()` in `AscendAttentionMetadataBuilder` for better extensibility.

Workflow of the `build()` method:

- `_assemble_build_info()`: the custom logic that can be overwritten in `torchair_attention.py`.
- `_assemble_attn_metadata()`: the custom logic that can be overwritten in `torchair_attention.py`.

After this refactor, we can remove the `build()` method in `AscendAttentionTorchairMetadataBuilder`, and just need to overwrite these two methods: `_assemble_build_info()` and `_assemble_attn_metadata()`.

Note: Do not merge this PR before #2017.
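The workflow described above is a template-method refactor: the base class keeps `build()` and exposes two hooks that the torchair builder overrides. The sketch below uses the class and method names from this PR, but the signatures, bodies, and returned dictionaries are illustrative assumptions, not the actual vllm-ascend implementation:

```python
# Minimal sketch of the template-method structure described in the PR.
# Hook bodies and the dict-based "metadata" are hypothetical placeholders.

class AscendAttentionMetadataBuilder:
    def build(self, common_attn_metadata):
        # Shared orchestration stays in the base class; subclasses never
        # need to override build() itself, only the two hooks below.
        build_info = self._assemble_build_info(common_attn_metadata)
        return self._assemble_attn_metadata(build_info)

    def _assemble_build_info(self, common_attn_metadata):
        # Default (non-torchair) preparation logic.
        return {"common": common_attn_metadata}

    def _assemble_attn_metadata(self, build_info):
        # Default assembly of the final attention metadata.
        return {"kind": "ascend", **build_info}


class AscendAttentionTorchairMetadataBuilder(AscendAttentionMetadataBuilder):
    # No build() override needed after the refactor: only the hooks change.
    def _assemble_build_info(self, common_attn_metadata):
        info = super()._assemble_build_info(common_attn_metadata)
        info["graph_padded"] = True  # torchair-specific extras (illustrative)
        return info

    def _assemble_attn_metadata(self, build_info):
        return {"kind": "torchair", **build_info}
```

With this shape, `AscendAttentionTorchairMetadataBuilder().build(...)` runs the shared `build()` from the base class while picking up both torchair hooks, which is exactly why the subclass's own `build()` can be deleted.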
Does this PR introduce any user-facing change?
How was this patch tested?