
Conversation

@mayuyuace
Collaborator

Add ops moe_align_block_size & batched_moe_align_block_size.
From: https://github.com/vllm-project/vllm/blob/main/csrc/moe/moe_align_sum_kernels.cu

Copilot AI review requested due to automatic review settings October 27, 2025 08:31

Copilot AI left a comment

Pull Request Overview

This PR adds two new MOE (Mixture of Experts) operations: moe_align_block_size and batched_moe_align_block_size, which align token distribution across experts to be compatible with block sizes for matrix multiplication. The implementation is adapted from vLLM's MOE alignment kernels.

Key changes:

  • Implements SYCL/XPU kernels for MOE token alignment with block size constraints
  • Adds comprehensive test coverage for both regular and batched alignment scenarios
  • Includes a utility function for rounding up to block-size multiples (sketched below)
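
As a rough illustration of the round_up helper and the padding arithmetic behind the alignment (a minimal C++ sketch of the same idea; the names and signatures below are illustrative, not the PR's actual code, whose helper lives in tests/utils.py as Python):

#include <cstdint>
#include <vector>

// Round x up to the next multiple of block_size (block_size > 0).
static int64_t round_up(int64_t x, int64_t block_size) {
  return ((x + block_size - 1) / block_size) * block_size;
}

// The aligned token count is the sum of each expert's count rounded up to a
// block boundary; experts with no tokens contribute nothing.
static int64_t padded_token_count(const std::vector<int64_t>& tokens_per_expert,
                                  int64_t block_size) {
  int64_t total = 0;
  for (int64_t count : tokens_per_expert) {
    total += round_up(count, block_size);
  }
  return total;
}

For example, with block_size = 4 and per-expert counts {5, 2, 0}, the padded total is 8 + 4 + 0 = 12, i.e. three blocks.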

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:

  • tests/utils.py: Adds a round_up utility function for block-size calculations
  • tests/test_moe_align_block_size.py: Comprehensive test suite covering determinism, expert mapping, and edge cases
  • tests/register_ops.py: Registers the two new MOE alignment operations with PyTorch
  • tests/ops/moe_align_block_size_ops.py: Python wrappers for the MOE alignment operations, with detailed documentation
  • csrc/moe/torch_bindings.cpp: Binds the C++ implementations to PyTorch operators
  • csrc/moe/moe_ops.h: Declares the function signatures for the MOE alignment operations
  • csrc/moe/moe_align_sum_kernels.cpp: Implements the SYCL kernels for MOE token alignment
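
For context, the expected outputs can be shown with a small hand-worked example. This follows the documented behavior of the upstream vLLM kernel the PR is ported from; the port should match, but treat the details (especially the padding sentinel) as assumptions to verify against csrc/moe/moe_align_sum_kernels.cpp:

// Illustrative example (top-1 routing for simplicity):
//   topk_ids   = [[0], [2], [0]]   -> 3 tokens, num_experts = 4
//   block_size = 4
// Per-expert counts: expert 0 has 2 tokens, expert 2 has 1 token.
// Each non-empty expert is padded up to a multiple of block_size, so the
// sorted list holds 4 + 4 = 8 slots; in upstream vLLM the unused slots are
// filled with the sentinel value topk_ids.numel() (shown as PAD).
//   sorted_token_ids    = [0, 2, PAD, PAD, 1, PAD, PAD, PAD]
//   expert_ids          = [0, 2]   -> one expert id per block (valid prefix)
//   num_tokens_post_pad = [8]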


):
"""
Verify that actual_sorted_ids follows the correct expert-level sorting.
The kerne limplementation may or may not preserve original token order

Copilot AI Oct 27, 2025

Corrected spelling of 'kerne limplementation' to 'kernel implementation'.

Suggested change
The kerne limplementation may or may not preserve original token order
The kernel implementation may or may not preserve original token order

Collaborator

@jikunshang jikunshang left a comment

LGTM!

int32_t* temp_storage = static_cast<int32_t*>(
slm.template get_multi_ptr<sycl::access::decorated::no>().get());

int32_t* shared_counts = temp_storage + 1024;
Collaborator

Why 1024 here? Please avoid using magic numbers.

Collaborator Author

temp_storage needs space for 1024 int32 values.
CUDA does not require an explicit declaration for this, but SYCL requires this portion of SLM (shared local memory) to be declared explicitly.
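
One way to address the magic number while keeping the explicit SLM layout (a sketch only; the constant name and its placement are assumptions based on the explanation above, not code from this PR):

// Name the scan-buffer size instead of hard-coding 1024 at the use site. The
// value is assumed to match the 1024-int32 scratch region described above;
// verify against the kernel's actual SLM allocation before adopting.
constexpr int32_t kScanStorageInts = 1024;

int32_t* temp_storage = static_cast<int32_t*>(
    slm.template get_multi_ptr<sycl::access::decorated::no>().get());
int32_t* shared_counts = temp_storage + kScanStorageInts;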
