Conversation
Hello @hiworldwzj, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR introduces support for 'redundancy experts' within the Mixture of Experts (MoE) implementation, specifically targeting models like DeepSeekV3. The changes involve adding configuration options to specify redundancy experts, modifying the expert selection and dispatch logic to include these extra experts, adjusting weight loading, and implementing a dedicated Triton kernel to handle the re-mapping of selected expert IDs.
Highlights
- Redundancy Expert Support: Adds core functionality to handle redundancy experts in the MoE layers, allowing for specific experts to be loaded and potentially selected across different ranks.
- Configuration via File: Introduces a new command-line argument `--ep_redundancy_expert_config_path` to load a JSON configuration file specifying the number of redundancy experts and their IDs per layer.
- Triton Kernel for ID Repair: Implements a new Triton kernel (`redundancy_topk_ids_repair`) to adjust the indices of the selected top-k experts, ensuring that redundancy experts are correctly identified and mapped to their allocated positions within the expert list on each rank.
- Updated MoE Logic: Modifies the `FusedMoeWeightEp` class to load weights for redundancy experts, adjust internal lists and counts to include them, and integrate the ID repair step into the expert selection and dispatch processes (a rough sketch of this flow follows below).
- Communication Buffer Adjustment: Updates the DeepEP communication buffer initialization to account for the total number of experts (routed + redundant) when calculating buffer sizes and QP counts.
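For orientation, here is a minimal, self-contained sketch of how the ID-repair step slots into expert selection. Only the rank-shift part of the repair is shown, and the function names here are illustrative assumptions, not the exact lightllm signatures:

```python
import torch


def moe_select_sketch(router_logits, top_k, ep_expert_num, redundancy_expert_num):
    # 1) Standard top-k expert selection from the router scores.
    topk_weights, topk_ids = torch.topk(torch.softmax(router_logits, dim=-1), top_k, dim=-1)
    # 2) With redundancy experts configured, repair the selected IDs so they
    #    index into per-rank expert tables that now include the extra slots
    #    (the real repair also maps configured redundant expert IDs to them).
    if redundancy_expert_num > 0:
        topk_ids = (topk_ids // ep_expert_num) * redundancy_expert_num + topk_ids
    # 3) Dispatch/compute then proceeds with the total expert count
    #    (routed + redundant); omitted here.
    return topk_weights, topk_ids
```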
Changelog
- `lightllm/common/basemodel/layer_weights/meta_weights/fused_moe_weight_ep.py`
  - Import utility functions and the new Triton kernel for redundancy experts.
  - Add instance variables to store redundancy expert configuration and total expert count.
  - Resize internal lists (`experts_up_projs`, `experts_gate_projs`, etc.) to accommodate redundancy experts.
  - Call `redundancy_topk_ids_repair` after expert selection in various methods if redundancy experts are configured.
  - Update the `num_experts` parameter passed to dispatch/implementation functions to include redundancy experts.
  - Modify weight and scale loading logic (`load_hf_weights`, `_load_weight_scale`) to load parameters for redundancy experts.
  - Adjust loop ranges in `_fuse` and `_fuse_weight_scale` to process redundancy experts.
- `lightllm/common/basemodel/triton_kernel/redundancy_topk_ids_repair.py`
  - Add a new file containing a Triton kernel `_redundancy_topk_ids_repair_kernel` and a Python wrapper `redundancy_topk_ids_repair`.
  - The kernel re-maps selected top-k expert IDs, specifically adjusting indices for configured redundancy experts.
- `lightllm/distributed/communication_op.py`
  - Import `get_redundancy_expert_num`.
  - Modify `new_deepep_group` to accept `hidden_size`.
  - Calculate the total number of experts (`self.ll_num_experts`) including redundancy experts.
  - Update the calculation for `num_qps_per_rank` using the new total expert count.
- `lightllm/models/deepseek2/model.py`
  - Pass `hidden_size` when initializing the DeepEP group in `_init_custom`.
- `lightllm/server/api_cli.py`
  - Add the `--ep_redundancy_expert_config_path` command-line argument.
- `lightllm/server/core/objs/start_args_type.py`
  - Add the `ep_redundancy_expert_config_path` field to the `StartArgs` dataclass.
- `lightllm/utils/envs_utils.py`
  - Add `get_redundancy_expert_ids` and `get_redundancy_expert_num` functions to read the redundancy expert configuration from a JSON file (see the helper sketch after this changelog).
- `test/test_redundancy_expert_config.json`
  - Add a sample JSON configuration file for redundancy experts (an illustrative example of the format follows the changelog).
- `unit_tests/common/basemodel/triton_kernel/test_redundancy_topk_ids_repair.py`
  - Add a unit test for the `redundancy_topk_ids_repair` Triton kernel (see the test sketch after the changelog).
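As referenced above, the sample configuration presumably follows the format described later in this review. An illustrative example (the actual test file reportedly uses `[77,]` with a trailing comma, which is flagged in the findings below):

```json
{
  "redundancy_expert_num": 1,
  "0": [0],
  "1": [0],
  "default": [77]
}
```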
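For the `envs_utils` helpers, a minimal sketch of what they could look like, assuming they read the file at `ep_redundancy_expert_config_path` and fall back to the `"default"` entry. The actual lightllm implementations (including how the path is obtained) may differ:

```python
import json
import os
from functools import lru_cache
from typing import List


@lru_cache(maxsize=None)
def _load_redundancy_config(config_path: str) -> dict:
    # No config path (or a missing file) means no redundancy experts.
    if not config_path or not os.path.exists(config_path):
        return {}
    with open(config_path, "r") as f:
        return json.load(f)


def get_redundancy_expert_num(config_path: str) -> int:
    # Number of redundancy experts per rank; defaults to 0.
    return _load_redundancy_config(config_path).get("redundancy_expert_num", 0)


def get_redundancy_expert_ids(config_path: str, layer_index: int) -> List[int]:
    # Layer-specific redundant expert IDs, falling back to the "default" list.
    config = _load_redundancy_config(config_path)
    return config.get(str(layer_index), config.get("default", []))
```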
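For the unit test, a natural shape is to compare the kernel output against a pure-PyTorch reference. The sketch below only covers the rank-shift portion of the remapping that is visible in the diff; the real kernel also substitutes configured redundant expert IDs, and the actual test's structure may differ:

```python
import torch


def reference_shift(topk_ids, ep_expert_num, redundancy_expert_num):
    # Shift every expert ID past the redundant slots allocated on
    # lower-indexed EP ranks (rank_index = topk_ids // ep_expert_num).
    return (topk_ids // ep_expert_num) * redundancy_expert_num + topk_ids


def test_shift_of_topk_ids():
    topk_ids = torch.tensor([[0, 7], [8, 15]], dtype=torch.int64)
    out = reference_shift(topk_ids, ep_expert_num=8, redundancy_expert_num=1)
    # Rank 0 experts (0-7) keep their IDs; rank 1 experts (8-15) shift by 1.
    assert out.tolist() == [[0, 7], [9, 16]]
```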
Experts route with speed,
Some redundant, meet the need,
IDs shift and mend.
Code Review
This pull request introduces a significant feature: support for redundancy experts. The changes span model weight loading, initialization, the forward pass logic (especially expert selection and dispatch), and communication operations. A new Triton kernel has been added to repair `topk_ids` for these redundant experts, and configuration for this feature is managed via a new JSON file and command-line argument.
The overall implementation appears logical and consistent. The updates to account for the total number of experts (including redundant ones) in various parts of the code seem correct. The unit tests for the new Triton kernel are a good addition for verifying the complex ID remapping logic.
There are a few areas for improvement, primarily concerning comments and code clarity, detailed below. Additionally, providing a description in the pull request body would be beneficial for reviewers to understand the motivation and high-level design of this feature.
Given the medium-severity comments, I recommend addressing them before merging.
Summary of Findings
- Code Comments in Chinese: Several comments in `lightllm/common/basemodel/layer_weights/meta_weights/fused_moe_weight_ep.py` and `lightllm/utils/envs_utils.py` are in Chinese. Translating them to English would improve code maintainability and accessibility for a broader range of contributors.
- Clarity of Triton Kernel Logic: The new Triton kernel `_redundancy_topk_ids_repair_kernel` in `lightllm/common/basemodel/triton_kernel/redundancy_topk_ids_repair.py` contains complex ID remapping logic. Adding a brief explanatory comment for the transformation on line 20 would enhance readability and future maintenance.
- Pull Request Description: The pull request description is empty. For a feature of this nature, providing a summary of the changes, the motivation, and any relevant design decisions would be very helpful for reviewers.
- Trailing Comma in JSON Test File (Not Commented Due to Settings): The file `test/test_redundancy_expert_config.json` (line 5) contains a trailing comma in a list (`"default": [77,]`). Note that Python's built-in `json.load` does not accept trailing commas and will raise a `JSONDecodeError` (demonstrated just below), so this is more than a style nit; the trailing comma should be removed for the file to parse as standard JSON.
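A quick check of the stdlib behavior (illustrative):

```python
import json

try:
    json.loads('{"default": [77,]}')
except json.JSONDecodeError as e:
    print("trailing comma rejected:", e)

print(json.loads('{"default": [77]}'))  # OK: {'default': [77]}
```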
Merge Readiness
This pull request introduces a valuable feature for redundant experts. The implementation is largely solid. However, there are a few medium-severity issues related to comments and code clarity that should be addressed to improve maintainability. Specifically, translating Chinese comments to English and adding a clarifying comment to the new Triton kernel would be beneficial.
I am unable to approve pull requests. I recommend addressing the highlighted issues before this PR is merged by other reviewers.
```python
offs_d = block_index * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
mask = offs_d < topk_total_num
current_topk_ids = tl.load(topk_ids_ptr + offs_d, mask=mask, other=0)
new_current_topk_ids = (current_topk_ids // ep_expert_num) * redundancy_expert_num + current_topk_ids
```
The ID remapping logic here is quite dense: `(current_topk_ids // ep_expert_num) * redundancy_expert_num + current_topk_ids`.
To aid future understanding and maintenance of this new kernel, could a brief comment be added to explain the purpose of this transformation? For instance, something like:
```python
# Remap original expert IDs to a new space that accounts for redundant expert slots.
```
This would help clarify how the expert ID space is being adjusted before checking against redundant expert IDs.
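For reviewers following along, the arithmetic can be made concrete with a small, runnable example (the values `ep_expert_num = 64` and `redundancy_expert_num = 1` are illustrative):

```python
ep_expert_num, redundancy_expert_num = 64, 1
for expert_id in (10, 70, 150):
    # Each EP rank owns ep_expert_num routed experts; lower-indexed ranks
    # each contribute redundancy_expert_num extra slots before this one.
    rank = expert_id // ep_expert_num
    shifted = rank * redundancy_expert_num + expert_id
    print(expert_id, "-> rank", rank, "-> shifted id", shifted)
# 10 -> rank 0 -> shifted id 10
# 70 -> rank 1 -> shifted id 71
# 150 -> rank 2 -> shifted id 152
```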
lightllm/utils/envs_utils.py
These comments explaining the functionality and configuration format are very helpful! Could they be translated to English to make them more accessible to a wider audience and improve overall code maintainability?
For example:

```python
# get_redundancy_expert_ids and get_redundancy_expert_num are primarily used to obtain
# the IDs and number of redundant experts during inference.
# They depend on a configuration file specified by ep_redundancy_expert_config_path,
# which is a JSON formatted text file. The content format is as follows:
# {
#     "redundancy_expert_num": 1,  # Number of redundant experts per rank
#     "0": [0],       # Key: layer_index (string), Value: list of original expert IDs
#                     # that are redundant for this layer
#     "1": [0],
#     "default": [0]  # Default list of redundant expert IDs if a layer-specific
#                     # entry is not found
# }
```