
[feature] add redundancy expert #910

Merged

hiworldwzj merged 8 commits into main from redundancy_expert on May 27, 2025

Conversation

@hiworldwzj (Collaborator)

No description provided.

@gemini-code-assist (Contributor, bot) left a comment

Hello @hiworldwzj, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR introduces support for 'redundancy experts' within the Mixture of Experts (MoE) implementation, specifically targeting models like DeepSeekV3. The changes involve adding configuration options to specify redundancy experts, modifying the expert selection and dispatch logic to include these extra experts, adjusting weight loading, and implementing a dedicated Triton kernel to handle the re-mapping of selected expert IDs.

Highlights

  • Redundancy Expert Support: Adds core functionality to handle redundancy experts in the MoE layers, allowing for specific experts to be loaded and potentially selected across different ranks.
  • Configuration via File: Introduces a new command-line argument --ep_redundancy_expert_config_path to load a JSON configuration file specifying the number of redundancy experts and their IDs per layer (a config-reading sketch follows this list).
  • Triton Kernel for ID Repair: Implements a new Triton kernel (redundancy_topk_ids_repair) to adjust the indices of the selected top-k experts, ensuring that redundancy experts are correctly identified and mapped to their allocated positions within the expert list on each rank.
  • Updated MoE Logic: Modifies the FusedMoeWeightEp class to load weights for redundancy experts, adjust internal lists and counts to include them, and integrate the ID repair step into the expert selection and dispatch processes.
  • Communication Buffer Adjustment: Updates the DeepEP communication buffer initialization to account for the total number of experts (routed + redundant) when calculating buffer sizes and QP counts.
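
To make the configuration format concrete, here is a minimal sketch of reading such a file, based on the JSON layout quoted later in the review; the values and variable names are illustrative, not taken from the repository:

import json

# Illustrative config in the format described in the review below:
# "redundancy_expert_num" is the number of redundant experts per rank,
# numeric string keys map layer_index -> redundant expert IDs, and
# "default" is the fallback for layers without an explicit entry.
config_text = """
{
  "redundancy_expert_num": 1,
  "0": [0],
  "1": [0],
  "default": [0]
}
"""

config = json.loads(config_text)
redundancy_expert_num = config["redundancy_expert_num"]
layer_index = 3
redundancy_ids = config.get(str(layer_index), config["default"])
print(redundancy_expert_num, redundancy_ids)  # -> 1 [0]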

Changelog

  • lightllm/common/basemodel/layer_weights/meta_weights/fused_moe_weight_ep.py
    • Import utility functions and the new Triton kernel for redundancy experts.
    • Add instance variables to store redundancy expert configuration and total expert count.
    • Resize internal lists (experts_up_projs, experts_gate_projs, etc.) to accommodate redundancy experts.
    • Call redundancy_topk_ids_repair after expert selection in various methods if redundancy experts are configured.
    • Update the num_experts parameter passed to dispatch/implementation functions to include redundancy experts.
    • Modify weight and scale loading logic (load_hf_weights, _load_weight_scale) to load parameters for redundancy experts.
    • Adjust loop ranges in _fuse and _fuse_weight_scale to process redundancy experts.
  • lightllm/common/basemodel/triton_kernel/redundancy_topk_ids_repair.py
    • Add a new file containing a Triton kernel _redundancy_topk_ids_repair_kernel and a Python wrapper redundancy_topk_ids_repair.
    • The kernel re-maps selected top-k expert IDs, specifically adjusting indices for configured redundancy experts.
  • lightllm/distributed/communication_op.py
    • Import get_redundancy_expert_num.
    • Modify new_deepep_group to accept hidden_size.
    • Calculate the total number of experts (self.ll_num_experts) including redundancy experts.
    • Update the calculation for num_qps_per_rank using the new total expert count.
  • lightllm/models/deepseek2/model.py
    • Pass hidden_size when initializing the DeepEP group in _init_custom.
  • lightllm/server/api_cli.py
    • Add --ep_redundancy_expert_config_path command-line argument.
  • lightllm/server/core/objs/start_args_type.py
    • Add ep_redundancy_expert_config_path field to the StartArgs dataclass.
  • lightllm/utils/envs_utils.py
    • Add get_redundancy_expert_ids and get_redundancy_expert_num functions to read the redundancy expert configuration from a JSON file (a rough sketch follows this changelog).
  • test/test_redundancy_expert_config.json
    • Add a sample JSON configuration file for redundancy experts.
  • unit_tests/common/basemodel/triton_kernel/test_redundancy_topk_ids_repair.py
    • Add a unit test for the redundancy_topk_ids_repair Triton kernel.
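
For orientation, here is a rough sketch of what the two helpers in lightllm/utils/envs_utils.py could look like; the function names come from the changelog above, but the signatures and the fallback behavior are assumptions, not the actual implementation:

import json
import os

def _load_redundancy_config(path):
    # Hypothetical helper: read the JSON file named by
    # --ep_redundancy_expert_config_path.
    with open(path, "r") as f:
        return json.load(f)

def get_redundancy_expert_num(path):
    # Assumed behavior: zero redundant experts when no config is given.
    if not path or not os.path.exists(path):
        return 0
    return int(_load_redundancy_config(path).get("redundancy_expert_num", 0))

def get_redundancy_expert_ids(layer_index, path):
    # Assumed behavior: fall back to the "default" entry when the layer
    # has no explicit list of redundant expert IDs.
    if not path or not os.path.exists(path):
        return []
    cfg = _load_redundancy_config(path)
    return cfg.get(str(layer_index), cfg.get("default", []))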


Experts route with speed,
Some redundant, meet the need,
IDs shift and mend.


@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request introduces a significant feature: support for redundancy experts. The changes span model weight loading, initialization, the forward-pass logic (especially expert selection and dispatch), and communication operations. A new Triton kernel has been added to repair topk_ids for these redundant experts, and configuration for the feature is managed via a new JSON file and command-line argument.

The overall implementation appears logical and consistent. The updates to account for the total number of experts (including redundant ones) in various parts of the code seem correct. The unit tests for the new Triton kernel are a good addition for verifying the complex ID remapping logic.

There are a few areas for improvement, primarily concerning comments and code clarity, detailed below. Additionally, providing a description in the pull request body would be beneficial for reviewers to understand the motivation and high-level design of this feature.

Given the medium-severity comments, I recommend addressing them before merging.

Summary of Findings

  • Code Comments in Chinese: Several comments in lightllm/common/basemodel/layer_weights/meta_weights/fused_moe_weight_ep.py and lightllm/utils/envs_utils.py are in Chinese. Translating them to English would improve code maintainability and accessibility for a broader range of contributors.
  • Clarity of Triton Kernel Logic: The new Triton kernel _redundancy_topk_ids_repair_kernel in lightllm/common/basemodel/triton_kernel/redundancy_topk_ids_repair.py contains complex ID remapping logic. Adding a brief explanatory comment for the transformation on line 20 would enhance readability and future maintenance.
  • Pull Request Description: The pull request description is empty. For a feature of this nature, providing a summary of the changes, the motivation, and any relevant design decisions would be very helpful for reviewers.
  • Trailing Comma in JSON Test File (Not Commented Due to Settings): The file test/test_redundancy_expert_config.json (line 5) contains a trailing comma in a list ("default": [77,]). Trailing commas are not standard JSON; Python's built-in json.load in fact rejects them, as do many other parsers, so the comma should be removed for compatibility.
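
To see this finding concretely, a quick check with Python's standard library (illustrative only):

import json

# The trailing comma in "default": [77,] is rejected by the strict parser.
try:
    json.loads('{"default": [77,]}')
except json.JSONDecodeError as err:
    print("rejected:", err)

# The same entry without the trailing comma parses cleanly.
print(json.loads('{"default": [77]}'))  # -> {'default': [77]}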

Merge Readiness

This pull request introduces a valuable feature for redundant experts. The implementation is largely solid. However, there are a few medium-severity issues related to comments and code clarity that should be addressed to improve maintainability. Specifically, translating Chinese comments to English and adding a clarifying comment to the new Triton kernel would be beneficial.

I am unable to approve pull requests. I recommend addressing the highlighted issues before this PR is merged by other reviewers.

@gemini-code-assist (Contributor, bot) left a review comment

Severity: medium

Could this comment be translated to English? It would improve code maintainability and accessibility for a broader range of contributors.

For example: # Load weight parameters for redundant experts

@gemini-code-assist (Contributor, bot) left a review comment

Severity: medium

Similar to the previous comment, could this comment also be translated to English for better maintainability?

For example: # Load scale parameters for redundant experts

From lightllm/common/basemodel/triton_kernel/redundancy_topk_ids_repair.py:

offs_d = block_index * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
mask = offs_d < topk_total_num
current_topk_ids = tl.load(topk_ids_ptr + offs_d, mask=mask, other=0)
new_current_topk_ids = (current_topk_ids // ep_expert_num) * redundancy_expert_num + current_topk_ids
@gemini-code-assist (Contributor, bot) left a review comment

Severity: medium

The ID remapping logic here is quite dense: (current_topk_ids // ep_expert_num) * redundancy_expert_num + current_topk_ids.

To aid future understanding and maintenance of this new kernel, could a brief comment be added to explain the purpose of this transformation? For instance, something like:

# Remap original expert IDs to a new space that accounts for redundant expert slots.

This would help clarify how the expert ID space is being adjusted before checking against redundant expert IDs.
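
Restated outside Triton: with ep_expert_num routed experts per rank and redundancy_expert_num reserved redundant slots per rank, an ID rank * ep_expert_num + local becomes rank * (ep_expert_num + redundancy_expert_num) + local, i.e. the same local expert inside a per-rank table widened by the redundant slots. A plain PyTorch sketch of just this line (illustrative; the kernel additionally maps the configured redundant expert IDs into the reserved slots, which is not shown here):

import torch

def remap_topk_ids(topk_ids, ep_expert_num, redundancy_expert_num):
    # (topk_ids // ep_expert_num) recovers the owning rank; adding
    # rank * redundancy_expert_num shifts each ID into a per-rank table
    # that reserves redundancy_expert_num extra slots per rank.
    return (topk_ids // ep_expert_num) * redundancy_expert_num + topk_ids

ids = torch.tensor([0, 7, 8, 15])  # global IDs with 8 routed experts per rank
print(remap_topk_ids(ids, 8, 1))   # tensor([ 0,  7,  9, 16])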

Comment on lines 80 to 87
@gemini-code-assist (Contributor, bot) left a review comment

Severity: medium

These comments explaining the functionality and configuration format are very helpful! Could they be translated to English to make them more accessible to a wider audience and improve overall code maintainability?

For example:

# get_redundancy_expert_ids and get_redundancy_expert_num are primarily used to obtain the IDs and number of redundant experts during inference.
# They depend on a configuration file specified by ep_redundancy_expert_config_path, which is a JSON formatted text file.
# The content format is as follows:
# {
#   "redundancy_expert_num": 1,  # Number of redundant experts per rank
#   "0": [0],                    # Key: layer_index (string), Value: list of original expert IDs that are redundant for this layer
#   "1": [0],
#   "default": [0]               # Default list of redundant expert IDs if layer-specific entry is not found
# }

@hiworldwzj merged commit ec3cb03 into main on May 27, 2025 (1 check failed).
@hiworldwzj deleted the redundancy_expert branch on May 27, 2025 at 09:00.