Refactor: Merge build_moe_ffn_from_probs function into build_moe_ffn #14920

Description

@wdl339

Background

As part of PR #14898, the function build_moe_ffn_from_probs was introduced to handle SmallThinker's architecture, in which the MoE router is positioned before the attention block. This resulted in unfortunate code duplication with the existing build_moe_ffn function.

The Task

As suggested in this code review comment, the proposed solution is to merge the logic of build_moe_ffn_from_probs into the main build_moe_ffn function. This can be achieved by:

  1. Modifying build_moe_ffn to accept an optional ggml_tensor *probs parameter, defaulting to nullptr.
  2. Using this parameter as a toggle (see the sketch after this list):
    • If probs is provided, the function uses it directly and skips the internal logits/probs calculation.
    • If probs is nullptr, the function behaves exactly as it does today.
  3. Carefully handling the divergent logic paths inside the unified function, especially the weight normalization and activation functions.
  4. Removing the now-redundant build_moe_ffn_from_probs function once the merge is complete and verified.
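A minimal sketch of the proposed toggle, under a deliberately simplified signature (the real build_moe_ffn takes many more parameters, such as the expert tensors, gating type, and normalization flags; the exact names and defaults here are illustrative, not the actual llama.cpp API):

```cpp
#include "ggml.h"

// Sketch only: simplified signature; the remaining expert/gating
// parameters of the real build_moe_ffn are elided.
static ggml_tensor * build_moe_ffn(
        ggml_context * ctx,
        ggml_tensor  * cur,               // input activations
        ggml_tensor  * gate_inp,          // router weight matrix
        ggml_tensor  * probs = nullptr) { // precomputed routing probabilities
    if (probs == nullptr) {
        // Default path: compute the routing probabilities internally,
        // as build_moe_ffn does today.
        ggml_tensor * logits = ggml_mul_mat(ctx, gate_inp, cur); // [n_expert, n_tokens]
        probs = ggml_soft_max(ctx, logits); // or sigmoid, depending on the gating function
    }
    // From here on, both paths share the expert selection, optional
    // weight normalization, and expert FFN logic that previously lived
    // in both build_moe_ffn and build_moe_ffn_from_probs.
    // ... top-k selection, gathering the chosen experts, expert FFN ...
    return cur;
}
```

With this shape, a caller like SmallThinker's graph builder would pass the router probabilities it computed before the attention block, while every other architecture keeps calling the function unchanged.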

I plan to submit a pull request addressing this within the next 1-3 days.
