Background
As part of the work in PR #14898, the function `build_moe_ffn_from_probs` was introduced to handle SmallThinker's unique architecture, where the MoE router is positioned before the attention block. This has resulted in unfortunate code duplication with the existing `build_moe_ffn` function.
The Task
As suggested in this code review comment, the proposed solution is to merge the logic of `build_moe_ffn_from_probs` into the main `build_moe_ffn` function (see the sketch after this list). This can be achieved by:
- Modifying `build_moe_ffn` to accept an optional `ggml_tensor * probs` parameter, which defaults to `nullptr`.
- Using this parameter as a toggle:
  - If `probs` is provided, the function should use it directly and skip the internal logits/probs calculation.
  - If `probs` is `nullptr`, the function should behave as it currently does.
- Carefully handling the divergent logic paths inside the unified function, especially regarding weight normalization and activation functions.
- Removing the now-redundant `build_moe_ffn_from_probs` function once the merge is complete and verified.
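To make the toggle concrete, here is a minimal sketch of what the unified signature could look like. This is illustrative only: the real `build_moe_ffn` in llama.cpp takes many more parameters (expert tensors, `n_expert`, `n_expert_used`, normalization and activation flags), which are elided here, and the logits/softmax body shown is an assumption about the default path, not the actual implementation.

```cpp
#include "ggml.h" // assumes the ggml headers from the llama.cpp tree

// Sketch only: the parameter list is heavily abbreviated relative to the
// real build_moe_ffn.
static ggml_tensor * build_moe_ffn(
        ggml_context * ctx,
        ggml_tensor  * cur,      // input hidden states
        ggml_tensor  * gate_inp, // router weight matrix
        /* ... expert tensors, n_expert, n_expert_used, flags ... */
        ggml_tensor  * probs = nullptr) { // new optional parameter
    if (probs == nullptr) {
        // Default path: compute the router logits and probabilities
        // internally, as the function does today.
        ggml_tensor * logits = ggml_mul_mat(ctx, gate_inp, cur);
        probs = ggml_soft_max(ctx, logits);
    }
    // From here on, expert selection and weighting proceed from `probs`,
    // covering both the default case and the SmallThinker case where the
    // router output is computed before the attention block.
    /* ... shared expert-FFN logic ... */
    return cur;
}
```

A SmallThinker call site would pass its precomputed router probabilities as the final argument, while every existing call site compiles unchanged thanks to the `nullptr` default.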
I plan to submit a pull request addressing this within the next 1-3 days.
Context
- Original PR: Add support for SmallThinker model series #14898
- Relevant Discussion: Add support for SmallThinker model series #14898 (comment)