Commit 794110e

lingvo-bot authored and copybara-github committed
gshard builder and layers for MoE lifelong pretraining
Now supports:
* "e_dim_old" argument in UniTransformer to expand new experts and gating dimensions from old ones.
* "epad_idx_old" argument in UniTransformer to mark extra experts and gating dimensions to be inactive (for the case where the requested number of experts is not divisible by 2^n, e.g. 28 experts will leave 4 remaining experts muted).
* Merge/Split experts and gating dimensions for loading checkpoints into a new MoE with expanded experts and gatings.
* KL_div loss for MoE Lifelong Learning.

PiperOrigin-RevId: 491977310
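The commit message names three mechanisms: expanding gating dimensions from an old expert count, muting padded experts, and a KL-divergence loss. The sketch below illustrates the general idea in plain Python. It is not the code from this commit. The function names `expand_gating`, `masked_softmax`, and `kl_div` are hypothetical, and copying old gating columns cyclically into the new experts is an assumption about how "expand from old ones" could work. The example sizes (16 old experts, 28 requested, 4 muted out of 32) follow the "28 experts will leave 4 remaining experts muted" case mentioned above.

```python
import math


def expand_gating(w_old, e_new):
    """Expand gating weights from [model_dim][e_old] to [model_dim][e_new].

    Assumption: new expert columns are initialized by copying old columns
    cyclically. The actual initialization used by the commit may differ.
    """
    e_old = len(w_old[0])
    if e_new < e_old:
        raise ValueError("e_new must be >= e_old")
    return [[row[j % e_old] for j in range(e_new)] for row in w_old]


def masked_softmax(logits, inactive_idx):
    """Softmax over experts that assigns zero probability to inactive ones."""
    inactive = set(inactive_idx)
    active = [x for j, x in enumerate(logits) if j not in inactive]
    peak = max(active)  # subtract the max for numerical stability
    exps = [0.0 if j in inactive else math.exp(x - peak)
            for j, x in enumerate(logits)]
    total = sum(exps)
    return [v / total for v in exps]


def kl_div(p, q, eps=1e-9):
    """KL(p || q) for two gating distributions over the same experts."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


# Hypothetical sizes: model_dim=4, 16 old experts, 28 requested, padded to 32.
w_old = [[0.1 * (i + j) for j in range(16)] for i in range(4)]
w_new = expand_gating(w_old, 32)

x = [1.0, -0.5, 0.25, 2.0]  # one token's hidden vector
logits = [sum(x[i] * w_new[i][j] for i in range(4)) for j in range(32)]
probs = masked_softmax(logits, range(28, 32))  # experts 28..31 are muted
```

Masking inside the softmax, rather than zeroing probabilities afterwards, keeps the distribution normalized over the active experts only, so a KL term computed on it never rewards routing to a muted expert.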
1 parent 9445632 commit 794110e

2 files changed, 750 insertions(+), 14 deletions(-)
