v0.2.0
What's Changed
- feat: Unify config/backend CLI & add config export support by @Xiaoming-AMD in #151
- feat: reduce cpu sync of moe_router_force_load_balancing by @RuibinCheung in #153
- feat(light-megatron): add LightMegatronPretrainTrainer with clean config-based integration by @Xiaoming-AMD in #136
- fix(docker): Use `docker_podman_proxy` for container cleanup by @Xiaoming-AMD in #157
- feat(moe): fused moe router add scatter logics, modify flags to primus_turbo.yaml by @ChengYao-amd in #141
- feat(turbo): update turbo grouped gemm bf16/fp16 by @xiaobochen-amd in #149
- fix(pp): fix the validation issue when vpp is not set in manual split mode by @lhzhang333 in #161
- Add initial llama4 configs by @chriscai-amd in #163
- (ut)add megatron ut scripts by @llying-001 in #164
- refactor(attn): update attention utils interface by @ChengYao-amd in #159
- Update Llama-4-Scout-17B-16E Megatron Configs by @chriscai-amd in #165
- update log/wandb/tensorboard by @wenxie-amd in #169
- [Llama4] Add Llama4 17B128E Maverick config by @chriscai-amd in #172
- feat(turbo): attn interface fit turbo by @ChengYao-amd in #173
- turn on manual gc by @wenxie-amd in #175
- add userid to header by @weilei0120 in #177
- (feat)async tp: adapt async-tp for te2.x api by @llying-001 in #178
- [Perf Issue] Disable manual_gc by default and update rocm_mem behavior by @wenxie-amd in #179
- update proxy model config by @wenxie-amd in #167
- upgrade docker image by @wenxie-amd in #176
- Enable turbo v25.8 by @vidushi8 in #180
- fix wandb/tensorboard mem item by @wenxie-amd in #181
- (test) add torchtitan ut and integration test by @llying-001 in #170
- add te fused cross entropy argument by @wenxie-amd in #182
- make pp_data_dir configurable and add pp_vis dependencies by @lhzhang333 in #183
- pp_warmup optimization by @lhzhang333 in #185
- move clean step into UT by @wenxie-amd in #186
New Contributors
- @chriscai-amd made their first contribution in #163
- @weilei0120 made their first contribution in #177
Full Changelog: v0.1.0-rc1...v0.2.0