Feature/mixlora entropy from v2.1.0 by mikecovlee · Pull Request #37 · TUDB-Labs/MoE-PEFT

mikecovlee · 2026-01-07T08:18:59Z

No description provided.

…ions

* update_0121

* Add tests for alignment of various models with flash attention and eager implementations

- Introduced new test files for alignment testing of models including Gemma2, Phi, Phi3, Llama, Qwen2, and Mistral.
- Implemented tests for flash attention forward pass in `test_alignment_flash_attn.py`.
- Added eager path tests for Gemma2 and Phi models in `test_alignment_gemma2_eager.py` and `test_alignment_gemma_phi.py`.
- Created alignment tests for Llama and Qwen2 models in `test_alignment_llama_qwen2.py`.
- Included tests for Mistral and Phi3 models in `test_alignment_mistral_phi3.py`.
- Each test verifies model configuration, initialization, and output shapes, ensuring proper integration with the PEFT framework.

* Fix code quality issues: typos, documentation, unused imports, and code organization (#5)

* Initial plan

* Fix review comments: typos, comments, and code quality issues

Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>

* Fix CI: Make qwen3 import conditional to support older transformers versions

Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>

* Remove Qwen code from modeling_mistral.py and fix all casual→causal typos

Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>

* Fix bare except clause in launch.py

Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>

* [fix] update torch and transformers dependencies to specific versions

* [refactor] update Qwen model imports and improve softcap parameter formatting

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mikecovlee <16332179+mikecovlee@users.noreply.github.com>

…1.0 without destructive changes

- Config: add router_dyn_loss_coef, entropy_* params
- Router loss: combine entropy and load-balance losses
- Common: add tsallis/renyi/shannon entropy utilities and export
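For readers unfamiliar with the three entropy measures named in this commit message, the following is a minimal sketch in plain Python of how they are usually defined for a single token's router distribution over experts. It is not the code from this PR: the function names, signatures, and the `combined_router_loss` helper (including its `entropy_coef` argument and the sign of the entropy term) are hypothetical, and the page does not show how the PR actually weights or combines the terms.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H(p) = -sum_i p_i * log(p_i); zero entries contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def tsallis_entropy(probs: Sequence[float], q: float) -> float:
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    As q approaches 1 this converges to Shannon entropy, so q == 1 falls back to it.
    """
    if math.isclose(q, 1.0):
        return shannon_entropy(probs)
    return (1.0 - sum(p ** q for p in probs if p > 0.0)) / (q - 1.0)


def renyi_entropy(probs: Sequence[float], alpha: float) -> float:
    """Renyi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha).

    As alpha approaches 1 this converges to Shannon entropy as well.
    """
    if math.isclose(alpha, 1.0):
        return shannon_entropy(probs)
    return math.log(sum(p ** alpha for p in probs if p > 0.0)) / (1.0 - alpha)


def combined_router_loss(
    load_balance_loss: float, mean_entropy: float, entropy_coef: float
) -> float:
    """ASSUMPTION: a plain weighted sum of the two terms.

    The actual combination, sign, and coefficient names used in the PR
    are not visible on this page.
    """
    return load_balance_loss + entropy_coef * mean_entropy


# Example: a uniform router over 4 experts versus a sharply peaked one.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
print(shannon_entropy(uniform), shannon_entropy(peaked))
print(tsallis_entropy(uniform, q=2.0), tsallis_entropy(peaked, q=2.0))
print(renyi_entropy(uniform, alpha=2.0), renyi_entropy(peaked, alpha=2.0))
```

All three measures are largest for the uniform distribution and zero for a one-hot distribution, which is why they are natural candidates for regularizing how sharply a router concentrates on a few experts.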

mikecovlee and others added 4 commits January 6, 2026 10:26

[refactor] remove trust_remote_code flags and update model configurat…

513174f

…ions

Add tests (#7)

479debd

mixlora: add entropy-regularized router loss (Tsallis/Rényi) from v2.…

8d8dbc6


mikecovlee closed this Jan 7, 2026

mikecovlee wants to merge 4 commits into TUDB-Labs:main from scu-covariant:feature/mixlora-entropy-from-v2.1.0
