
Conversation

@kaixuanliu (Contributor) commented Jan 19, 2026

This PR fixes the following failing test cases:

FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_eager_matches_fa2_generate - RuntimeError: mat1 and mat2 shapes cannot be multiplied (14x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attention_2_continue_generate_with_position_ids - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attention_2_padding_matches_padding_free_with_position_ids - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attention_2_padding_matches_padding_free_with_position_ids_and_fa_kwargs - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_equivalence - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_fp32_ln - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_from_config - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_inference_equivalence - RuntimeError: mat1 and mat2 shapes cannot be multiplied (7x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_inference_equivalence_right_padding - RuntimeError: mat1 and mat2 shapes cannot be multiplied (7x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_kernels_inference_equivalence - RuntimeError: mat1 and mat2 shapes cannot be multiplied (7x256 and 512x32)

These failures occur because the test configuration used the default v_head_dim of 256, which causes a dimension mismatch when flash attention pads and slices the value heads. With this PR, all of the above tests pass.
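For context, the error itself is a plain matmul shape mismatch: the attention output, sized by v_head_dim=256 per token, is fed into a projection built for the tiny test dimensions. The sketch below reproduces the reported message in isolation; the tensor shapes (91 tokens, 512x32 weight) are taken from the error messages above, while the layer in the example is illustrative and not the actual GLM4-MoE module.

```python
import torch

# Illustrative reproduction of the reported RuntimeError, not the real GLM4-MoE code path:
# the attention output carries v_head_dim = 256 features per token, while the downstream
# projection was sized for the tiny test config (in_features = 512).
attn_out = torch.randn(91, 256)                  # 91 tokens x v_head_dim
proj = torch.nn.Linear(512, 32, bias=False)      # projection expecting 512 input features

try:
    proj(attn_out)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
```

Overriding v_head_dim in the tiny test configuration so it matches the rest of the test-model dimensions removes this mismatch, which is what the PR does.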

@github-actions commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm4_moe_lite

Signed-off-by: Liu, Kaixuan <[email protected]>
