
Conversation

@kaixuanliu (Contributor) commented Jan 19, 2026

This PR fixes the following failing test cases:

FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_eager_matches_fa2_generate - RuntimeError: mat1 and mat2 shapes cannot be multiplied (14x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attention_2_continue_generate_with_position_ids - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attention_2_padding_matches_padding_free_with_position_ids - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attention_2_padding_matches_padding_free_with_position_ids_and_fa_kwargs - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_equivalence - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_fp32_ln - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_from_config - RuntimeError: mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_inference_equivalence - RuntimeError: mat1 and mat2 shapes cannot be multiplied (7x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_2_inference_equivalence_right_padding - RuntimeError: mat1 and mat2 shapes cannot be multiplied (7x256 and 512x32)
FAILED tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py::Glm4MoeModelTest::test_flash_attn_kernels_inference_equivalence - RuntimeError: mat1 and mat2 shapes cannot be multiplied (7x256 and 512x32)

These failures occur because the test configuration used the default v_head_dim of 256, which causes a dimension mismatch when flash attention pads and slices the value heads. With this PR, all of the above tests pass.
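For context, the error itself is a plain matmul shape mismatch: the attention output, sized by v_head_dim=256 per token, is fed into a projection built for the tiny test dimensions. The sketch below reproduces the reported message in isolation; the tensor shapes (91 tokens, 512x32 weight) are taken from the error messages above, while the layer in the example is illustrative and not the actual GLM4-MoE module.

```python
import torch

# Illustrative reproduction of the reported RuntimeError, not the real GLM4-MoE code path:
# the attention output carries v_head_dim = 256 features per token, while the downstream
# projection was sized for the tiny test config (in_features = 512).
attn_out = torch.randn(91, 256)                  # 91 tokens x v_head_dim
proj = torch.nn.Linear(512, 32, bias=False)      # projection expecting 512 input features

try:
    proj(attn_out)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (91x256 and 512x32)
```

Overriding v_head_dim in the tiny test configuration so it matches the rest of the test-model dimensions removes this mismatch, which is what the PR does.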

@github-actions commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm4_moe_lite

Signed-off-by: Liu, Kaixuan <[email protected]>
