
Are the unit-test assertions in test_gpt_model_moe.py too strict? #448

@cangtianhuang


In the paddlepaddle repository I changed the CMake code to use the more modern FindPython module (see PaddlePaddle/Paddle#77302). After that change, the following unit tests fail:

  • tests/single_card_tests/model/test_gpt_model_moe.py
  • tests/single_card_tests/model/test_gpt_model_moe_grouped_gemm.py
  • tests/multi_card_tests/pipeline_parallel/test_gpt_pp.py

In every case the failure is caused by a tiny deviation in the loss value, for example: AssertionError: loss not equal (5.212523937225342 != 5.212523460388184), please check your modify
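To gauge how small this deviation is, the sketch below reinterprets both loss values as float32 bit patterns using only the standard library. It assumes the loss is a float32 scalar, which the printed values are consistent with. Under that assumption the two values are adjacent float32 numbers, i.e. they differ by exactly one unit in the last place (ULP). The helper names are hypothetical.

```python
import struct


def f32_bits(x: float) -> int:
    """Round x to float32 and return its bit pattern as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ulp_distance_f32(a: float, b: float) -> int:
    """Count float32 steps between a and b (valid for finite values of the same sign)."""
    return abs(f32_bits(a) - f32_bits(b))


observed = 5.212523937225342  # loss reported after the FindPython change
expected = 5.212523460388184  # hard-coded reference value in the test

print(observed - expected)                  # 4.76837158203125e-07, i.e. 2**-21
print(ulp_distance_f32(observed, expected))  # 1
```

For float32 values in [4, 8), the spacing between neighbours is 2**-21, so this is the smallest possible nonzero difference at this magnitude. It is the kind of change that a different summation order or compiler/library build can produce.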

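It is not yet clear what causes this tiny deviation. Still, I would like to know whether the loss assertions below are too strict. Can an error of this size be tolerated?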

tests/single_card_tests/model/test_gpt_model_moe.py#L175-L192

        repo_name = os.environ.get("repo_flag")
        if repo_name == "paddlefleet":
            if judge_machine_type() == "H":
                assert loss.item() == 5.212523460388184, (
                    f"loss not equal ({loss.item()} != 5.212523460388184), please check your modify"
                )
                assert embed_tokens_grad_norm == 6.811275959014893, (
                    f"grad norm of embed_tokens not equal ({embed_tokens_grad_norm} != 6.811275959014893), please check your modify"
                )
            elif judge_machine_type() == "V":
                assert loss.item() == 5.284281253814697, (
                    f"loss not equal ({loss.item()} != 5.284281253814697), please check your modify"
                )
                assert embed_tokens_grad_norm == 9.912039756774902, (
                    f"grad norm of embed_tokens not equal ({embed_tokens_grad_norm} != 9.912039756774902), please check your modify"
                )
        else:
            pass
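One possible alternative to exact equality is a relative-tolerance check. The sketch below is a hypothetical helper, not code from the repository. It uses `math.isclose` with an assumed `rtol` of 1e-6, which at a loss of about 5.2 accepts deviations of roughly ten float32 ULPs while still rejecting larger numerical changes. The right tolerance for these tests is a judgment call for the maintainers.

```python
import math

# Reference values copied from the test above (machine type "H").
EXPECTED_LOSS_H = 5.212523460388184
EXPECTED_GRAD_NORM_H = 6.811275959014893

# Hypothetical tolerance: ~1e-6 relative is about 10 float32 ULPs near 5.2.
RTOL = 1e-6


def check_close(name: str, actual: float, expected: float, rtol: float = RTOL) -> None:
    """Raise AssertionError if actual differs from expected by more than rtol (relative)."""
    assert math.isclose(actual, expected, rel_tol=rtol, abs_tol=0.0), (
        f"{name} not close ({actual} vs {expected}, rtol={rtol}), please check your modification"
    )


# In the test, `actual` would be loss.item() or embed_tokens_grad_norm.
check_close("loss", 5.212523937225342, EXPECTED_LOSS_H)  # passes: values are 1 ULP apart
check_close("grad norm of embed_tokens", 6.811275959014893, EXPECTED_GRAD_NORM_H)
```

A relative tolerance is used rather than an absolute one so that the same `RTOL` works for both the loss and the gradient norm, whose magnitudes differ.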
