Open
Description
I modified the CMake code in the paddlepaddle repository to use the more modern FindPython module (see PaddlePaddle/Paddle#77302). However, the following unit tests now fail:
- tests/single_card_tests/model/test_gpt_model_moe.py
- tests/single_card_tests/model/test_gpt_model_moe_grouped_gemm.py
- tests/multi_card_tests/pipeline_parallel/test_gpt_pp.py
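Every failure is caused by a tiny fluctuation in the loss value, for example: AssertionError: loss not equal (5.212523937225342 != 5.212523460388184), please check your modify
It is not yet clear what causes this fluctuation. However, I would like to know whether the loss assertions below are too strict. Can an error of this size be tolerated?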
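To put a number on the fluctuation, here is a minimal sketch that measures the gap between the two values from the error message. It assumes the loss is a float32 tensor, so that `loss.item()` returns a float32 value widened to a Python float; the helper names `float32_bits` and `ulp_distance_f32` are hypothetical and not part of the test suite.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ulp_distance_f32(a: float, b: float) -> int:
    """Count the float32 steps between two positive finite values."""
    # For positive finite floats, the bit patterns are ordered like the values,
    # so the difference of the patterns is the number of representable steps.
    return abs(float32_bits(a) - float32_bits(b))


expected = 5.212523460388184  # value hard-coded in the test
actual = 5.212523937225342    # value reported by the failing run

abs_diff = abs(actual - expected)
rel_diff = abs_diff / abs(expected)
ulps = ulp_distance_f32(actual, expected)

print(f"abs diff = {abs_diff:.3e}, rel diff = {rel_diff:.3e}, float32 ULPs = {ulps}")
```

Under that float32 assumption, the two values are adjacent representable numbers (a distance of one ULP), which is the smallest nonzero difference a float32 loss can show at this magnitude.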
tests/single_card_tests/model/test_gpt_model_moe.py#L175-L192
repo_name = os.environ.get("repo_flag")
if repo_name == "paddlefleet":
    if judge_machine_type() == "H":
        assert loss.item() == 5.212523460388184, (
            f"loss not equal ({loss.item()} != 5.212523460388184), please check your modify"
        )
        assert embed_tokens_grad_norm == 6.811275959014893, (
            f"grad norm of embed_tokens not equal ({embed_tokens_grad_norm} != 6.811275959014893), please check your modify"
        )
    elif judge_machine_type() == "V":
        assert loss.item() == 5.284281253814697, (
            f"loss not equal ({loss.item()} != 5.284281253814697), please check your modify"
        )
        assert embed_tokens_grad_norm == 9.912039756774902, (
            f"grad norm of embed_tokens not equal ({embed_tokens_grad_norm} != 9.912039756774902, please check your modify"
        )
    else:
        pass
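For discussion, here is a rough sketch of what a tolerance-based version of these checks could look like. The helper name `assert_close` and the default `rel_tol=1e-6` are my own assumptions, not something the maintainers have agreed on; the right tolerance (or whether bitwise equality is intentional) is exactly what I am asking about.

```python
import math


def assert_close(actual: float, expected: float, name: str, rel_tol: float = 1e-6) -> None:
    """Assert that actual matches expected within a relative tolerance.

    rel_tol=1e-6 is an illustrative assumption, not a project-approved value.
    """
    assert math.isclose(actual, expected, rel_tol=rel_tol), (
        f"{name} not close ({actual} != {expected}, rel_tol={rel_tol}), please check your modify"
    )


# The value from the failing run differs from the baseline by roughly 9e-8
# in relative terms, so it passes with the assumed tolerance.
assert_close(5.212523937225342, 5.212523460388184, "loss")
```

In the test this would replace the `==` comparisons, e.g. `assert_close(loss.item(), 5.212523460388184, "loss")`. A relative tolerance still catches real regressions while ignoring last-bit differences, but if the exact match is meant as a determinism check, then the strict assertion should stay and the source of the fluctuation needs to be found instead.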