From #5390 (comment):
Test all models: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/18856801306 (see summary)
Summary:
=========================================
Summary of only failed models:
Real failed models: 3 [['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['google/gemma-2-2b', 'eager_fail_to_run'], ['CamemBert', 'eager_fail_to_run']]
Real failed models: 3 [['google/gemma-2-2b', 'eager_fail_to_run'], ['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['CamemBert', 'eager_fail_to_run']]
Real failed models: 4 [['google/gemma-2-2b', 'eager_fail_to_run'], ['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['openai/whisper-tiny', 'fail_accuracy'], ['CamemBert', 'eager_fail_to_run']]
Real failed models: 3 [['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['CamemBert', 'eager_fail_to_run'], ['google/gemma-2-2b', 'eager_fail_to_run']]
Real failed models: 3 [['CamemBert', 'eager_fail_to_run'], ['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['google/gemma-2-2b', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['convit_base', 'eager_fail_to_run']]
Real failed models: 1 [['convit_base', 'eager_fail_to_run']]
Real failed models: 2 [['convit_base', 'eager_fail_to_run'], ['sebotnet33ts_256', 'fail_accuracy']]
Real failed models: 1 [['convit_base', 'eager_fail_to_run']]
Real failed models: 2 [['maml_omniglot', 'eager_fail_to_run'], ['functorch_maml_omniglot', 'eager_fail_to_run']]
Real failed models: 3 [['detectron2_fasterrcnn_r_50_fpn', 'eager_1st_run_OOM'], ['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['maml_omniglot', 'eager_fail_to_run'], ['functorch_maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 6 [['detectron2_fasterrcnn_r_50_dc5', 'eager_1st_run_OOM'], ['functorch_maml_omniglot', 'eager_fail_to_run'], ['detectron2_fasterrcnn_r_101_c4', 'eager_1st_run_OOM'], ['detectron2_fasterrcnn_r_50_c4', 'eager_1st_run_OOM'], ['maml_omniglot', 'eager_fail_to_run'], ['detectron2_fasterrcnn_r_101_dc5', 'eager_1st_run_OOM']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
ERROR: Found failed models!

Error checking:
- `meta-llama/Llama-3.2-1B` and `google/gemma-2-2b` work locally. The problem is with the token in CI. Fixed in "[E2E] Fix `secrets.HUGGING_FACE_HUB_TOKEN` usage" #5399.
- `CamemBert` is not supposed to work according to intel/torch-xpu-ops@779f899. Can be ignored.
- `openai/whisper-tiny` (fail_accuracy). There might be a problem with Triton, but it's a new model and hasn't been tested before (I checked it here: https://github.com/intel/torch-xpu-ops/actions/runs/17338511026/job/49263018812), so it is neither a regression nor a blocker. A rough sketch of what a fail_accuracy check does follows this list.
- `sebotnet33ts_256` (fail_accuracy). There might be a problem with Triton. However, the problem was already present in the previous iteration of validation: https://github.com/intel/torch-xpu-ops/actions/runs/17338511026/job/49263026473#step:18:7802, so it is neither a regression nor a blocker.
- `functorch_maml_omniglot` and `maml_omniglot` are fixed in "[E2E] Align `torchbench` dependencies to match what `torch-xpu-ops` uses" #5398. There was a problem in the environment.
- `convit_base`. The error looks like this and is not related to Triton (a minimal repro sketch follows this list):

        return F.linear(input, self.weight, self.bias)
        RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16

- `detectron2_fasterrcnn_r_50_fpn`: the problem was already present in the previous iteration of validation: https://github.com/intel/torch-xpu-ops/actions/runs/17329304896/job/49900834883#step:14:41819
- Other Detectron models (eager_1st_run_OOM) worked before; the cause of the breakdown is unknown.
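For reference, a minimal hypothetical repro of the `convit_base` class of failure (an illustration, not the actual torchbench model code): `F.linear` raises this error in eager PyTorch whenever the input and weight dtypes disagree, before any Triton kernel is ever launched, which is why the failure is treated as unrelated to Triton.

```python
# Hypothetical minimal repro, not the actual convit_base model: float32 activations
# hitting bfloat16 weights trigger the same dtype-mismatch error in eager PyTorch.
import torch
import torch.nn.functional as F

x = torch.randn(4, 16, dtype=torch.float32)    # activations left in float32
w = torch.randn(8, 16, dtype=torch.bfloat16)   # weights cast to bfloat16
b = torch.zeros(8, dtype=torch.bfloat16)

try:
    F.linear(x, w, b)
except RuntimeError as e:
    # e.g. "expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16"
    print(e)

out = F.linear(x.to(torch.bfloat16), w, b)     # casting the input to match the weights succeeds
print(out.dtype)                               # torch.bfloat16
```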
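And a rough illustration of what the fail_accuracy status for `openai/whisper-tiny` and `sebotnet33ts_256` means (a hypothetical sketch, not the benchmark harness's actual logic): both the eager and the compiled run finish, but their outputs diverge beyond the comparison tolerance.

```python
# Hypothetical sketch of a fail_accuracy-style check; the real torchbench/dynamo
# harness is more involved, but the idea is the same: compare the eager output
# against the torch.compile output (Triton-backed on GPU/XPU) within a tolerance.
import torch

def accuracy_ok(model: torch.nn.Module, example_input: torch.Tensor,
                rtol: float = 1e-3, atol: float = 1e-3) -> bool:
    model.eval()
    with torch.no_grad():
        expected = model(example_input)              # eager reference
        actual = torch.compile(model)(example_input)  # compiled run
    return torch.allclose(expected, actual, rtol=rtol, atol=atol)

# A toy stand-in model; a False result here is what gets reported as fail_accuracy.
toy = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.GELU())
print(accuracy_ok(toy, torch.randn(2, 16)))
```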
Looks like we're ready.
PR in PyTorch: pytorch/pytorch#166436 (to double check)