From #5390 (comment):
Test all models: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/18856801306 (see summary)
Summary:
=========================================
Summary of only failed models:
Real failed models: 3 [['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['google/gemma-2-2b', 'eager_fail_to_run'], ['CamemBert', 'eager_fail_to_run']]
Real failed models: 3 [['google/gemma-2-2b', 'eager_fail_to_run'], ['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['CamemBert', 'eager_fail_to_run']]
Real failed models: 4 [['google/gemma-2-2b', 'eager_fail_to_run'], ['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['openai/whisper-tiny', 'fail_accuracy'], ['CamemBert', 'eager_fail_to_run']]
Real failed models: 3 [['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['CamemBert', 'eager_fail_to_run'], ['google/gemma-2-2b', 'eager_fail_to_run']]
Real failed models: 3 [['CamemBert', 'eager_fail_to_run'], ['meta-llama/Llama-3.2-1B', 'eager_fail_to_run'], ['google/gemma-2-2b', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['CamemBert', 'eager_fail_to_run']]
Real failed models: 1 [['convit_base', 'eager_fail_to_run']]
Real failed models: 1 [['convit_base', 'eager_fail_to_run']]
Real failed models: 2 [['convit_base', 'eager_fail_to_run'], ['sebotnet33ts_256', 'fail_accuracy']]
Real failed models: 1 [['convit_base', 'eager_fail_to_run']]
Real failed models: 2 [['maml_omniglot', 'eager_fail_to_run'], ['functorch_maml_omniglot', 'eager_fail_to_run']]
Real failed models: 3 [['detectron2_fasterrcnn_r_50_fpn', 'eager_1st_run_OOM'], ['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['maml_omniglot', 'eager_fail_to_run'], ['functorch_maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 6 [['detectron2_fasterrcnn_r_50_dc5', 'eager_1st_run_OOM'], ['functorch_maml_omniglot', 'eager_fail_to_run'], ['detectron2_fasterrcnn_r_101_c4', 'eager_1st_run_OOM'], ['detectron2_fasterrcnn_r_50_c4', 'eager_1st_run_OOM'], ['maml_omniglot', 'eager_fail_to_run'], ['detectron2_fasterrcnn_r_101_dc5', 'eager_1st_run_OOM']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
Real failed models: 2 [['functorch_maml_omniglot', 'eager_fail_to_run'], ['maml_omniglot', 'eager_fail_to_run']]
ERROR: Found failed models!

Error checking:
- `meta-llama/Llama-3.2-1B` and `google/gemma-2-2b` work locally. The problem is with the token in CI. Fixed in "[E2E] Fix `secrets.HUGGING_FACE_HUB_TOKEN` usage" #5399.
- `CamemBert` is not supposed to work according to intel/torch-xpu-ops@779f899. Can be ignored.
- `openai/whisper-tiny` (fail_accuracy). There might be a problem with Triton, but it's a new model and hasn't been tested before (I checked it here: https://github.com/intel/torch-xpu-ops/actions/runs/17338511026/job/49263018812), so it is neither a regression nor a blocker. A rough sketch of what a fail_accuracy check does follows this list.
- `sebotnet33ts_256` (fail_accuracy). There might be a problem with Triton. However, the problem was already present in the previous iteration of validation: https://github.com/intel/torch-xpu-ops/actions/runs/17338511026/job/49263026473#step:18:7802, so it is neither a regression nor a blocker.
- `functorch_maml_omniglot` and `maml_omniglot` are fixed in "[E2E] Align `torchbench` dependencies to match what `torch-xpu-ops` uses" #5398. There was a problem in the environment.
- `convit_base`. The error looks like this and is not related to Triton (a minimal repro sketch follows this list):

        return F.linear(input, self.weight, self.bias)
        RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16

- `detectron2_fasterrcnn_r_50_fpn`: the problem was already present in the previous iteration of validation: https://github.com/intel/torch-xpu-ops/actions/runs/17329304896/job/49900834883#step:14:41819
- Other Detectron models (eager_1st_run_OOM) worked before; the cause of the breakdown is unknown.
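For reference, a minimal hypothetical repro of the `convit_base` class of failure (an illustration, not the actual torchbench model code): `F.linear` raises this error in eager PyTorch whenever the input and weight dtypes disagree, before any Triton kernel is ever launched, which is why the failure is treated as unrelated to Triton.

```python
# Hypothetical minimal repro, not the actual convit_base model: float32 activations
# hitting bfloat16 weights trigger the same dtype-mismatch error in eager PyTorch.
import torch
import torch.nn.functional as F

x = torch.randn(4, 16, dtype=torch.float32)    # activations left in float32
w = torch.randn(8, 16, dtype=torch.bfloat16)   # weights cast to bfloat16
b = torch.zeros(8, dtype=torch.bfloat16)

try:
    F.linear(x, w, b)
except RuntimeError as e:
    # e.g. "expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16"
    print(e)

out = F.linear(x.to(torch.bfloat16), w, b)     # casting the input to match the weights succeeds
print(out.dtype)                               # torch.bfloat16
```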
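And a rough illustration of what the fail_accuracy status for `openai/whisper-tiny` and `sebotnet33ts_256` means (a hypothetical sketch, not the benchmark harness's actual logic): both the eager and the compiled run finish, but their outputs diverge beyond the comparison tolerance.

```python
# Hypothetical sketch of a fail_accuracy-style check; the real torchbench/dynamo
# harness is more involved, but the idea is the same: compare the eager output
# against the torch.compile output (Triton-backed on GPU/XPU) within a tolerance.
import torch

def accuracy_ok(model: torch.nn.Module, example_input: torch.Tensor,
                rtol: float = 1e-3, atol: float = 1e-3) -> bool:
    model.eval()
    with torch.no_grad():
        expected = model(example_input)              # eager reference
        actual = torch.compile(model)(example_input)  # compiled run
    return torch.allclose(expected, actual, rtol=rtol, atol=atol)

# A toy stand-in model; a False result here is what gets reported as fail_accuracy.
toy = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.GELU())
print(accuracy_ok(toy, torch.randn(2, 16)))
```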
Looks like we're ready.
PR in PyTorch: pytorch/pytorch#166436 (to double check)