Test suite sometimes picks up way more tests; takes ~6 times longer

There's something strange with the tests: in the following [run](https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/13080208806/job/36501842751) for linux-64+CUDA+MKL (d04bba891c9aee352136b8257c09ca8965f82718 in #340), py311 collects far more tests and takes much longer than the other versions (longer even than py312, which is the only version where we run the inductor tests).

```
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2916.32s (0:48:36) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py39_hdffab68_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 458.74s (0:07:38) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py39_hdffab68_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestCustomOp and test_data_dependent_compile) or (TestCustomOp and test_functionalize_error) or (TestCustomOpAPI and test_compile) or (TestCustomOpAPI and test_fake) or test_compile_int4_mm or test_compile_int8_mm or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7532 passed, 1375 skipped, 31 xfailed, 75718 warnings in 455.21s (0:07:35) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 459.08s (0:07:39) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py312_hdbe889e_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py test/inductor/test_torchinductor.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 8196 passed, 1429 skipped, 31 xfailed, 76339 warnings in 2177.80s (0:36:17) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py312_hdbe889e_311.conda
```
The set of modules and skips for 3.11 is exactly the same as on 3.9 or 3.10, so I don't know what would explain this difference in test collection.

(note: it's expected that 3.12 runs longer due to being the only version where we include the `torchinductor` tests, and that 3.13 has more skips because dynamo doesn't yet support 3.13 in pytorch 2.5)
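To put numbers on the gap, here is a small sketch that parses the pytest summary lines from the log above and compares py311 against py310. The summary strings are copied from that log; the helper names (`parse_summary`, `ratios`) are made up for illustration.

```python
import re

# Summary lines copied verbatim from the first log (d04bba8 in #340).
SUMMARIES = {
    "py39": "== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 458.74s (0:07:38) ==",
    "py310": "== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 459.08s (0:07:39) ==",
    "py311": "= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2916.32s (0:48:36) =",
}


def parse_summary(line):
    """Return (total test outcomes, wall time in seconds) for one summary line."""
    outcomes = re.findall(r"(\d+) (passed|skipped|xfailed|failed)\b", line)
    total = sum(int(num) for num, _name in outcomes)
    seconds = float(re.search(r"in ([\d.]+)s", line).group(1))
    return total, seconds


def ratios(slow, fast):
    """Return (test-count ratio, wall-time ratio) between two entries of SUMMARIES."""
    slow_total, slow_secs = parse_summary(SUMMARIES[slow])
    fast_total, fast_secs = parse_summary(SUMMARIES[fast])
    return slow_total / fast_total, slow_secs / fast_secs


test_ratio, time_ratio = ratios("py311", "py310")
print(f"tests: {test_ratio:.2f}x, wall time: {time_ratio:.2f}x")
```

With these numbers the outcome total (passed + skipped + xfailed) grows by about 1.77x while wall time grows by about 6.35x, so the additional tests appear to be much slower on average than the ones collected on the other versions.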

However, after merging #340 to main, the exact same job yielded completely different behaviour, with _every_ test run collecting 13k+ tests and taking ~50min instead of <10.

```
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestCustomOp and test_data_dependent_compile) or (TestCustomOp and test_functionalize_error) or (TestCustomOpAPI and test_compile) or (TestCustomOpAPI and test_fake) or test_compile_int4_mm or test_compile_int8_mm or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13151 passed, 2570 skipped, 91 xfailed, 143235 warnings in 2899.57s (0:48:19) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2898.41s (0:48:18) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2950.04s (0:49:10) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py312_hdbe889e_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py test/inductor/test_torchinductor.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 1 failed, 14514 passed, 2663 skipped, 91 xfailed, 143956 warnings in 4192.22s (1:09:52) =
# 3.9 not run after flaky failure for 3.12
```
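One possible way to narrow down where the extra tests come from would be to compare the collected test IDs of a fast and a slow run. The sketch below assumes the output of `pytest --collect-only -q` (with the same module list and `-k`/`-m` filters) has been captured as text from each environment; the sample IDs and the helper names are hypothetical and only illustrate the grouping.

```python
from collections import Counter


def node_ids(collect_output):
    """Extract node IDs (lines containing '::') from `--collect-only -q` output."""
    return {line.strip() for line in collect_output.splitlines() if "::" in line}


def extra_by_class(fast_output, slow_output):
    """Count node IDs that only the slow run collected, grouped by file::class."""
    extra = node_ids(slow_output) - node_ids(fast_output)
    return Counter(node_id.rsplit("::", 1)[0] for node_id in extra)


# Hypothetical sample data; real input would come from the two CI environments.
fast = """\
test/test_nn.py::TestA::test_one
test/test_nn.py::TestA::test_two

2 tests collected in 0.10s
"""
slow = """\
test/test_nn.py::TestA::test_one
test/test_nn.py::TestA::test_two
test/test_nn.py::TestB::test_one
test/test_nn.py::TestB::test_two

4 tests collected in 0.10s
"""

for group, count in extra_by_class(fast, slow).most_common():
    print(f"{count:6d}  {group}")
```

Grouping by `file::class` keeps the report short even when thousands of IDs differ, which should make it easier to see whether the surplus is concentrated in a few test classes.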

_Originally posted by @h-vetinari in https://github.com/conda-forge/pytorch-cpu-feedstock/issues/340#issuecomment-2628837106_