Test suite sometimes picks up way more tests; takes ~6 times longer #343

@h-vetinari

There's something strange with the tests: in the following run for linux-64+CUDA+MKL (d04bba8 in #340), py311 collects far more tests (13171 passed vs. roughly 7550 on 3.9, 3.10 and 3.13) and takes longer than any other version, even py312, which is the only version where we run the inductor tests.

TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2916.32s (0:48:36) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py39_hdffab68_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 458.74s (0:07:38) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py39_hdffab68_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestCustomOp and test_data_dependent_compile) or (TestCustomOp and test_functionalize_error) or (TestCustomOpAPI and test_compile) or (TestCustomOpAPI and test_fake) or test_compile_int4_mm or test_compile_int8_mm or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7532 passed, 1375 skipped, 31 xfailed, 75718 warnings in 455.21s (0:07:35) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 459.08s (0:07:39) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py312_hdbe889e_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py test/inductor/test_torchinductor.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 8196 passed, 1429 skipped, 31 xfailed, 76339 warnings in 2177.80s (0:36:17) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py312_hdbe889e_311.conda

The set of modules and skips is exactly the same as on 3.9 or 3.10, so I don't know what would explain this difference in test collection.

(note: it's expected that 3.12 runs longer due to being the only version where we include the torchinductor tests, and that 3.13 has more skips because dynamo doesn't yet support 3.13 in pytorch 2.5)
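
One way to narrow this down would be to dump the collected node IDs for a slow and a fast build with `pytest --collect-only -q` (same modules, same `-k` and `-m` arguments) and compare the two lists. The following is a minimal sketch under that assumption; the helper names `load_node_ids` and `extra_by_class` are made up for illustration, and it assumes each node ID appears on its own line and contains `::`.

```python
from collections import Counter


def load_node_ids(collect_output: str) -> set[str]:
    """Keep only lines that look like pytest node IDs (they contain '::')."""
    return {line.strip() for line in collect_output.splitlines() if "::" in line}


def extra_by_class(slow_ids: set[str], fast_ids: set[str]) -> Counter:
    """Count node IDs that only the slow run collected, grouped by file::class."""
    counts: Counter = Counter()
    for node_id in slow_ids - fast_ids:
        parts = node_id.split("::")
        # Group by file and class when a class is present, otherwise by file.
        key = "::".join(parts[:2]) if len(parts) > 2 else parts[0]
        counts[key] += 1
    return counts


def report(slow_path: str, fast_path: str, top: int = 20) -> None:
    """Print the classes that contribute the most additional tests."""
    with open(slow_path) as f:
        slow_ids = load_node_ids(f.read())
    with open(fast_path) as f:
        fast_ids = load_node_ids(f.read())
    for key, n in extra_by_class(slow_ids, fast_ids).most_common(top):
        print(f"{n:6d}  {key}")
```

Calling `report("py311.txt", "py39.txt")` on two saved collection dumps (hypothetical file names) would show which test classes account for the additional tests, which should hint at what differs between the environments.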

However, after merging #340 to main, the exact same job yielded completely different behaviour, with every test run collecting 13k+ tests and taking ~50 min instead of <10 min (or ~70 min for 3.12, which also runs the torchinductor tests).

TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestCustomOp and test_data_dependent_compile) or (TestCustomOp and test_functionalize_error) or (TestCustomOpAPI and test_compile) or (TestCustomOpAPI and test_fake) or test_compile_int4_mm or test_compile_int8_mm or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13151 passed, 2570 skipped, 91 xfailed, 143235 warnings in 2899.57s (0:48:19) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2898.41s (0:48:18) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2950.04s (0:49:10) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py312_hdbe889e_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py test/inductor/test_torchinductor.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 1 failed, 14514 passed, 2663 skipped, 91 xfailed, 143956 warnings in 4192.22s (1:09:52) =
# 3.9 not run after flaky failure for 3.12
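
To put numbers on the difference, the pytest summary lines quoted above can be parsed and compared. This is only a rough sketch: `parse_summary` and `total_tests` are hypothetical helpers, and the regular expressions assume the summary format shown in these logs.

```python
import re

COUNT_RE = re.compile(r"(\d+) (passed|failed|skipped|xfailed|warnings)")
SECONDS_RE = re.compile(r"in ([\d.]+)s")


def parse_summary(line: str) -> dict:
    """Extract outcome counts and wall time from a pytest summary line."""
    result = {name: int(num) for num, name in COUNT_RE.findall(line)}
    match = SECONDS_RE.search(line)
    result["seconds"] = float(match.group(1)) if match else None
    return result


def total_tests(summary: dict) -> int:
    """Tests that were selected and reported (warnings are not tests)."""
    return sum(summary.get(k, 0) for k in ("passed", "failed", "skipped", "xfailed"))


# Summary lines copied from the first run above (py311 vs. py39).
py311 = parse_summary(
    "= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2916.32s (0:48:36) ="
)
py39 = parse_summary(
    "== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 458.74s (0:07:38) =="
)

print(f"tests:   {total_tests(py311)} vs {total_tests(py39)}")
print(f"runtime: {py311['seconds'] / py39['seconds']:.1f}x longer")
```

The runtime grows much faster than the number of tests, so the additional tests appear to be considerably slower on average than the ones collected in the fast runs.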

Originally posted by @h-vetinari in #340 (comment)
