
Conversation

@mgorny mgorny commented Aug 7, 2025

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Signed-off-by: Michał Górny <[email protected]>

conda-forge-admin commented Aug 7, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The magma output has been superseded by libmagma-devel.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/17876818413. Examine the logs at this URL for more detail.

mgorny commented Aug 8, 2025

Ok, so issues so far:

  1. Missing pyyaml test dependency.
  2. A few tests are segfaulting in CUDA builds.
  3. Windows can't find Python:
    Could NOT find Python3: Found unsuitable major version ".=", but required
    major version is exact version "3"
    

@h-vetinari
Member

The release notes make it sound like we'll need to double-check nvtx support as well

A downstream project using -DUSE_SYSTEM_NVTX will not be able to find NVTX3 or torch::nvtx3 via PyTorch's cmake/public/cuda.cmake. The downstream project now needs to explicitly find NVTX3 and torch::nvtx3 by implementing the same logic in PyTorch's cmake/Dependences.cmake.

mgorny commented Aug 11, 2025

The release notes make it sound like we'll need to double-check nvtx support as well

A downstream project using -DUSE_SYSTEM_NVTX will not be able to find NVTX3 or torch::nvtx3 via PyTorch's cmake/public/cuda.cmake. The downstream project now needs to explicitly find NVTX3 and torch::nvtx3 by implementing the same logic in PyTorch's cmake/Dependences.cmake.

From what I understand, this means checking reverse dependencies.

mgorny commented Aug 11, 2025

Wait, I'm reading the output wrong. Investigating further.

mgorny commented Aug 11, 2025

Okay, I've learned more about the Windows shell than I wanted to know, and I suspect delayed expansion did not work as expected. I've tried replacing %PY_VERSION_FULL% with !PY_VERSION_FULL!, and apparently that works, at least in Wine's implementation of cmd.
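
For reference, here is a minimal sketch of the cmd behaviour at play (a hypothetical script, not the feedstock's actual bld.bat; the version string is illustrative):

    @echo off
    setlocal enabledelayedexpansion
    set "PY_VERSION_FULL="

    rem Inside a parenthesised block, %VAR% is substituted when the whole
    rem block is parsed, so a value assigned within the block is not seen;
    rem !VAR! is expanded when the line executes and picks up the new value.
    if 1==1 (
        set "PY_VERSION_FULL=3.13.5"
        echo percent expansion: [%PY_VERSION_FULL%]
        echo delayed expansion: [!PY_VERSION_FULL!]
    )

The percent form prints an empty value while the delayed form prints 3.13.5, which is consistent with the empty version components behind the "Found unsuitable major version" error quoted above.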

@h-vetinari
Member

Haha, the win+CUDA build manages to blow through the disk space

error: [Errno 28] No space left on device

despite using the largest runner already

- cirun-azure-windows-4xlarge # [win]

But perhaps the "size" of that runner is only measured in CPUs/RAM, not storage? Could we extend that @aktech @wolfv?

mgorny commented Aug 12, 2025

Well, at least the Python version issue was fixed. Also, looks like my idea of using pytest --forked won't work for CUDA tests.

mgorny commented Aug 13, 2025

Le sigh, I've fetched the artifact and couldn't reproduce the segfaults locally. But I've noticed that our openblas+openmp constraint didn't work anymore. Let's try again.

By the way, I'm wondering if we should perhaps skip CPU tests in CUDA builds. I see they're some of the longest tests in CI, and I suppose it's sufficient that we test them in CPU builds.

@h-vetinari
Member

But I've noticed that our openblas+openmp constraint didn't work anymore.

Probably related to #407, which unfortunately isn't green either.

@mgorny mgorny force-pushed the v2.8.0 branch 2 times, most recently from 8e57db5 to 9ce5214 on September 15, 2025 13:46

mgorny commented Sep 15, 2025

Uh, so I guess CMake 4 breaks AArch64 builds? I'll try debugging that locally.

mgorny commented Sep 15, 2025

BTW:

-------------------------------------------------------------------------------------------------
|                                                                                               |
|            WARNING: we strongly recommend enabling linker script optimization for ARM + CUDA. |
|            To do so please export USE_PRIORITIZED_TEXT_FOR_LD=1                               |
|                                                                                               |
-------------------------------------------------------------------------------------------------

Should we do that?

@hmaarrfk
Contributor

I thought we had a workaround with a comment in our script for this

@h-vetinari
Member

Windows+CUDA failing with

error: could not write to 'build\bdist.win-amd64\wheel\.\torch\lib\XNNPACK.lib': No space left on device

@aktech @wolfv, could we increase disk space on the windows agents?

@h-vetinari
Member

linux-64 has a single test failure that looks like a minor tolerance violation

2025-09-16T06:05:47.2890235Z =================================== FAILURES ===================================
2025-09-16T06:05:47.2891936Z _____________________ TestNN.test_layer_norm_backwards_eps _____________________
2025-09-16T06:05:47.2893254Z [gw0] linux -- Python 3.10.18 $PREFIX/bin/python3.10
2025-09-16T06:05:47.2893833Z 
2025-09-16T06:05:47.2894737Z self = <test_nn.TestNN testMethod=test_layer_norm_backwards_eps>
2025-09-16T06:05:47.2895403Z 
2025-09-16T06:05:47.2895755Z     @unittest.skipIf(not TEST_CUDA, "CUDA not available")
2025-09-16T06:05:47.2896595Z     def test_layer_norm_backwards_eps(self):
2025-09-16T06:05:47.2897281Z         dtype = torch.float
2025-09-16T06:05:47.2897894Z         m_x_n_list = [(3, 3), (5, 5), (11, 11), (55, 55),
2025-09-16T06:05:47.2898602Z                       (32, 32), (1024, 32), (1024, 1024),
2025-09-16T06:05:47.2899261Z                       (33, 33), (1025, 33), (1025, 1025),
2025-09-16T06:05:47.2899924Z                       (128 * 1024, 32), (32, 128 * 1024)]
2025-09-16T06:05:47.2900577Z         boolean = [True, False]
2025-09-16T06:05:47.2901311Z         combinations = itertools.product(boolean, repeat=2)
2025-09-16T06:05:47.2902177Z         for elementwise_affine, bias in combinations:
2025-09-16T06:05:47.2903096Z             for m, n in m_x_n_list:
2025-09-16T06:05:47.2903887Z                 x = torch.randn((m, n), dtype=dtype, requires_grad=True)
2025-09-16T06:05:47.2904716Z                 grad_output = torch.rand_like(x)
2025-09-16T06:05:47.2905526Z                 x_cuda = x.clone().detach().to("cuda").requires_grad_()
2025-09-16T06:05:47.2906461Z                 grad_output_cuda = grad_output.clone().detach().to("cuda")
2025-09-16T06:05:47.2907629Z                 ln = nn.LayerNorm(n, dtype=dtype, elementwise_affine=elementwise_affine, bias=bias)
2025-09-16T06:05:47.2909298Z                 ln_cuda = nn.LayerNorm(n, device="cuda", dtype=dtype, elementwise_affine=elementwise_affine, bias=bias)
2025-09-16T06:05:47.2910513Z                 ln_out = ln(x)
2025-09-16T06:05:47.2911121Z                 ln_out_cuda = ln_cuda(x_cuda)
2025-09-16T06:05:47.2911819Z                 ln_out.backward(grad_output)
2025-09-16T06:05:47.2912699Z                 ln_out_cuda.backward(grad_output_cuda)
2025-09-16T06:05:47.2913429Z                 if elementwise_affine:
2025-09-16T06:05:47.2914663Z >                   self.assertEqual(ln.weight.grad, ln_cuda.weight.grad, f"weight grad failed: {m=} {n=}", rtol=1e-4, atol=1e-4)
2025-09-16T06:05:47.2915985Z E                   AssertionError: Tensor-likes are not close!
2025-09-16T06:05:47.2916713Z E                   
2025-09-16T06:05:47.2917262Z E                   Mismatched elements: 1 / 32 (3.1%)
2025-09-16T06:05:47.2918340Z E                   Greatest absolute difference: 0.00084686279296875 at index (0,) (up to 0.0001 allowed)
2025-09-16T06:05:47.2919781Z E                   Greatest relative difference: 0.0012425975874066353 at index (0,) (up to 0.0001 allowed)
2025-09-16T06:05:47.2920868Z E                   weight grad failed: m=131072 n=32
2025-09-16T06:05:47.2921525Z E                   
2025-09-16T06:05:47.2922239Z E                   To execute this test, run the following from the base repo dir:
2025-09-16T06:05:47.2923340Z E                       python test/test_nn.py TestNN.test_layer_norm_backwards_eps
2025-09-16T06:05:47.2924306Z E                   
2025-09-16T06:05:47.2925119Z E                   This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-09-16T06:05:47.2925877Z 
2025-09-16T06:05:47.2926123Z test/test_nn.py:7238: AssertionError
2025-09-16T06:05:47.5812833Z ============================= slowest 50 durations =============================

mgorny commented Sep 16, 2025

I thought we had a workaround with a comment in our script for this

Ah, sorry, indeed, I was doing a non-CUDA build and didn't notice it's there for CUDA.

aktech commented Sep 16, 2025

@aktech @wolfv, could we increase disk space on the windows agents?

https://docs.cirun.io/reference/yaml#custom-disk-size-for-azure

The Windows runners' configuration needs to be updated with:

    extra_config:
      storageProfile:
        osDisk:
          diskSizeGB: 512

…5.09.18.12.55.27

Other tools:
- conda-build 25.7.0
- rattler-build 0.47.0
- rattler-build-conda-compat 1.4.6
Signed-off-by: Michał Górny <[email protected]>
Thanks to @aktech for the suggestion.

Signed-off-by: Michał Górny <[email protected]>

mgorny commented Sep 20, 2025

Uh, I accidentally rerendered over the Windows fix 🤦.

Let me read up on #413, in case I should change something before restarting.

@h-vetinari h-vetinari left a comment

Thanks a lot for the persistence on this one! ❤️

Discussion about the pybind situation in #413 is still ongoing with the pybind maintainers, so a rebuild for v3 (or removing the dependence on pybind-abi completely) can be done in a follow-up.

mgorny commented Sep 21, 2025

Thanks a lot for the persistence on this one! ❤️

No problem. I'm sorry it took this long; I made more mistakes than I should have, notably failing to pin run dependencies early on, which would have saved me a lot of subsequent testing.

There's also the open question on how to deal with cudnn. I'm not even sure if this is something to report to PyTorch or to NVIDIA.

@RoyiAvital

@mgorny, appreciate your effort here on behalf of us PyTorch on Windows users.
I hope the next ones (PyTorch 2.9 is one month away) are much easier to build.

@h-vetinari h-vetinari merged commit 034ea64 into conda-forge:main Sep 21, 2025
31 of 32 checks passed

@h-vetinari h-vetinari left a comment

There's also the open question on how to deal with cudnn. I'm not even sure if this is something to report to PyTorch or to NVIDIA.

I think we could start with an issue on the cudnn feedstock, to at least write down the things you remember from that debugging session somewhere before it becomes just a haze. 😅

@h-vetinari
Member

Windows CUDA builds are still failing to upload, so this was done manually from the artefacts:

$ gh run download 17899219922 --repo conda-forge/pytorch-cpu-feedstock --name conda_artifacts_17899219922_win_64_channel_targetsconda-forge_maincu_hca575dce
$ unzip pytorch-cpu-feedstock_conda_artifacts_.zip
$ cd bld/win-64
$ rm current_repodata.json index.html repodata*
$ ls
libtorch-2.8.0-cuda128_mkl_ha34d6f4_300.conda       pytorch-2.8.0-cuda128_mkl_py312_h0850830_300.conda
pytorch-2.8.0-cuda128_mkl_py310_h0b8c608_300.conda  pytorch-2.8.0-cuda128_mkl_py313_hf206996_300.conda
pytorch-2.8.0-cuda128_mkl_py311_hd9a8a8a_300.conda  pytorch-gpu-2.8.0-cuda128_mkl_h2fd0c33_300.conda
$ ls | xargs anaconda upload
$ DELEGATE=h-vetinari
PACKAGE_VERSION=2.8.0
for package in libtorch pytorch pytorch-gpu; do
  anaconda copy --from-label main --to-label main --to-owner conda-forge ${DELEGATE}/${package}/${PACKAGE_VERSION}
done

mgorny commented Sep 22, 2025

Presumably we'll need to restart that one failed AArch64 build — but I don't see the rerun button right now, so I guess it'll only appear when the other job is finished. Not sure if I can rerun it without rerunning the Windows build though.

@h-vetinari
Member

Not sure if I can rerun it without rerunning the Windows build though.

Just don't use the run-wide restart. It's possible to restart a single job.

Zalnd commented Sep 23, 2025

Hi all, I'm unsure if this is the right place to report this, but there's an issue with the file pytorch-2.8.0-cpu_mkl_py311_h98f00f5_100.conda in this release.

InvalidArchiveError("Error with archive C:\Anaconda\pkgs\pytorch-2.8.0-cpu_mkl_py311_h98f00f5_100.conda. You probably need to delete and re-download or re-create this file. Message was:\n\nfailed with erro)
