resolver caching is overly aggressive #766

@dhellmann

There is code in the resolver module to cache package version information for build requirements. That logic is causing us to resolve the wrong versions of packages during the torch build for ROCm, so we build the wheel with numpy 1.26.4 instead of a 2.x version.

The bootstrap process encounters two different numpy requirement specifiers:

numpy: bootstrapping numpy<2 as build-system dependency of [(<RequirementType.BUILD_SYSTEM: 'build-system'>, <Requirement('aotriton')>,

then later

numpy: bootstrapping numpy as build-system dependency of [(<RequirementType.TOP_LEVEL: 'toplevel'>, <Requirement('torch==2.7.1')>,

Because of the order of those operations, the cache in the provider in resolver.py only includes numpy versions < 2.0. That prevents us from using a 2.x version in the second operation, so we build torch itself with the wrong version of numpy, one that is not compatible with the installation requirements, which pull in numpy 2.x.
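
To make the ordering problem concrete, here is a toy model of the failure mode. It is not the actual resolver.py code: the `FilteredCache` class, the helper functions, and the list of available numpy versions are all made up for illustration. The only thing it is meant to show is what happens when the candidate list is cached after it has been filtered by the first specifier seen.

```python
# Toy model of the failure mode. Everything here is hypothetical and is
# NOT the real resolver.py code.
AVAILABLE = {"numpy": ["1.26.4", "2.0.2", "2.2.6"]}  # assumed index contents


def parse(version):
    """Turn '1.26.4' into (1, 26, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def below_two(version):
    """Stand-in for the specifier numpy<2."""
    return parse(version) < (2,)


def anything(version):
    """Stand-in for the unconstrained specifier numpy."""
    return True


class FilteredCache:
    """Caches candidates after filtering, keyed only by package name."""

    def __init__(self):
        self._cache = {}

    def resolve(self, name, matches):
        if name not in self._cache:
            # The stored list reflects only the first specifier we saw.
            self._cache[name] = [v for v in AVAILABLE[name] if matches(v)]
        candidates = [v for v in self._cache[name] if matches(v)]
        return max(candidates, key=parse)


buggy = FilteredCache()
first = buggy.resolve("numpy", below_two)  # like the aotriton build dependency
second = buggy.resolve("numpy", anything)  # like the torch build dependency
print(first, second)
```

In this sketch the second lookup never sees a 2.x candidate, because the cache was populated while resolving `numpy<2`.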

The cache code looks at the requirement type that is being resolved and the cache is only used for build requirements. I don't remember why we made that choice, and it seems odd.

It seems like we should be caching all of the candidates that we discover on pypi.org, and using the cached candidate list to resolve each requirement specifier. That may result in us using more versions of packages, but if libraries need to be built together against the same version of something and the requirement specifiers don't ensure that, we can apply constraints to the build to enforce it.
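
A rough sketch of that idea, under the same caveats: the `UnfilteredCache` class, the `constraints` argument, and the version list are hypothetical and do not reflect the real resolver.py API. The cache holds every candidate found for a package, and each specifier (plus an optional constraint) is applied at lookup time.

```python
# Sketch of the proposed behavior. All names are hypothetical and this is
# NOT the real resolver.py API.
AVAILABLE = {"numpy": ["1.26.4", "2.0.2", "2.2.6"]}  # assumed index contents
FETCHES = []  # records each simulated index lookup


def parse(version):
    """Turn '1.26.4' into (1, 26, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def below_two(version):
    """Stand-in for the specifier numpy<2."""
    return parse(version) < (2,)


def anything(version):
    """Stand-in for the unconstrained specifier numpy."""
    return True


class UnfilteredCache:
    """Caches every discovered candidate and filters on each lookup."""

    def __init__(self, constraints=None):
        self._cache = {}
        # Optional per-package predicates, standing in for build constraints.
        self._constraints = constraints or {}

    def resolve(self, name, matches):
        if name not in self._cache:
            FETCHES.append(name)  # only hit the index once per package
            self._cache[name] = list(AVAILABLE[name])
        allowed = self._constraints.get(name, anything)
        candidates = [
            v for v in self._cache[name] if matches(v) and allowed(v)
        ]
        if not candidates:
            raise LookupError(f"no candidate for {name}")
        return max(candidates, key=parse)


resolver = UnfilteredCache()
first = resolver.resolve("numpy", below_two)
second = resolver.resolve("numpy", anything)

# A constraint can still force both builds onto the same version.
pinned = UnfilteredCache(constraints={"numpy": below_two})
third = pinned.resolve("numpy", anything)
print(first, second, third)
```

The index is still only queried once per package, so we keep the benefit of the cache, and the constraint is what ties builds to a common version when we need that.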

Labels: bug