-
-
Notifications
You must be signed in to change notification settings - Fork 631
feat(pip): global downloader settings #3322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -115,9 +115,10 @@ def build_config( | |||||
# extra values that we just add | ||||||
auth_patterns = tag.auth_patterns, | ||||||
netrc = tag.netrc, | ||||||
# TODO @aignas 2025-05-19: add more attr groups: | ||||||
# * for index/downloader config. This includes all of those attributes for | ||||||
# overrides, etc. Index overrides per platform could be also used here. | ||||||
# TODO @aignas 2025-10-05: is this enough? | ||||||
# downloader = tag.downloader, | ||||||
# index_url = tag.index_url, | ||||||
# index_overrides = tag.index_overrides, | ||||||
) | ||||||
|
||||||
return struct( | ||||||
|
@@ -525,7 +526,72 @@ supported from this version without extra handling from the user. | |||||
::: | ||||||
""", | ||||||
), | ||||||
} | AUTH_ATTRS | ||||||
} | AUTH_ATTRS | { | ||||||
# Downloader options | ||||||
"downloader": attr.string( | ||||||
values = ["disabled", "debug", "enabled", "auto"], | ||||||
default = "auto", | ||||||
doc = """\ | ||||||
Option values: | ||||||
* `disabled` - disable downloader. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: mention that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1, except instead of pip, be vague and say some external tool |
||||||
* `debug` - stop using parallel downloading for debugging cases. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will this also increase logging levels? Or is that separate? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am thinking that we can get rid of parallel downloading. Could not think of a better name for this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think Having a dedicated |
||||||
* `enabled` - enable downloader for all hubs. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
pip.parse(hub_name="foo", downloader="enabled", ...)
pip.parse(hub_name="bar", downloader="disabled", ...) Oh this is the global config as part of So this would be used via: pip = use_extension(@rules_python//python/extensions:pip.bzl, "pip")
pip.default(downloader = "enabled", index_overrides = {...})
pip.parse(...) Correct? Would the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am wondering about the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that non-root module overrides is probably a Bad Idea ™️. I'm more asking about the pip.parse overrides: pip.default(
downloader = "enabled",
index_url = AIRLOCK_INDEX_URL,
index_overrides = {"foo": INDEX_A, "bar": INDEX_A},
)
pip.parse(
# this parse pulls from 2 indexes: AIRLOCK_INDEX_URL and INDEX_A
hub_name = "pypi",
python_version = PYTHON_CURRENT_VERSION,
requirements_lock = "//:requirements.txt",
)
pip.parse(
# this parse pulls from 3 indexes: AIRLOCK_INDEX_URL for most things,
# INDEX_A for 'bar' package, and INDEX_B for 'foo' package
experimental_index_url_overrides = {"foo": INDEX_B}, # same package, different index
hub_name = "pypi",
python_version = PYTHON_NEXT_VERSION,
requirements_lock = "//:requirements_pynext.txt",
) Like elsewhere, this is mostly hypothetical - I personally haven't had a need to use different indexes for different (python versions | hubs | requirement lock files). |
||||||
* `auto` - use the downloader if `experimental_index_url` is passed. | ||||||
|
||||||
The default index URL is defined by the {attr}`index_url` with overrides taken from | ||||||
`index_overrides`. If the package is not found in the index, we will search all of the indexes | ||||||
listed in the lock files used to create a particular hub repository. | ||||||
|
||||||
TODO: experimental and docs in flux. | ||||||
|
||||||
:::{versionadded} VERSION_NEXT_FEATURE | ||||||
::: | ||||||
""", | ||||||
), | ||||||
"index_overrides": attr.string_dict( | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have you thought about cases where packages of a particular version must be downloaded from index A, while others versions must be downloaded from index B?
Or where windows versions are on a different index from linux versions? TBH I'm not sure how useful such a thing would be, but I could see edge cases requiring it such as when a private index isn't yet updated with the latest version of a package but users must absolutely have the latest version. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you explain the use case? Why would an index have version 1 but not version 2? Or the distributions for windows, but not linux? The case I can think of is e.g. cuda. Torch has different index urls for different cuda etc builds, but the env marker language can't express that anyways. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A real-life example is airlock/, which has a 2 week soak time for new releases: someone deploys Version 2 to PyPI but it will not show up in Airlock for two weeks - only Version 1 is available until the new version gets vetted and whatnot. A made-up example is that a package was migrated from INDEX_A to INDEX_B and because $REASON, historical artifacts could not be ported. I highly doubt such a case would occur. I can't think of any logical reason why an index would only have artifacts of a given OS. Admittedly this is mostly just theorycrafting. Even in the Airlock case, we just tell people to wait the 2 weeks... |
||||||
doc = """\ | ||||||
The index URL overrides for each package to use for downloading wheels using | ||||||
bazel downloader. This value is going to be subject to `envsubst` substitutions | ||||||
if necessary. | ||||||
|
||||||
The key is the package name (will be normalized before usage) and the value is the | ||||||
dougthor42 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
index URL. | ||||||
|
||||||
This design pattern has been chosen in order to be fully deterministic about which | ||||||
packages come from which source. We want to avoid issues similar to what happened in | ||||||
https://pytorch.org/blog/compromised-nightly-dependency/. | ||||||
|
||||||
The indexes must support Simple API as described here: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are url-based requirements specifiers in scope? For example index_overrides = {
"pip": "https://github.com/pypa/pip/archive/22.0.2.zip"
} Note: IMO this should not be valid, I'm just trying to get clarification. |
||||||
<https://packaging.python.org/en/latest/specifications/simple-repository-api/> | ||||||
|
||||||
If `skip` is used as a value, then we will not use the downloader for a particular package. The | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is interesting to me. Can you elaborate on what problem it's trying to solve? I think I like the idea, but I'm concerned that it'll cause hard-to-debug issues (such as having correct auth for Bazel downloader but not for the non-Bazel downloader case, or vice-versa). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit:
Suggested change
|
||||||
values by `root` module take precedence over all others and non-root module precedence is | ||||||
undefined. | ||||||
|
||||||
TODO: experimental | ||||||
TODO: implement skip | ||||||
|
||||||
:::{versionadded} VERSION_NEXT_FEATURE | ||||||
::: | ||||||
""", | ||||||
), | ||||||
"index_url": attr.string( | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you plan on dropping There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My thinking was that extra index urls can come from pip args and requirements files. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, so extending that premise: why not also pull There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The reason is that we have to have a default somewhere if the index is not specified. If it is specified in the lock file, then we can either:
At least this is for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I fear that this would risk package name collisions and would be difficult to figure out the index ordering.
This sounds reasonable to me, but then a root module will not be able to override a child module's config (which I believe is a desired feature) without adjusting the child's
If we ignore
If I set
It's optional 😞 I think it's reasonable to put all index config in Bazel and completely ignore any index config set in requirements.txt or PEP751 |
||||||
doc = """\ | ||||||
The index URL to use for downloading wheels using bazel downloader. This value is going | ||||||
to be subject to `envsubst` substitutions if necessary. | ||||||
|
||||||
The indexes must support Simple API as described here: | ||||||
<https://packaging.python.org/en/latest/specifications/simple-repository-api/>. | ||||||
|
||||||
Note, this is used for *all* repositories by this feature, so that root modules can override the | ||||||
value to use a private mirror if necessary. | ||||||
|
||||||
:::{versionadded} VERSION_NEXT_FEATURE | ||||||
::: | ||||||
""", | ||||||
default = "${PYPI_INDEX_URL:-https://pypi.org/simple}", | ||||||
), | ||||||
} | ||||||
|
||||||
_SUPPORTED_PEP508_KEYS = [ | ||||||
"implementation_name", | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say
bazel_downloader
is better. As is, it kinda sounds like it would modify downloading altogether (similar topip.parse.download_only
).