Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 70 additions & 4 deletions python/private/pypi/extension.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -115,9 +115,10 @@ def build_config(
# extra values that we just add
auth_patterns = tag.auth_patterns,
netrc = tag.netrc,
# TODO @aignas 2025-05-19: add more attr groups:
# * for index/downloader config. This includes all of those attributes for
# overrides, etc. Index overrides per platform could be also used here.
# TODO @aignas 2025-10-05: is this enough?
# downloader = tag.downloader,
# index_url = tag.index_url,
# index_overrides = tag.index_overrides,
)

return struct(
Expand Down Expand Up @@ -525,7 +526,72 @@ supported from this version without extra handling from the user.
:::
""",
),
} | AUTH_ATTRS
} | AUTH_ATTRS | {
# Downloader options
"downloader": attr.string(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say bazel_downloader is better. As is, it kinda sounds like it would modify downloading altogether (similar to pip.parse.download_only).

values = ["disabled", "debug", "enabled", "auto"],
default = "auto",
doc = """\
Option values:
* `disabled` - disable downloader.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: mention that pip will be used to download things instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, except instead of pip, be vague and say some external tool

* `debug` - stop using parallel downloading for debugging cases.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also increase logging levels? Or is that separate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking that we can get rid of parallel downloading. Could not think of a better name for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think debug is fine for such a case.

Having a dedicated serial, download_serially, sequential, etc. value might be too restrictive.

* `enabled` - enable downloader for all hubs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enabled would only apply to the hub that it's listed on, right?

pip.parse(hub_name="foo", downloader="enabled", ...)
pip.parse(hub_name="bar", downloader="disabled", ...)

Oh this is the global config as part of _default_attrs and the pypi module_extension, not the .parse tag class. I didn't notice that at first.

So this would be used via:

pip = use_extension(@rules_python//python/extensions:pip.bzl, "pip")
pip.default(downloader = "enabled", index_overrides = {...})
pip.parse(...)

Correct?

Would the pip.parse still support extending/overriding index_overrides and indexes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering about the index_overrides - the root module should be able to override anything here. However, what happens if a non-root module is defining this override. To do this blindly is a little sweat-inducing because of all of the recent supply chain attacks the public registries faced. I would say that we should keep index_overrides as a root-module-only option. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that non-root module overrides is probably a Bad Idea ™️. I'm more asking about the pip.parse overrides:

pip.default(
    downloader = "enabled",
    index_url = AIRLOCK_INDEX_URL,
    index_overrides = {"foo": INDEX_A, "bar": INDEX_A},
)
pip.parse(
    # this parse pulls from 2 indexes: AIRLOCK_INDEX_URL and INDEX_A
    hub_name = "pypi",
    python_version = PYTHON_CURRENT_VERSION,
    requirements_lock = "//:requirements.txt",
)
pip.parse(
    # this parse pulls from 3 indexes: AIRLOCK_INDEX_URL for most things,
    # INDEX_A for 'bar' package, and INDEX_B for 'foo' package
    experimental_index_url_overrides = {"foo": INDEX_B},  # same package, different index
    hub_name = "pypi",
    python_version = PYTHON_NEXT_VERSION,
    requirements_lock = "//:requirements_pynext.txt",
)

Like elsewhere, this is mostly hypothetical - I personally haven't had a need to use different indexes for different (python versions | hubs | requirement lock files).

* `auto` - use the downloader if `experimental_index_url` is passed.

The default index URL is defined by the {attr}`index_url` with overrides taken from
`index_overrides`. If the package is not found in the index, we will search all of the indexes
listed in the lock files used to create a particular hub repository.

TODO: experimental and docs in flux.

:::{versionadded} VERSION_NEXT_FEATURE
:::
""",
),
"index_overrides": attr.string_dict(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about cases where packages of a particular version must be downloaded from index A, while others versions must be downloaded from index B?

index_overrides = {
    "friendly-bard<=1.0.0,>3.0": INDEX_A,
    "friendly-bard": INDEX_B,
}

Or where windows versions are on a different index from linux versions?

TBH I'm not sure how useful such a thing would be, but I could see edge cases requiring it such as when a private index isn't yet updated with the latest version of a package but users must absolutely have the latest version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the use case? Why would an index have version 1 but not version 2? Or the distributions for windows, but not linux?

The case I can think of is e.g. cuda. Torch has different index urls for different cuda etc builds, but the env marker language can't express that anyways.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A real-life example is airlock/, which has a 2 week soak time for new releases: someone deploys Version 2 to PyPI but it will not show up in Airlock for two weeks - only Version 1 is available until the new version gets vetted and whatnot.

A made-up example is that a package was migrated from INDEX_A to INDEX_B and because $REASON, historical artifacts could not be ported. I highly doubt such a case would occur.

I can't think of any logical reason why an index would only have artifacts of a given OS.

Admittedly this is mostly just theorycrafting. Even in the Airlock case, we just tell people to wait the 2 weeks...

doc = """\
The index URL overrides for each package to use for downloading wheels using
bazel downloader. This value is going to be subject to `envsubst` substitutions
if necessary.

The key is the package name (will be normalized before usage) and the value is the
index URL.

This design pattern has been chosen in order to be fully deterministic about which
packages come from which source. We want to avoid issues similar to what happened in
https://pytorch.org/blog/compromised-nightly-dependency/.

The indexes must support Simple API as described here:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are url-based requirements specifiers in scope? For example pip @ https://github.com/pypa/pip/archive/22.0.2.zip is equivalent to

index_overrides = {
    "pip": "https://github.com/pypa/pip/archive/22.0.2.zip"
}

Note: IMO this should not be valid, I'm just trying to get clarification.

<https://packaging.python.org/en/latest/specifications/simple-repository-api/>

If `skip` is used as a value, then we will not use the downloader for a particular package. The
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting to me. Can you elaborate on what problem it's trying to solve?

I think I like the idea, but I'm concerned that it'll cause hard-to-debug issues (such as having correct auth for Bazel downloader but not for the non-Bazel downloader case, or vice-versa).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
If `skip` is used as a value, then we will not use the downloader for a particular package. The
If `skip` is used as a value, then we will not use the Bazel downloader for a particular package. The

values by `root` module take precedence over all others and non-root module precedence is
undefined.

TODO: experimental
TODO: implement skip

:::{versionadded} VERSION_NEXT_FEATURE
:::
""",
),
"index_url": attr.string(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan on dropping extra_index_urls?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that extra index urls can come from pip args and requirements files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so extending that premise: why not also pull index_url from the requirements.txt files too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that we have to have a default somewhere if the index is not specified. If it is specified in the lock file, then we can either:

  • Use them as extra index_urls.
  • Use them and ignore the default that we set.
  • Use PyPI as a fallback which gets used if else fails.

At least this is for requirements.txt, for PEP751 lock files the index is there always (I think).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use [index_url specified in the lock file] as extra index_urls

I fear that this would risk package name collisions and would be difficult to figure out the index ordering.

Use them and ignore the default that we set.

This sounds reasonable to me, but then a root module will not be able to override a child module's config (which I believe is a desired feature) without adjusting the child's requirements.txt directly:

./MODULE.bazel        # sets pip.default(downloader = "enabled", index_url = BAR)
./requirements.txt    # sets --index-url FOO
./child_module/       
    MODULE.bazel      # sets pip.default(downloader = "enabled", index_url = AAA)
    requirements.txt  # sets --index-url BAD

If we ignore pip.default, the root MODULE.bazel will pull from FOO and the child will pull from BAD. Right?

Use PyPI as a fallback which gets used if else fails.

If I set index_url in any of my requirements.txt files, I do not want pypi used at all.

$ head requirements.txt 
# This file was autogenerated by uv via the following command:
#    uv pip compile --python-version=3.12 --output-file=requirements.txt requirements.in
--index-url https://[email protected]/REDACTED/simple
--extra-index-url https://[email protected]/REDACTED/simple

absl-py==2.3.0 \

for PEP751 lock files the index is there always (I think).

It's optional 😞


I think it's reasonable to put all index config in Bazel and completely ignore any index config set in requirements.txt or PEP751 pylock.toml files.

doc = """\
The index URL to use for downloading wheels using bazel downloader. This value is going
to be subject to `envsubst` substitutions if necessary.

The indexes must support Simple API as described here:
<https://packaging.python.org/en/latest/specifications/simple-repository-api/>.

Note, this is used for *all* repositories by this feature, so that root modules can override the
value to use a private mirror if necessary.

:::{versionadded} VERSION_NEXT_FEATURE
:::
""",
default = "${PYPI_INDEX_URL:-https://pypi.org/simple}",
),
}

_SUPPORTED_PEP508_KEYS = [
"implementation_name",
Expand Down