feat(pip): global downloader settings #3322

aignas · 2025-10-05T03:40:42Z

Initial API proposal for a stable API for the experimental_index_url.

TODO for this PR:

Agree on the interface and flow - it should be described either as a
sketch in docs in how to or in the attributes.
Implement
Deprecation messages for the experimental_*index* flags

WIP

#2951

aignas · 2025-10-05T03:41:20Z

@dougthor42, could you please comment on how you imagine the API work for the downloader configuration?

dougthor42

Overall I'm a little concerned with forcing all hubs to use the same index_url and index_overrides settings, as there are use cases where one hub (say a python 3.12 hub) needs to pull from a different index than another hub (eg py3.13).

But maybe I'm misunderstanding the proposal and it's still possible to set things on a per-hub basis.

Another thing I'd like to see - which admittedly is probably out of scope - is regex-based index_overrides. For example, we have many internal packages prefixed with qh- or pyle-, and having to list all those out in index_overrides is a little cumbersome.

All that said, I think that the core API of (bazel_)downloader, index_url, and index_overrides is pretty reasonable.

python/private/pypi/extension.bzl

dougthor42 · 2025-10-05T21:09:17Z

python/private/pypi/extension.bzl

+The indexes must support Simple API as described here:
+<https://packaging.python.org/en/latest/specifications/simple-repository-api/>
+
+If `skip` is used as a value, then we will not use the downloader for a particular package. The


This is interesting to me. Can you elaborate on what problem it's trying to solve?

I think I like the idea, but I'm concerned that it'll cause hard-to-debug issues (such as having correct auth for Bazel downloader but not for the non-Bazel downloader case, or vice-versa).

dougthor42 · 2025-10-05T21:09:56Z

python/private/pypi/extension.bzl

+The indexes must support Simple API as described here:
+<https://packaging.python.org/en/latest/specifications/simple-repository-api/>
+
+If `skip` is used as a value, then we will not use the downloader for a particular package. The


Nit:

Suggested change

If `skip` is used as a value, then we will not use the downloader for a particular package. The

If `skip` is used as a value, then we will not use the Bazel downloader for a particular package. The

dougthor42 · 2025-10-05T21:13:36Z

python/private/pypi/extension.bzl

+:::
+""",
+    ),
+    "index_overrides": attr.string_dict(


Have you thought about cases where packages of a particular version must be downloaded from index A, while others versions must be downloaded from index B?

index_overrides = { "friendly-bard<=1.0.0,>3.0": INDEX_A, "friendly-bard": INDEX_B, }

Or where windows versions are on a different index from linux versions?

TBH I'm not sure how useful such a thing would be, but I could see edge cases requiring it such as when a private index isn't yet updated with the latest version of a package but users must absolutely have the latest version.

Can you explain the use case? Why would an index have version 1 but not version 2? Or the distributions for windows, but not linux?

The case I can think of is e.g. cuda. Torch has different index urls for different cuda etc builds, but the env marker language can't express that anyways.

A real-life example is airlock/, which has a 2 week soak time for new releases: someone deploys Version 2 to PyPI but it will not show up in Airlock for two weeks - only Version 1 is available until the new version gets vetted and whatnot.

A made-up example is that a package was migrated from INDEX_A to INDEX_B and because $REASON, historical artifacts could not be ported. I highly doubt such a case would occur.

I can't think of any logical reason why an index would only have artifacts of a given OS.

Admittedly this is mostly just theorycrafting. Even in the Airlock case, we just tell people to wait the 2 weeks...

dougthor42 · 2025-10-05T21:14:56Z

python/private/pypi/extension.bzl

+:::
+""",
+    ),
+    "index_url": attr.string(


Do you plan on dropping extra_index_urls?

My thinking was that extra index urls can come from pip args and requirements files.

OK, so extending that premise: why not also pull index_url from the requirements.txt files too?

The reason is that we have to have a default somewhere if the index is not specified. If it is specified in the lock file, then we can either:

Use them as extra index_urls.

Use them and ignore the default that we set.

Use PyPI as a fallback which gets used if else fails.

At least this is for requirements.txt, for PEP751 lock files the index is there always (I think).

use [index_url specified in the lock file] as extra index_urls

I fear that this would risk package name collisions and would be difficult to figure out the index ordering.

Use them and ignore the default that we set.

This sounds reasonable to me, but then a root module will not be able to override a child module's config (which I believe is a desired feature) without adjusting the child's requirements.txt directly:

./MODULE.bazel # sets pip.default(downloader = "enabled", index_url = BAR) ./requirements.txt # sets --index-url FOO ./child_module/ MODULE.bazel # sets pip.default(downloader = "enabled", index_url = AAA) requirements.txt # sets --index-url BAD

If we ignore pip.default, the root MODULE.bazel will pull from FOO and the child will pull from BAD. Right?

Use PyPI as a fallback which gets used if else fails.

If I set index_url in any of my requirements.txt files, I do not want pypi used at all.

$ head requirements.txt # This file was autogenerated by uv via the following command: # uv pip compile --python-version=3.12 --output-file=requirements.txt requirements.in --index-url https://[email protected]/REDACTED/simple --extra-index-url https://[email protected]/REDACTED/simple absl-py==2.3.0 \

for PEP751 lock files the index is there always (I think).

It's optional 😞

I think it's reasonable to put all index config in Bazel and completely ignore any index config set in requirements.txt or PEP751 pylock.toml files.

dougthor42 · 2025-10-05T21:32:19Z

python/private/pypi/extension.bzl

+Option values:
+* `disabled` - disable downloader.
+* `debug` - stop using parallel downloading for debugging cases.
+* `enabled` - enable downloader for all hubs.


~~enabled would only apply to the hub that it's listed on, right?~~

pip.parse(hub_name="foo", downloader="enabled", ...) pip.parse(hub_name="bar", downloader="disabled", ...)

Oh this is the global config as part of _default_attrs and the pypi module_extension, not the .parse tag class. I didn't notice that at first.

So this would be used via:

pip = use_extension(@rules_python//python/extensions:pip.bzl, "pip") pip.default(downloader = "enabled", index_overrides = {...}) pip.parse(...)

Correct?

Would the pip.parse still support extending/overriding index_overrides and indexes?

I am wondering about the index_overrides - the root module should be able to override anything here. However, what happens if a non-root module is defining this override. To do this blindly is a little sweat-inducing because of all of the recent supply chain attacks the public registries faced. I would say that we should keep index_overrides as a root-module-only option. WDYT?

I agree that non-root module overrides is probably a Bad Idea ™️. I'm more asking about the pip.parse overrides:

pip.default( downloader = "enabled", index_url = AIRLOCK_INDEX_URL, index_overrides = {"foo": INDEX_A, "bar": INDEX_A}, ) pip.parse( # this parse pulls from 2 indexes: AIRLOCK_INDEX_URL and INDEX_A hub_name = "pypi", python_version = PYTHON_CURRENT_VERSION, requirements_lock = "//:requirements.txt", ) pip.parse( # this parse pulls from 3 indexes: AIRLOCK_INDEX_URL for most things, # INDEX_A for 'bar' package, and INDEX_B for 'foo' package experimental_index_url_overrides = {"foo": INDEX_B}, # same package, different index hub_name = "pypi", python_version = PYTHON_NEXT_VERSION, requirements_lock = "//:requirements_pynext.txt", )

Like elsewhere, this is mostly hypothetical - I personally haven't had a need to use different indexes for different (python versions | hubs | requirement lock files).

non-root index overrides

I agree it's a bad idea to let non-root add indexes. It only makes sense for the isolated=True case, where you want a fully private dependency closure. But I think we should worry about that separately.

this is mostly hypothetical

An easy non-hypothetical case is pytorch. They have 5 different indexes for cuda, rocm, cpu, etc builds of pytorch. A requirements file just says torch == whatever. To make this work on the bazel side, you have to, somehow, specify that, for requirements_cpu.txt, needs torch to come from the pytorch cpu index. Meanwhile, requirements_cuda.txt needs torch to come from the pytorch cuda index.

(Torch indexes are partial, so you can't use them as the sole index. But that's just how they've decided to do their index).

dougthor42 · 2025-10-05T21:39:47Z

python/private/pypi/extension.bzl

+        doc = """\
+Option values:
+* `disabled` - disable downloader.
+* `debug` - stop using parallel downloading for debugging cases.


Will this also increase logging levels? Or is that separate?

I am thinking that we can get rid of parallel downloading. Could not think of a better name for this.

I think debug is fine for such a case.

Having a dedicated serial, download_serially, sequential, etc. value might be too restrictive.

dougthor42 · 2025-10-05T21:46:47Z

python/private/pypi/extension.bzl

-} | AUTH_ATTRS
+} | AUTH_ATTRS | {
+    # Downloader options
+    "downloader": attr.string(


I would say bazel_downloader is better. As is, it kinda sounds like it would modify downloading altogether (similar to pip.parse.download_only).

dougthor42 · 2025-10-05T21:48:30Z

python/private/pypi/extension.bzl

+        default = "auto",
+        doc = """\
+Option values:
+* `disabled` - disable downloader.


Nit: mention that pip will be used to download things instead.

+1, except instead of pip, be vague and say some external tool

dougthor42 · 2025-10-06T03:23:15Z

python/private/pypi/extension.bzl

+:::
+""",
+    ),
+    "index_overrides": attr.string_dict(


A real-life example is airlock/, which has a 2 week soak time for new releases: someone deploys Version 2 to PyPI but it will not show up in Airlock for two weeks - only Version 1 is available until the new version gets vetted and whatnot.

A made-up example is that a package was migrated from INDEX_A to INDEX_B and because $REASON, historical artifacts could not be ported. I highly doubt such a case would occur.

I can't think of any logical reason why an index would only have artifacts of a given OS.

Admittedly this is mostly just theorycrafting. Even in the Airlock case, we just tell people to wait the 2 weeks...

dougthor42 · 2025-10-06T03:25:39Z

python/private/pypi/extension.bzl

+:::
+""",
+    ),
+    "index_url": attr.string(


OK, so extending that premise: why not also pull index_url from the requirements.txt files too?

dougthor42 · 2025-10-06T03:28:50Z

python/private/pypi/extension.bzl

+        doc = """\
+Option values:
+* `disabled` - disable downloader.
+* `debug` - stop using parallel downloading for debugging cases.


I think debug is fine for such a case.

Having a dedicated serial, download_serially, sequential, etc. value might be too restrictive.

dougthor42 · 2025-10-06T03:37:29Z

python/private/pypi/extension.bzl

+packages come from which source. We want to avoid issues similar to what happened in
+https://pytorch.org/blog/compromised-nightly-dependency/.
+
+The indexes must support Simple API as described here:


Are url-based requirements specifiers in scope? For example pip @ https://github.com/pypa/pip/archive/22.0.2.zip is equivalent to

index_overrides = { "pip": "https://github.com/pypa/pip/archive/22.0.2.zip" }

Note: IMO this should not be valid, I'm just trying to get clarification.

rickeylev · 2025-10-09T20:26:50Z

airlock has soak time

My day team has run into something similar.

A release candidate will be created, but its hard to consume unless its also uploaded to pypi. Similarly, various projects are wired up to use nightly wheels of important dependencies. Right now, this works by using file:// in requirements files and some repo-rule pre-processing.

package migrated from index a to index b

iirc, my day team had to change the GCS buckets wheels were being uploaded to. There were a lot of old wheels, and it was generally a pain, so they only copied over recent releases and left the old ones sitting there. The net effect is a partial index based on version. I think this is really niche, though, so don't think its worth trying to add foo >1.0 support in the override logic.

feat(pip): global downloader settings

d4d2453

fixup the configure hooks

6c73c47

dougthor42 reviewed Oct 5, 2025

View reviewed changes

dougthor42 reviewed Oct 6, 2025

View reviewed changes

	If `skip` is used as a value, then we will not use the downloader for a particular package. The
	If `skip` is used as a value, then we will not use the Bazel downloader for a particular package. The

Uh oh!

feat(pip): global downloader settings #3322

Are you sure you want to change the base?

feat(pip): global downloader settings #3322

Uh oh!

Conversation

aignas commented Oct 5, 2025

Uh oh!

aignas commented Oct 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dougthor42 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rickeylev commented Oct 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

aignas commented Oct 5, 2025 •

edited

Loading