-
-
Notifications
You must be signed in to change notification settings - Fork 631
feat/fix(pip): enable local paths for experimental_index_url #3312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat/fix(pip): enable local paths for experimental_index_url #3312
Conversation
Summary of ChangesHello @adrianimboden, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a crucial feature that allows the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a useful feature to enable local paths for pip indexes, which is great for offline builds. However, the current implementation introduces a few critical issues and regressions. I've found a critical regression in simpleapi_download.bzl
that breaks fetching from multiple indexes. Additionally, the logic for handling local paths in both modified files has several bugs, including incorrect URL detection, lack of error handling, and a crash-inducing bug in whl_library.bzl
due to a missing sha256
field. I've provided detailed comments and suggestions to address these issues, including a significant refactoring for whl_library.bzl
to use a more robust approach with file://
URLs.
python/private/pypi/whl_library.bzl
Outdated
if len(urls) == 1 and not urls[0].startswith("https://"): | ||
local_url = urls[0] | ||
local_path = rctx.path(local_url) | ||
|
||
if local_path.is_dir: | ||
rctx.file(filename, rctx.read(local_url + "index.html")) | ||
result = struct( | ||
output = filename, | ||
success = True, | ||
) | ||
else: | ||
rctx.file(filename, rctx.read(local_url)) | ||
result = struct( | ||
output = filename, | ||
success = True, | ||
) | ||
else: | ||
result = rctx.download( | ||
url = urls, | ||
output = filename, | ||
sha256 = rctx.attr.sha256, | ||
auth = get_auth(rctx, urls), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block for handling local paths has several issues, including incorrect logic for wheel URLs (the is_dir
check), a missing sha256
in the result which will cause a crash, a too-broad condition for identifying local paths, and lack of error handling. A much simpler and more robust approach is to convert the local path to a file://
URL and use rctx.download
, which correctly handles all these aspects.
download_urls = urls
if len(urls) == 1 and not (urls[0].startswith("https://") or urls[0].startswith("http://")):
# It's a local path, convert to a file:// URL for ctx.download
abs_path = rctx.path(urls[0]).realpath
download_urls = ["file://" + abs_path]
result = rctx.download(
url = download_urls,
output = filename,
sha256 = rctx.attr.sha256,
auth = get_auth(rctx, download_urls),
)
if not real_url.startswith("https://"): | ||
normalized_url = real_url | ||
local_path = ctx.path(normalized_url) | ||
|
||
def local_read_index_result(): | ||
if local_path.is_dir: | ||
ctx.file(output, ctx.read(normalized_url + "index.html")) | ||
return struct( | ||
output = output, | ||
success = True, | ||
) | ||
else: | ||
ctx.file(output, ctx.read(normalized_url)) | ||
return struct( | ||
output = output, | ||
success = True, | ||
) | ||
|
||
download = struct(wait = lambda: local_read_index_result()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic for handling local paths has a few issues:
- The condition
not real_url.startswith("https://")
is too broad. It will incorrectly treathttp://
URLs as local paths. - The
local_read_index_result
function lacks error handling. Ifctx.read
fails because a file doesn't exist, it will cause the entire rule to fail, which prevents the logic insimpleapi_download
from trying other index URLs. - The implementation of
local_read_index_result
can be simplified to avoid code duplication.
Here is a suggested implementation that addresses these points:
if not real_url.startswith("https://"): | |
normalized_url = real_url | |
local_path = ctx.path(normalized_url) | |
def local_read_index_result(): | |
if local_path.is_dir: | |
ctx.file(output, ctx.read(normalized_url + "index.html")) | |
return struct( | |
output = output, | |
success = True, | |
) | |
else: | |
ctx.file(output, ctx.read(normalized_url)) | |
return struct( | |
output = output, | |
success = True, | |
) | |
download = struct(wait = lambda: local_read_index_result()) | |
if not (real_url.startswith("https://") or real_url.startswith("http://")): | |
normalized_url = real_url | |
local_path = ctx.path(normalized_url) | |
def local_read_index_result(): | |
path_to_read = normalized_url | |
if local_path.is_dir: | |
path_to_read += "index.html" | |
if not ctx.path(path_to_read).exists: | |
return struct(success = False) | |
ctx.file(output, ctx.read(path_to_read)) | |
return struct( | |
output = output, | |
success = True, | |
) | |
download = struct(wait = lambda: local_read_index_result()) |
python/private/pypi/whl_library.bzl
Outdated
) | ||
if len(urls) == 1 and not urls[0].startswith("https://"): | ||
local_url = urls[0] | ||
local_path = rctx.path(local_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, the whl_library
should consume locally available paths through labels, because then it makes things easier to thread through and whl_library
already accepts a label. Would it be possible to create a repository_rule
that exposes these paths as labels?
This looks interesting - we certainly have discussed about using local path for the
How do you update the local mirror? |
I populate the mirror like this:
The folder looks like this then:
I am not sure about the label stuff. Did you think about something like this?
I am not sure how easy that will be. Seems like a complete new codepath to me when there may be an URL or a label. Or do I miss something? |
104d221
to
4b93661
Compare
4b93661
to
f7cee05
Compare
after looking at it again, I saw that there is a much simpler solution to make it better in the meantime. I wrongly assumed that the download functions don't work for local files. I always had the problem that file:// urls did not work. I found out that this is because the urls get normalized first. A small addition to Making it work with labels would be nice tough. Probably for another time? |
It did not work before because strip_empty_path_segments does the following: |
I really like the idea of being able to point to a local path using a label, for several reasons. It'd be really convenient for testing our pip integration -- we can easily construct arbitrary index states and have a more end-to-end verification. It also seems like a really flexible and powerful way for customizing where pip is getting stuff from. You could write a repo rule to make the pip index look however you want, and be populated however you want. |
Yes, this sort of thing is great. It's often called a "wheel house" and is a very common and useful pattern for offline builds, avoiding sdist in deployment scenarios, etc. Very supportive of this. Tools in a similarish space are: https://github.com/chriskuehl/dumb-pypi |
Thinking out loud a little bit how this could be designed. This might be a train of thought but I'll just right it out as I think.
So to sum up, the files that would need to be touched:
There are probably ways to optimize this approach. |
Hi rules_python team
I use your rules for a long time now. With WORKSPACE style, I use pip as follows:
The folder
/home/user/local_pip_mirror
gets populated withpypi-mirror download --requirement requirements_lock.txt
So I have a nice and clean way to use offline build. I could never get it running directly with
bazel fetch
and stuff. But this solution was very nice because no internet was involved at all.I am in the way of upgrading to bzlmod. I saw many bug reports for making offline build work, but I honestly gave up with
bazel vendor
and stuff again. The simplest way in my opinion is to just use a local pip mirror. For that I tried the following:at the moment,
experimental_index_url
must be a https:// url to work.This MR changes it so that local paths are also a possible
experimental_index_url
.For my project, the proposed changes are in effect and working great.
It is not perfect, but I think it is an important addition to aid the bzlmod migration.