-
-
Notifications
You must be signed in to change notification settings - Fork 650
fix: handle unsubstituted template placeholders for external native py_binary #3495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: handle unsubstituted template placeholders for external native py_binary #3495
Conversation
…y_binary Problem: In rules_python 1.7.0, PR bazel-contrib#3334 ("build: Starlarkify python flags") introduced new template placeholders in the bootstrap scripts: %stage2_bootstrap% and %interpreter_args%. These placeholders are expected to be substituted by rules_python's Starlark code when building py_binary targets. However, when external repositories (like gRPC's cython) define py_binary using the native rule, these placeholders are NOT substituted. In Bazel 7+, native py_binary is implemented by rules_python, but native py_binary doesn't expose attributes like `interpreter_args` that the substitution logic expects. The result is that literal placeholder text ends up in the generated bootstrap scripts, causing Python SyntaxError or file-not-found errors at runtime: File ".../cython_binary", line 39 %interpreter_args% ^ SyntaxError: invalid syntax Fix: Use a sentinel detection pattern directly in the templates. The key insight is that by splitting the sentinel string (e.g., "%stage2" + "_bootstrap%"), the substitution logic won't replace it since it looks for the exact contiguous string. At runtime, the concatenation produces the original placeholder text, which we compare against to detect if substitution occurred. For %stage2_bootstrap%, we fall back to %main% which IS substituted even for native py_binary. For %interpreter_args%, we wrap it in triple-quotes so it's always valid Python syntax, then detect the sentinel and default to an empty list. This is a template-side fix that is backwards compatible and doesn't require changes to Bazel or the substitution logic. Test: Add an integration test that creates an external repository with a native py_binary (exactly like gRPC and other external repos do) and verifies it can be built and executed successfully in WORKSPACE mode.
- Add STAGE2_BOOTSTRAP validation to Python template (consistency with shell) - Add comments explaining sentinel pattern in both templates - Restore diagnostic output in test script
Summary of ChangesHello @thomasdesr, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical issue where Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses an issue with unsubstituted template placeholders for native py_binary targets in external repositories by introducing sentinel-based checks to fall back gracefully. The overall approach is sound and the added integration test is a valuable addition for ensuring this case is covered. The implementation in the Python template (python_bootstrap_template.txt) is correct. However, I've found a couple of issues in the shell script template (stage1_bootstrap_template.sh) related to how sentinels are defined and checked, which could lead to incorrect behavior. My review includes detailed comments and suggestions to fix these bugs.
| STAGE2_BOOTSTRAP_SENTINEL="%stage2""_bootstrap%" | ||
| MAIN_SENTINEL="%main""%" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sentinel definitions for STAGE2_BOOTSTRAP and MAIN are quoted. When the placeholders are not substituted, the variables are assigned the unquoted placeholder string (e.g., STAGE2_BOOTSTRAP becomes %stage2_bootstrap%). This causes the comparison with the quoted sentinel (e.g., "%stage2_bootstrap%") to fail, as [[ "%stage2_bootstrap%" == "\"%stage2_bootstrap%\"" ]] evaluates to false. This breaks the intended fallback logic. To fix this, the sentinels should be defined without quotes.
| STAGE2_BOOTSTRAP_SENTINEL="%stage2""_bootstrap%" | |
| MAIN_SENTINEL="%main""%" | |
| STAGE2_BOOTSTRAP_SENTINEL=%stage2""_bootstrap% | |
| MAIN_SENTINEL=%main""% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Gemini is just wrong here?
$ echo "%stage2""_bootstrap%"
%stage2_bootstrap%| INTERPRETER_ARGS_SENTINEL="%interpreter""_args%" | ||
| if [[ "${#INTERPRETER_ARGS_FROM_TARGET[@]}" -eq 1 && | ||
| "${INTERPRETER_ARGS_FROM_TARGET[0]}" == "$INTERPRETER_ARGS_SENTINEL" ]]; then | ||
| INTERPRETER_ARGS_FROM_TARGET=() | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to detect an unsubstituted %interpreter_args% placeholder has an edge-case bug. If a user provides a single interpreter argument that is the exact string "%interpreter_args%", it will be incorrectly identified as an unsubstituted placeholder, and the argument will be dropped. This happens because declare -a arr=(foo) and declare -a arr=("foo") are treated identically by the shell for simple strings without special characters.
A more robust approach is to use a unique sentinel for the case of empty arguments, which requires a small change in python/private/py_executable.bzl. This makes a collision with a user-provided argument much less likely.
Here is the recommended change for python/private/py_executable.bzl (in the _create_stage1_bootstrap function):
# python/private/py_executable.bzl
_INTERPRETER_ARGS_SENTINEL_EMPTY = "__py_interpreter_args_empty_sentinel__"
subs = {
"%interpreter_args%": "\n".join([
'"{}"'.format(v)
for v in ctx.attr.interpreter_args
]) if ctx.attr.interpreter_args else '"{}"'.format(_INTERPRETER_ARGS_SENTINEL_EMPTY),
# ... other substitutions
}With that change, the logic in this file can be updated to be more robust. Note that the sentinel for the unsubstituted case also needs to be unquoted to work correctly.
| INTERPRETER_ARGS_SENTINEL="%interpreter""_args%" | |
| if [[ "${#INTERPRETER_ARGS_FROM_TARGET[@]}" -eq 1 && | |
| "${INTERPRETER_ARGS_FROM_TARGET[0]}" == "$INTERPRETER_ARGS_SENTINEL" ]]; then | |
| INTERPRETER_ARGS_FROM_TARGET=() | |
| fi | |
| INTERPRETER_ARGS_SENTINEL_UNSUBSTITUTED=%interpreter""_args% | |
| INTERPRETER_ARGS_SENTINEL_EMPTY="__py_interpreter_args_empty_sentinel__" | |
| if [[ "${#INTERPRETER_ARGS_FROM_TARGET[@]}" -eq 1 ]] && \ | |
| ([[ "${INTERPRETER_ARGS_FROM_TARGET[0]}" == "$INTERPRETER_ARGS_SENTINEL_UNSUBSTITUTED" ]] || \ | |
| [[ "${INTERPRETER_ARGS_FROM_TARGET[0]}" == "$INTERPRETER_ARGS_SENTINEL_EMPTY" ]]); then | |
| INTERPRETER_ARGS_FROM_TARGET=() | |
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also seems off, if a user is intentionally passing the exact sentinel string that would be unfortunate, but probably not worth complicating the fix for?
Defer to maintainers though
The sh_test requires @rules_shell which isn't provided transitively. Buildifier auto-adds the load statement for rules_shell's sh_test, so we need the dependency in the standalone WORKSPACE.
Bazel 9 removed native py_binary from the default namespace when using bzlmod, making this test incompatible. Restrict the test to Bazel 7.4.1, 8.0.0, and self (current Bazel) where native rules are still available.
As far as I know there is no such thing and the At some point bazel will stop doing this and it will be impossible to call Could you please explain the setup of gRPC a little more? How are you using EDIT: sorry for the close and reopen - sausage fingers. |
Or I guess a real life minimal reproducer? |
This demonstrates the regression in rules_python 1.7.0 where
%interpreter_args% template placeholders are not substituted for
external py_binary targets.
Building the same target now fails:
bazel build @com_github_grpc_grpc//src/python/grpcio/grpc/_cython:cygrpc.pyx_cython_translation
Error:
File ".../cython_binary", line 39
%interpreter_args%
^
SyntaxError: invalid syntax
Root cause: PR #3242 introduced new template variables that native
py_binary doesn't substitute.
Fix: bazel-contrib/rules_python#3495
This demonstrates the regression in rules_python 1.7.0 where
%interpreter_args% template placeholders are not substituted for
external py_binary targets.
Building the same target now fails:
bazel build @com_github_grpc_grpc//src/python/grpcio/grpc/_cython:cygrpc.pyx_cython_translation
Error:
File ".../cython_binary", line 39
%interpreter_args%
^
SyntaxError: invalid syntax
Root cause: PR #3242 introduced new template variables that native
py_binary doesn't substitute.
Fix: bazel-contrib/rules_python#3495
|
Sorry if I'm not using the right words to describe it because I agree it is going through the I extracted a pretty minimal repro from the codebase where I saw this getting triggered: https://github.com/thomasdesr/rules_python_3495_repro If you checkout that repo and try to build anything that depends on gRPC's cython compilation (e.g. |
Problem
In #3242, it looks like
rules_pythonintroduced new template placeholders in the bootstrap scripts:%stage2_bootstrap%and%interpreter_args%. And from what I can tell, also made their successful substitution a requirement.This works totally fine when the caller is calling
rules_pythondirectly. However, when external repositories (like gRPC's cython) define py_binary using the native rule, these placeholders don't seem to be substituted? The result is that the literal placeholder text ends up in the generated bootstrap scripts, causingSyntaxErrors or file-not-found errors at runtimeFix
Best idea I've come up with to minimize the scope of fixes is slightly hacky; but by using a sentinel to detect that the substitution has failed directly from within the templates.
%stage2_bootstrap%, we fall back to%main%which IS substituted even for nativepy_binary.%interpreter_args%, we wrap it in triple-quotes so it's hopefully always valid Python syntax, then detect the sentinel and default to an empty list. This feels very ugly, and I would love advice about how to do this better within rules_python's set of upstreams.If anyone has a a better idea, I'd be very happy to scrap and implement that :D
Testing
I started by adding an integration test to reproduce the issue (external repository with a native py_binary, aka grpc) and verified it was passing after these fixes landed