Skip to content

Conversation

@thomasdesr
Copy link

@thomasdesr thomasdesr commented Jan 6, 2026

Problem

In #3242, it looks like rules_python introduced new template placeholders in the bootstrap scripts: %stage2_bootstrap% and %interpreter_args%. And from what I can tell, also made their successful substitution a requirement.

This works totally fine when the caller is calling rules_python directly. However, when external repositories (like gRPC's cython) define py_binary using the native rule, these placeholders don't seem to be substituted? The result is that the literal placeholder text ends up in the generated bootstrap scripts, causing SyntaxErrors or file-not-found errors at runtime

Fix

Best idea I've come up with to minimize the scope of fixes is slightly hacky; but by using a sentinel to detect that the substitution has failed directly from within the templates.

  • For %stage2_bootstrap%, we fall back to %main% which IS substituted even for native py_binary.
  • For %interpreter_args%, we wrap it in triple-quotes so it's hopefully always valid Python syntax, then detect the sentinel and default to an empty list. This feels very ugly, and I would love advice about how to do this better within rules_python's set of upstreams.

If anyone has a a better idea, I'd be very happy to scrap and implement that :D

Testing

I started by adding an integration test to reproduce the issue (external repository with a native py_binary, aka grpc) and verified it was passing after these fixes landed

thomasdesr and others added 2 commits January 5, 2026 16:27
…y_binary

Problem:
In rules_python 1.7.0, PR bazel-contrib#3334 ("build: Starlarkify python flags") introduced
new template placeholders in the bootstrap scripts: %stage2_bootstrap% and
%interpreter_args%. These placeholders are expected to be substituted by
rules_python's Starlark code when building py_binary targets.

However, when external repositories (like gRPC's cython) define py_binary
using the native rule, these placeholders are NOT substituted. In Bazel 7+,
native py_binary is implemented by rules_python, but native py_binary doesn't
expose attributes like `interpreter_args` that the substitution logic expects.
The result is that literal placeholder text ends up in the generated bootstrap
scripts, causing Python SyntaxError or file-not-found errors at runtime:

    File ".../cython_binary", line 39
      %interpreter_args%
      ^
    SyntaxError: invalid syntax

Fix:
Use a sentinel detection pattern directly in the templates. The key insight is
that by splitting the sentinel string (e.g., "%stage2" + "_bootstrap%"), the
substitution logic won't replace it since it looks for the exact contiguous
string. At runtime, the concatenation produces the original placeholder text,
which we compare against to detect if substitution occurred.

For %stage2_bootstrap%, we fall back to %main% which IS substituted even for
native py_binary. For %interpreter_args%, we wrap it in triple-quotes so it's
always valid Python syntax, then detect the sentinel and default to an empty
list.

This is a template-side fix that is backwards compatible and doesn't require
changes to Bazel or the substitution logic.

Test:
Add an integration test that creates an external repository with a native
py_binary (exactly like gRPC and other external repos do) and verifies it
can be built and executed successfully in WORKSPACE mode.
- Add STAGE2_BOOTSTRAP validation to Python template (consistency with shell)
- Add comments explaining sentinel pattern in both templates
- Restore diagnostic output in test script
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @thomasdesr, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue where rules_python bootstrap scripts failed to correctly substitute new template placeholders (%stage2_bootstrap% and %interpreter_args%) when native py_binary rules were used within external Bazel repositories. The fix introduces a robust fallback mechanism within the bootstrap templates themselves, using sentinel values to detect failed substitutions and provide appropriate defaults or alternative values, thereby preventing runtime errors. A new integration test has been added to ensure the problem remains resolved.

Highlights

  • Problem Resolution: Addresses an issue where new bootstrap script template placeholders (%stage2_bootstrap% and %interpreter_args%) were not being substituted for native py_binary rules in external repositories, leading to runtime errors.
  • Sentinel-based Fallback: Implements a mechanism using sentinel strings within the bootstrap templates to detect when substitutions fail, allowing for graceful fallback behavior.
  • Stage 2 Bootstrap Handling: If %stage2_bootstrap% is not substituted, the system now falls back to using the %main% placeholder, which is reliably substituted by native Bazel rules. An error is raised if neither is substituted.
  • Interpreter Arguments Handling: Unsubstituted %interpreter_args% are now handled by wrapping them in triple-quotes for valid Python syntax and defaulting to an empty list if the sentinel is detected, preventing SyntaxErrors.
  • New Integration Test: A dedicated integration test (external_native_py_binary) has been added to reproduce the original issue and confirm the effectiveness of the fix.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an issue with unsubstituted template placeholders for native py_binary targets in external repositories by introducing sentinel-based checks to fall back gracefully. The overall approach is sound and the added integration test is a valuable addition for ensuring this case is covered. The implementation in the Python template (python_bootstrap_template.txt) is correct. However, I've found a couple of issues in the shell script template (stage1_bootstrap_template.sh) related to how sentinels are defined and checked, which could lead to incorrect behavior. My review includes detailed comments and suggestions to fix these bugs.

Comment on lines +15 to +16
STAGE2_BOOTSTRAP_SENTINEL="%stage2""_bootstrap%"
MAIN_SENTINEL="%main""%"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The sentinel definitions for STAGE2_BOOTSTRAP and MAIN are quoted. When the placeholders are not substituted, the variables are assigned the unquoted placeholder string (e.g., STAGE2_BOOTSTRAP becomes %stage2_bootstrap%). This causes the comparison with the quoted sentinel (e.g., "%stage2_bootstrap%") to fail, as [[ "%stage2_bootstrap%" == "\"%stage2_bootstrap%\"" ]] evaluates to false. This breaks the intended fallback logic. To fix this, the sentinels should be defined without quotes.

Suggested change
STAGE2_BOOTSTRAP_SENTINEL="%stage2""_bootstrap%"
MAIN_SENTINEL="%main""%"
STAGE2_BOOTSTRAP_SENTINEL=%stage2""_bootstrap%
MAIN_SENTINEL=%main""%

Copy link
Author

@thomasdesr thomasdesr Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Gemini is just wrong here?

$ echo "%stage2""_bootstrap%"
%stage2_bootstrap%

Comment on lines +54 to +58
INTERPRETER_ARGS_SENTINEL="%interpreter""_args%"
if [[ "${#INTERPRETER_ARGS_FROM_TARGET[@]}" -eq 1 &&
"${INTERPRETER_ARGS_FROM_TARGET[0]}" == "$INTERPRETER_ARGS_SENTINEL" ]]; then
INTERPRETER_ARGS_FROM_TARGET=()
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logic to detect an unsubstituted %interpreter_args% placeholder has an edge-case bug. If a user provides a single interpreter argument that is the exact string "%interpreter_args%", it will be incorrectly identified as an unsubstituted placeholder, and the argument will be dropped. This happens because declare -a arr=(foo) and declare -a arr=("foo") are treated identically by the shell for simple strings without special characters.

A more robust approach is to use a unique sentinel for the case of empty arguments, which requires a small change in python/private/py_executable.bzl. This makes a collision with a user-provided argument much less likely.

Here is the recommended change for python/private/py_executable.bzl (in the _create_stage1_bootstrap function):

# python/private/py_executable.bzl
    _INTERPRETER_ARGS_SENTINEL_EMPTY = "__py_interpreter_args_empty_sentinel__"
    subs = {
        "%interpreter_args%": "\n".join([
            '"{}"'.format(v)
            for v in ctx.attr.interpreter_args
        ]) if ctx.attr.interpreter_args else '"{}"'.format(_INTERPRETER_ARGS_SENTINEL_EMPTY),
        # ... other substitutions
    }

With that change, the logic in this file can be updated to be more robust. Note that the sentinel for the unsubstituted case also needs to be unquoted to work correctly.

Suggested change
INTERPRETER_ARGS_SENTINEL="%interpreter""_args%"
if [[ "${#INTERPRETER_ARGS_FROM_TARGET[@]}" -eq 1 &&
"${INTERPRETER_ARGS_FROM_TARGET[0]}" == "$INTERPRETER_ARGS_SENTINEL" ]]; then
INTERPRETER_ARGS_FROM_TARGET=()
fi
INTERPRETER_ARGS_SENTINEL_UNSUBSTITUTED=%interpreter""_args%
INTERPRETER_ARGS_SENTINEL_EMPTY="__py_interpreter_args_empty_sentinel__"
if [[ "${#INTERPRETER_ARGS_FROM_TARGET[@]}" -eq 1 ]] && \
([[ "${INTERPRETER_ARGS_FROM_TARGET[0]}" == "$INTERPRETER_ARGS_SENTINEL_UNSUBSTITUTED" ]] || \
[[ "${INTERPRETER_ARGS_FROM_TARGET[0]}" == "$INTERPRETER_ARGS_SENTINEL_EMPTY" ]]); then
INTERPRETER_ARGS_FROM_TARGET=()
fi

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems off, if a user is intentionally passing the exact sentinel string that would be unfortunate, but probably not worth complicating the fix for?

Defer to maintainers though

The sh_test requires @rules_shell which isn't provided transitively.
Buildifier auto-adds the load statement for rules_shell's sh_test,
so we need the dependency in the standalone WORKSPACE.
Bazel 9 removed native py_binary from the default namespace when
using bzlmod, making this test incompatible. Restrict the test to
Bazel 7.4.1, 8.0.0, and self (current Bazel) where native rules
are still available.
@aignas
Copy link
Collaborator

aignas commented Jan 6, 2026

However, when external repositories (like gRPC's cython) define py_binary using the native rule

As far as I know there is no such thing and the native rule is just something that bazel gives you by autoloading rules_python.

At some point bazel will stop doing this and it will be impossible to call py_binary without loading rules_python or other ruleset.

Could you please explain the setup of gRPC a little more? How are you using rules_python that makes everything break?

EDIT: sorry for the close and reopen - sausage fingers.

@aignas aignas closed this Jan 6, 2026
@aignas aignas reopened this Jan 6, 2026
@shayanhoshyari
Copy link
Contributor

Could you please explain the setup of gRPC a little more? How are you using rules_python that makes everything break?

Or I guess a real life minimal reproducer?

thomasdesr added a commit to thomasdesr/rules_python_3495_repro that referenced this pull request Jan 9, 2026
This demonstrates the regression in rules_python 1.7.0 where
%interpreter_args% template placeholders are not substituted for
external py_binary targets.

Building the same target now fails:

  bazel build @com_github_grpc_grpc//src/python/grpcio/grpc/_cython:cygrpc.pyx_cython_translation

Error:
  File ".../cython_binary", line 39
      %interpreter_args%
      ^
  SyntaxError: invalid syntax

Root cause: PR #3242 introduced new template variables that native
py_binary doesn't substitute.

Fix: bazel-contrib/rules_python#3495
thomasdesr added a commit to thomasdesr/rules_python_3495_repro that referenced this pull request Jan 9, 2026
This demonstrates the regression in rules_python 1.7.0 where
%interpreter_args% template placeholders are not substituted for
external py_binary targets.

Building the same target now fails:

  bazel build @com_github_grpc_grpc//src/python/grpcio/grpc/_cython:cygrpc.pyx_cython_translation

Error:
  File ".../cython_binary", line 39
      %interpreter_args%
      ^
  SyntaxError: invalid syntax

Root cause: PR #3242 introduced new template variables that native
py_binary doesn't substitute.

Fix: bazel-contrib/rules_python#3495
@thomasdesr
Copy link
Author

Sorry if I'm not using the right words to describe it because I agree it is going through the virtual_builtins_bzl path, and I think that's the crux of the issue.

I extracted a pretty minimal repro from the codebase where I saw this getting triggered: https://github.com/thomasdesr/rules_python_3495_repro

If you checkout that repo and try to build anything that depends on gRPC's cython compilation (e.g.@com_github_grpc_grpc//src/python/grpcio/grpc/_cython:cygrpc.pyx_cython_translation), it should repro fairly quickly 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants