Skip to content

Conversation

arrdem
Copy link

@arrdem arrdem commented Mar 31, 2025

It appears that rules_mypy has several issues interacting with MyPy, specifically regarding how Bazel wants to lay out runfiles. The relevant issue here is that MyPy is unaware of the bazel-bin/ output tree, and so as with many other rulesets rules_mypy has to try and bridge that gap.

Today rules_mypy attempts to make generated output files (such as the examples/demo/generated_file_imports case) visible to MyPy by manipulating the $MYPYPATH environment variable so that the search path's merged view of the filesystem happens to lay out files in the appropriate logical structure to ignore the separation of some files into the bazel-bin/ output tree while others remain in the source tree.

Unfortunately the present implementation of constructing the $MYPYPATH is fairly convoluted and produces path entries which may not accord with actual directory structures. This appears to be the root cause of issues such as #74.

This PR attempts to rework how the $MYPYPATH gets built to align it with the desired $PYTHONPATH behavior as defined by imports=[] directives on PyInfo providers of dependencies to produce more obviously correct and clearly defined behavior.

Under this changeset for any import=[] directive external import paths will be honored as-is (assuming that there are no generated external outputs, which is incomplete but presently true of rules_python), while internal import paths will be generated in duplicate; once as a source tree entry and once as a bazel-bin/ entry.

This provides support for most cases of generated source files, appears to mitigate issues such as #74 which appear to be caused by generated files masking internal files or causing caches to be busted and simplifies the machinery somewhat.

Alternatives

  • Use copy_to_bin to place all source files consistently into the bin tree, then just but the bin tree on the source path.
  • Stop trying to put bazel-bin/ tree onto the path at all and use --shadow-file to map entries in the source tree to their real locations in the bazel-bin/ tree.
  • Create a proper action sandbox directory with the expected files copied out of bazel-bin into their logical locations.

Possible refinements

  • Filter the MYPYPATH so that path entries are only added if there's a direct input source which is rooted on the entry
  • Filter input sources so that only .py and .pyi sources are persisted as dependencies into the sandbox; discard data files and other generated output which can confuse the tree.

Testing

Open questions

  • What motivated the py_type_library machinery & file relocation. Is that still required with this changeset? Does this changeset cause regressions there?

It appears that `rules_mypy` has several issues interacting with MyPy,
specifically regarding how Bazel wants to lay out runfiles.

MyPy has a multi-tier path search mechanism, see `mypy/modulefinder.py`
for details. The TLDR is that it has integrated knowledge of how search
paths should be set up defaulting to

    search_paths = SearchPaths(
        python_path=(os.getcwd(),),
        mypy_path=tuple(mypy_path() + options.mypy_path),
        package_path=tuple(sys_path),
        typeshed_path=(),
    )

When MyPy searches for a module, it prefers entries in the MYPYPATH to
entries either in the local source tree or in the virtualenv/PYTHONPATH.

Notionally this should be all well and good. `rules_python` already
handles setting up a `PYTHONPATH`, all we should have to do is copy
import roots that would be `PYTHONPATH` entries under `rules_python`
into the `MYPYPATH`. Ordinarily `rules_python` will do this for us, but
since `rules_mypy` runs MyPy as a tool outside of the Python environment
for the target being checked we have to implement that ourselves. And
this works fine for both simple 3rdparty code and 1stparty code.

The trick is how Bazel treats generated files, which don't land in the
logical source tree but instead land in the `bazel-bin/` tree even
within execroots. Consider the following example

    genrule(
        name = "generate_source",
        outs = ["bar.py"],
        cmd = "echo 'def foo(): print(\"hi\")' > $(OUTS)",
    )

    py_library(
        name = "bar",
        srcs = ["bar.py"],
        imports = ["."],
        visibility = ["//visibility:public"],
    )

Here the `py_library` will have a PyInfo provider which lists the
package as an import root which is all well and good, but the `bar.py`
generated file won't land in the package. It'll land in a parallel
`bazel-bin` tree which isn't a child of the import anymore.

The existence of this parallel tree creates other problems too because
even if the source and generated roots are listed as `MYPYPATH` entries
there's potential for difficult to resolve path ordering problems in the
presence of `__init__.py` files, generated or otherwise.

While this patch is better than the stock `rules_mypy` behavior in that
it eliminates the previous cross product of import roots and import
suffixes, this still isn't quite a correct implementation of the
required logic.

A fully correct implementation likely needs to traverse all the files
required for a given typecheck, identify import paths, generate a
`MYPYPATH` consisting only of source-tree native import paths, and then
traverse all the files creating `--shadow-file` mappings to redirect
MyPy to the generated file's location. This would avoid having
effectively duplicate path entries and the problems that can come along
with it.

Work towards theorempl/rules_mypy#74
@arrdem arrdem requested a review from a team as a code owner March 31, 2025 23:03
@arrdem
Copy link
Author

arrdem commented Apr 1, 2025

The above is all technically correct to my understanding but reflects a reverse engineering of the ruleset and fails to provide a clear statement of the problem at hand. The patch also needs to be reconsidered a bit to be more principled and less of a hack, although it appears to do an adequate job addressing the specific problems at hand.

Will amend when time permits.

@alexeagle
Copy link

@mark-thm could you TAL here? 🙇🏻

Comment on lines 177 to 179
# TODO: can we use `ctx.bin_dir.path` here to cover generated files
# and as a way to skip iterating over depset contents to find generated
# file roots?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the answer according to this PR is yes?

Comment on lines +153 to +156
# FIXME: .pyi files from deoendencies shouldn't need to be collected for
# use as imports since their type stubs are already registered in a
# typecheck output cache, and those caches are inputs. Should be able to
# remove this without effect.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not always true that this ruleset populates a typecheck output cache ... so I'm not sure it's always true that the .pyi files are unnecessary

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original guess for how to get pyi files imported had to do with ensuring they’re in the actual runfiles — which is why the depset gets built up a few lines below. It’s quite possible this block is unnecessary even though rules_python now provides a way to get these

Comment on lines +163 to +166
# FIXME: This is a hack to achieve an `imports=[]` like effect on the
# PyTypeLibraryInfo, which lacks such a feature. Should just move it
# there and make this part of normal info collection, if the type
# library relocation dance is required at all which it shouldn't be.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the history of py_type_library is to deal with typings (e.g. foo-types for pypi package foo) structure -- the contents of foo-types generally has a root folder foo-types, but for mypy we need to actually shove the .pyi files contained within foo-types in foo. py_type_library does exactly that coercion, and PyTypeLibraryInfo then is exactly the needed path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example of a types package which behaves this way? Per https://peps.python.org/pep-0561/#stub-only-packages and the current documentation this behavior seems at odds with the documented expectations of how stubs should be packaged eve by *-types or *-stubs packages.

@markelliot
Copy link
Collaborator

Broad strokes seems fine? I think it'd be nice to start with a failing test that demonstrates what's broken for you as part of the repo, that way we'd be sure to be able to maintain the fix going forward

@keith
Copy link
Member

keith commented Apr 25, 2025

fwiw this does break our project, but maybe that's expected by re-arranging things

@arrdem
Copy link
Author

arrdem commented Apr 25, 2025

fwiw this does break our project, but maybe that's expected by re-arranging things

This machinery is pretty tricky so that's not surprising but also not desired, are you able to provide any details @keith?

@keith
Copy link
Member

keith commented Apr 25, 2025

i can try to isolate next week, but in our case i've hit lots of issues with this stuff since we have generated files and checked in files in the same py_library, and we have native extensions with pyi files

@brandonchinn178
Copy link
Contributor

I think I do a similar thing in #89 and #90?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants