-
Notifications
You must be signed in to change notification settings - Fork 5
Improve $MYPYPATH
behavior
#75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
It appears that `rules_mypy` has several issues interacting with MyPy, specifically regarding how Bazel wants to lay out runfiles. MyPy has a multi-tier path search mechanism, see `mypy/modulefinder.py` for details. The TLDR is that it has integrated knowledge of how search paths should be set up defaulting to search_paths = SearchPaths( python_path=(os.getcwd(),), mypy_path=tuple(mypy_path() + options.mypy_path), package_path=tuple(sys_path), typeshed_path=(), ) When MyPy searches for a module, it prefers entries in the MYPYPATH to entries either in the local source tree or in the virtualenv/PYTHONPATH. Notionally this should be all well and good. `rules_python` already handles setting up a `PYTHONPATH`, all we should have to do is copy import roots that would be `PYTHONPATH` entries under `rules_python` into the `MYPYPATH`. Ordinarily `rules_python` will do this for us, but since `rules_mypy` runs MyPy as a tool outside of the Python environment for the target being checked we have to implement that ourselves. And this works fine for both simple 3rdparty code and 1stparty code. The trick is how Bazel treats generated files, which don't land in the logical source tree but instead land in the `bazel-bin/` tree even within execroots. Consider the following example genrule( name = "generate_source", outs = ["bar.py"], cmd = "echo 'def foo(): print(\"hi\")' > $(OUTS)", ) py_library( name = "bar", srcs = ["bar.py"], imports = ["."], visibility = ["//visibility:public"], ) Here the `py_library` will have a PyInfo provider which lists the package as an import root which is all well and good, but the `bar.py` generated file won't land in the package. It'll land in a parallel `bazel-bin` tree which isn't a child of the import anymore. The existence of this parallel tree creates other problems too because even if the source and generated roots are listed as `MYPYPATH` entries there's potential for difficult to resolve path ordering problems in the presence of `__init__.py` files, generated or otherwise. While this patch is better than the stock `rules_mypy` behavior in that it eliminates the previous cross product of import roots and import suffixes, this still isn't quite a correct implementation of the required logic. A fully correct implementation likely needs to traverse all the files required for a given typecheck, identify import paths, generate a `MYPYPATH` consisting only of source-tree native import paths, and then traverse all the files creating `--shadow-file` mappings to redirect MyPy to the generated file's location. This would avoid having effectively duplicate path entries and the problems that can come along with it. Work towards theorempl/rules_mypy#74
The above is all technically correct to my understanding but reflects a reverse engineering of the ruleset and fails to provide a clear statement of the problem at hand. The patch also needs to be reconsidered a bit to be more principled and less of a hack, although it appears to do an adequate job addressing the specific problems at hand. Will amend when time permits. |
@mark-thm could you TAL here? 🙇🏻 |
# TODO: can we use `ctx.bin_dir.path` here to cover generated files | ||
# and as a way to skip iterating over depset contents to find generated | ||
# file roots? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the answer according to this PR is yes?
# FIXME: .pyi files from deoendencies shouldn't need to be collected for | ||
# use as imports since their type stubs are already registered in a | ||
# typecheck output cache, and those caches are inputs. Should be able to | ||
# remove this without effect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not always true that this ruleset populates a typecheck output cache ... so I'm not sure it's always true that the .pyi files are unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original guess for how to get pyi files imported had to do with ensuring they’re in the actual runfiles — which is why the depset gets built up a few lines below. It’s quite possible this block is unnecessary even though rules_python now provides a way to get these
# FIXME: This is a hack to achieve an `imports=[]` like effect on the | ||
# PyTypeLibraryInfo, which lacks such a feature. Should just move it | ||
# there and make this part of normal info collection, if the type | ||
# library relocation dance is required at all which it shouldn't be. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the history of py_type_library
is to deal with typings (e.g. foo-types
for pypi package foo
) structure -- the contents of foo-types
generally has a root folder foo-types
, but for mypy we need to actually shove the .pyi
files contained within foo-types
in foo
. py_type_library
does exactly that coercion, and PyTypeLibraryInfo
then is exactly the needed path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide an example of a types package which behaves this way? Per https://peps.python.org/pep-0561/#stub-only-packages and the current documentation this behavior seems at odds with the documented expectations of how stubs should be packaged eve by *-types
or *-stubs
packages.
Broad strokes seems fine? I think it'd be nice to start with a failing test that demonstrates what's broken for you as part of the repo, that way we'd be sure to be able to maintain the fix going forward |
fwiw this does break our project, but maybe that's expected by re-arranging things |
This machinery is pretty tricky so that's not surprising but also not desired, are you able to provide any details @keith? |
i can try to isolate next week, but in our case i've hit lots of issues with this stuff since we have generated files and checked in files in the same py_library, and we have native extensions with pyi files |
It appears that
rules_mypy
has several issues interacting with MyPy, specifically regarding how Bazel wants to lay out runfiles. The relevant issue here is that MyPy is unaware of thebazel-bin/
output tree, and so as with many other rulesetsrules_mypy
has to try and bridge that gap.Today
rules_mypy
attempts to make generated output files (such as theexamples/demo/generated_file_imports
case) visible to MyPy by manipulating the$MYPYPATH
environment variable so that the search path's merged view of the filesystem happens to lay out files in the appropriate logical structure to ignore the separation of some files into thebazel-bin/
output tree while others remain in the source tree.Unfortunately the present implementation of constructing the
$MYPYPATH
is fairly convoluted and produces path entries which may not accord with actual directory structures. This appears to be the root cause of issues such as #74.This PR attempts to rework how the
$MYPYPATH
gets built to align it with the desired$PYTHONPATH
behavior as defined byimports=[]
directives onPyInfo
providers of dependencies to produce more obviously correct and clearly defined behavior.Under this changeset for any
import=[]
directive external import paths will be honored as-is (assuming that there are no generated external outputs, which is incomplete but presently true ofrules_python
), while internal import paths will be generated in duplicate; once as a source tree entry and once as abazel-bin/
entry.This provides support for most cases of generated source files, appears to mitigate issues such as #74 which appear to be caused by generated files masking internal files or causing caches to be busted and simplifies the machinery somewhat.
Alternatives
bazel-bin/
tree onto the path at all and use--shadow-file
to map entries in the source tree to their real locations in thebazel-bin/
tree.bazel-bin
into their logical locations.Possible refinements
MYPYPATH
so that path entries are only added if there's a direct input source which is rooted on the entry.py
and.pyi
sources are persisted as dependencies into the sandbox; discard data files and other generated output which can confuse the tree.Testing
py_binary
deps and__init__
reexports #74 repro in private repo passesOpen questions
py_type_library
machinery & file relocation. Is that still required with this changeset? Does this changeset cause regressions there?