Add getLibraryApi and getStdlibApi: extract Python library & stdlib public-API types #19
Open
knutwannheden wants to merge 15 commits into
- Add test_stdlib_multi_module_local_set: verifies that when builtins is in the requested module set, str is a full classLiteral (not classRef)
- Add test_stdlib_all_modules_dump: verifies all-stdlib expansion (empty modules param) includes os/sys/collections/builtins with str as classLiteral
- Fix needless_lifetimes clippy warning in handle_get_stdlib_api
- Document getStdlibApi in CLAUDE.md wire protocol section
initialize gains firstPartyRoot / firstPartyModules; when set, the session registry emits classes outside the boundary as classRef. With no boundary fields, the existing full-expansion behavior is unchanged. Reuses the Boundary enum.
Motivation

`ty-types` currently exposes per-AST-node type attribution for a single file (`getTypes`). To build type tables for Python libraries (the Python analogue of how Moderne builds Java type tables by scanning a JAR's class files with ASM into `JavaType` objects), we need a different output shape: the public API surface (module-level declarations, including class members), not the inferred type of every expression in every method body.

This PR adds two methods:

- `getLibraryApi`: extracts an installed third-party package (a dir in `site-packages`).
- `getStdlibApi`: extracts the standard library (from ty's vendored typeshed) for the project's configured Python version.

Classes defined outside the extraction unit (stdlib/typeshed, other distributions, or other stdlib modules) are emitted as a lightweight `classRef` rather than fully expanded, mirroring the self-contained type-table model where a referenced-but-not-defined class is a `TAG_CLASS_REF`. The emitted JSON is consumed Java-side, where the existing `TypeTableWriter` serializes it to `types.bin`; reproducing that format (minimal-perfect-hash index, ZSTD framing, the `JavaType` graph) from Rust was deliberately avoided.

Examples
Third-party package (after `initialize` with a project/venv root from which the package resolves):

```json
{"jsonrpc":"2.0","method":"getLibraryApi","params":{"root":"/path/to/site-packages/mypkg"},"id":2}
```

```json
{
  "modules": [
    {
      "name": "mypkg.core",
      "file": "core.py",
      "symbols": [
        {"name": "Widget", "typeId": 12},
        {"name": "make", "typeId": 30}
      ]
    }
  ],
  "types": {
    "12": {
      "kind": "classLiteral",
      "className": "Widget",
      "moduleName": "mypkg.core",
      "members": [
        {"name": "size", "typeId": 7}
      ]
    },
    "7": {"kind": "instance", "className": "int", "classId": 9},
    "9": {"kind": "classRef", "className": "int", "moduleName": "builtins"}
  }
}
```

Standard library: `modules` selects the local unit; everything else (other stdlib modules, `builtins` when not requested) becomes a `classRef`. Omit `modules` for a whole-stdlib dump.

```json
{"jsonrpc":"2.0","method":"getStdlibApi","params":{"modules":["os","collections"]},"id":3}
```

In both cases, in-unit classes are full `classLiteral`s with members; out-of-unit classes collapse to the new `classRef` descriptor (identity only).

Summary
- `getLibraryApi` (`src/library.rs`): walks a package dir for `.py`/`.pyi` modules (preferring `.pyi`), skips underscore-private path components, enumerates each module's public top-level symbols, and registers their types.
- `getStdlibApi` (`src/library.rs`): discovers stdlib modules via `ty_module_resolver::all_modules` filtered to the standard-library search path (no filesystem walk, since typeshed is vendored) and excludes `_typeshed`. The Python version comes from the `initialize` project config.
- Public-symbol filtering: uses `__all__` when defined; otherwise drops underscore-prefixed names (what `ty` applies for `from x import *`). Shared between both methods.
- Registry boundary (`src/registry.rs`): a `Boundary` is either `UnderRoot(path)` (package extraction) or `Modules(set)` (stdlib extraction). Classes outside the boundary emit the new `classRef` descriptor. `getTypes`/`getTypeRegistry` are unchanged (no boundary).
- `classRef` `TypeDescriptor` variant (`src/protocol.rs`): `className` + `moduleName`, maps 1:1 to the type-table `TAG_CLASS_REF`.
- `ty-types-2` fork now widens `dunder_all` module visibility (via the existing `widen_ty_visibility.sh` fix-up list) so `__all__` is reachable; adds `ty_python_core` as a dependency for `global_scope`.

Test plan
- `cargo test`: 38 integration tests pass, including:
  - `getLibraryApi`: `.pyi` preference; `__all__`/underscore symbol filtering; in-package class → full `classLiteral`, typeshed `int` → `classRef`; cross-module in-package class (sibling import) stays a full `classLiteral`.
  - `getStdlibApi`: `builtins.str` → `classRef`; multi-module local set (`["string","builtins"]`) → `str` full; whole-stdlib dump (no `modules`) includes `os`/`sys`/`collections`/`builtins`.
  - `getTypes`/`getTypeRegistry` tests remain green (boundary unused there; the boundary generalization is a pure refactor).
- `cargo clippy --all-targets` clean for this crate

Follow-up: first-party boundary on `getTypes`

`initialize` gains optional `firstPartyRoot` / `firstPartyModules`. When either is set, the session's `getTypes` registry emits classes defined outside the first-party boundary as `classRef` (identity only) instead of expanding their full member/supertype graph. This keeps third-party/stdlib bodies off the parser RPC (they're built separately via `getLibraryApi`/`getStdlibApi`). With neither field, `getTypes` fully expands as before (backward compatible). Reuses the existing `Boundary` enum; per-node expression and `callSignature` attributions are unaffected (they just reference `classRef` ids for external types). Covered by `test_gettypes_first_party_boundary_classref` and `test_gettypes_no_boundary_full_expansion`.
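
For reviewers who want a mental model of the boundary rule, the following is a minimal, standalone sketch and not the code in `src/registry.rs`. Only the `Boundary` variant names (`UnderRoot`, `Modules`) and the descriptor kinds (`classLiteral`, `classRef`) come from this PR. The `ClassDef` struct, the `contains` and `descriptor_kind` helpers, the example paths, and the choice of exact module-name matching for `Modules` are assumptions made for illustration.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Illustrative mirror of the boundary described in this PR.
/// The real enum may carry different payload types.
enum Boundary {
    /// Package extraction: a class is local if its file lies under this root.
    UnderRoot(PathBuf),
    /// Stdlib extraction: a class is local if its module is in this set.
    Modules(HashSet<String>),
}

/// Hypothetical stand-in for what a registry knows about a class definition.
struct ClassDef<'a> {
    module_name: &'a str,
    file: &'a Path,
}

impl Boundary {
    /// Returns true when the class is defined inside the boundary.
    fn contains(&self, class: &ClassDef<'_>) -> bool {
        match self {
            // Path::starts_with compares whole path components, so
            // ".../mypkg_extra" is not treated as being under ".../mypkg".
            Boundary::UnderRoot(root) => class.file.starts_with(root),
            // Assumption: exact module-name match, no submodule expansion.
            Boundary::Modules(set) => set.contains(class.module_name),
        }
    }
}

/// Picks the descriptor kind for a class. No boundary means full expansion,
/// matching the unchanged behavior when no boundary fields are supplied.
fn descriptor_kind(boundary: Option<&Boundary>, class: &ClassDef<'_>) -> &'static str {
    match boundary {
        None => "classLiteral",
        Some(b) if b.contains(class) => "classLiteral",
        Some(_) => "classRef",
    }
}

fn main() {
    let widget = ClassDef {
        module_name: "mypkg.core",
        file: Path::new("/path/to/site-packages/mypkg/core.py"),
    };
    let int_class = ClassDef {
        module_name: "builtins",
        file: Path::new("/typeshed/stdlib/builtins.pyi"),
    };

    let pkg = Boundary::UnderRoot(PathBuf::from("/path/to/site-packages/mypkg"));
    println!("Widget under package root: {}", descriptor_kind(Some(&pkg), &widget));
    println!("int under package root: {}", descriptor_kind(Some(&pkg), &int_class));

    let stdlib = Boundary::Modules(HashSet::from(["os".to_string(), "collections".to_string()]));
    println!("int with modules os/collections: {}", descriptor_kind(Some(&stdlib), &int_class));
    println!("int with no boundary: {}", descriptor_kind(None, &int_class));
}
```

Modelling "no boundary" as `Option::None` keeps the backward-compatible path explicit: a caller that never sets a boundary cannot accidentally collapse a class to `classRef`.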