Add getLibraryApi and getStdlibApi: extract Python library & stdlib public-API types #19

Open: knutwannheden wants to merge 15 commits into main from frothy-fox

knutwannheden commented Jun 12, 2026

Motivation

ty-types currently exposes per-AST-node type attribution for a single file (getTypes). Building type tables for Python libraries needs a different output shape: the public API surface (module-level declarations, including class members), not the inferred type of every expression in every method body. This is the Python analogue of how Moderne builds Java type tables by scanning a JAR's class files with ASM into JavaType objects.

This PR adds two methods:

  • getLibraryApi — extracts an installed third-party package (a dir in site-packages).
  • getStdlibApi — extracts the standard library (from ty's vendored typeshed) for the project's configured Python version.

Classes defined outside the extraction unit (stdlib/typeshed, other distributions, or other stdlib modules) are emitted as a lightweight classRef rather than fully expanded — mirroring the self-contained type-table model where a referenced-but-not-defined class is a TAG_CLASS_REF. The emitted JSON is consumed Java-side, where the existing TypeTableWriter serializes it to types.bin; reproducing that format (minimal-perfect-hash index, ZSTD framing, the JavaType graph) from Rust was deliberately avoided.

Examples

Third-party package (after initialize with a project/venv root from which the package resolves):

{"jsonrpc":"2.0","method":"getLibraryApi","params":{"root":"/path/to/site-packages/mypkg"},"id":2}
{
  "modules": [
    { "name": "mypkg.core", "file": "core.py",
      "symbols": [ {"name": "Widget", "typeId": 12}, {"name": "make", "typeId": 30} ] }
  ],
  "types": {
    "12": { "kind": "classLiteral", "className": "Widget", "moduleName": "mypkg.core",
            "members": [ {"name": "size", "typeId": 7} ] },
    "7":  { "kind": "instance", "className": "int", "classId": 9 },
    "9":  { "kind": "classRef", "className": "int", "moduleName": "builtins" }
  }
}
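
As a reading aid for the response shape above, here is a minimal Python sketch of how a consumer might follow typeId and classId references through the types table. The helper names (lookup, describe, summarize) are hypothetical and not part of this PR, and the handling of kinds other than the three shown is an assumption. Note that typeId 30 (make) is not present in the excerpt, so the sketch reports it as missing rather than guessing its shape.

```python
import json

# Response copied from the example above (an excerpt: typeId 30 is not listed).
RESPONSE = json.loads("""
{
  "modules": [
    { "name": "mypkg.core", "file": "core.py",
      "symbols": [ {"name": "Widget", "typeId": 12}, {"name": "make", "typeId": 30} ] }
  ],
  "types": {
    "12": { "kind": "classLiteral", "className": "Widget", "moduleName": "mypkg.core",
            "members": [ {"name": "size", "typeId": 7} ] },
    "7":  { "kind": "instance", "className": "int", "classId": 9 },
    "9":  { "kind": "classRef", "className": "int", "moduleName": "builtins" }
  }
}
""")


def lookup(types, type_id):
    """References are integers, but JSON object keys are strings."""
    return types.get(str(type_id))


def describe(types, type_id):
    """Render one type descriptor as a short human-readable string."""
    desc = lookup(types, type_id)
    if desc is None:
        return f"<type {type_id} not in excerpt>"
    kind = desc["kind"]
    if kind == "classLiteral":
        # In-unit class: fully expanded, members available.
        return f"class {desc['moduleName']}.{desc['className']}"
    if kind == "classRef":
        # Out-of-unit class: identity only.
        return f"external class {desc['moduleName']}.{desc['className']}"
    if kind == "instance":
        return f"instance of {describe(types, desc['classId'])}"
    return f"<{kind}>"  # other kinds are not covered by this sketch


def summarize(response):
    """List every public symbol, plus the members of in-unit classes."""
    types = response["types"]
    lines = []
    for module in response["modules"]:
        for symbol in module["symbols"]:
            lines.append(
                f"{module['name']}.{symbol['name']}: {describe(types, symbol['typeId'])}"
            )
            desc = lookup(types, symbol["typeId"])
            if desc is not None and desc["kind"] == "classLiteral":
                for member in desc.get("members", []):
                    lines.append(f"  {member['name']}: {describe(types, member['typeId'])}")
    return lines
```

Calling summarize(RESPONSE) shows Widget as a full class whose size member resolves, through the instance descriptor, to the external builtins.int reference.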

Standard library: modules selects the local unit; everything else (other stdlib modules, builtins when not requested) becomes a classRef. Omit modules for a whole-stdlib dump.

{"jsonrpc":"2.0","method":"getStdlibApi","params":{"modules":["os","collections"]},"id":3}

In both cases, in-unit classes are full classLiterals with members; out-of-unit classes collapse to the new classRef descriptor (identity only).
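
For client authors, the two requests can be built with a few lines of Python. This is a sketch only: the helper names are hypothetical, the transport framing (how the JSON lines reach the server) is not described here and is left out, and sending an empty params object for the whole-stdlib dump is an assumption based on "omit modules".

```python
import json


def make_request(method, params, request_id):
    """Serialize one JSON-RPC 2.0 request as a compact single line."""
    request = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    return json.dumps(request, separators=(",", ":"))


def library_api_request(root, request_id):
    """getLibraryApi: root is the package directory inside site-packages."""
    return make_request("getLibraryApi", {"root": root}, request_id)


def stdlib_api_request(request_id, modules=None):
    """getStdlibApi: pass modules to pick the local unit, or omit for a full dump.

    Assumption: a whole-stdlib dump is requested by leaving out the
    "modules" key, sent here as an empty params object.
    """
    params = {} if modules is None else {"modules": list(modules)}
    return make_request("getStdlibApi", params, request_id)
```

For example, library_api_request("/path/to/site-packages/mypkg", 2) and stdlib_api_request(3, ["os", "collections"]) reproduce the two request lines shown in this section.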

Summary

  • getLibraryApi (src/library.rs): walks a package dir for .py/.pyi modules (preferring .pyi), skips underscore-private path components, enumerates each module's public top-level symbols, registers their types.
  • getStdlibApi (src/library.rs): discovers stdlib modules via ty_module_resolver::all_modules filtered to the standard-library search path (no filesystem walk — typeshed is vendored), excludes _typeshed. Version comes from the initialize project config.
  • Public-symbol filtering: respects __all__ when defined; otherwise drops underscore-prefixed names (what ty applies for from x import *). Shared between both methods.
  • Boundary-aware registry (src/registry.rs): a Boundary is either UnderRoot(path) (package extraction) or Modules(set) (stdlib extraction). Classes outside the boundary emit the new classRef descriptor. getTypes/getTypeRegistry are unchanged (no boundary).
  • New classRef TypeDescriptor variant (src/protocol.rs): className + moduleName, maps 1:1 to the type-table TAG_CLASS_REF.
  • ruff submodule bump: the ty-types-2 fork now widens dunder_all module visibility (via the existing widen_ty_visibility.sh fix-up list) so __all__ is reachable; adds ty_python_core as a dependency for global_scope.
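
The discovery and filtering rules in the first and third bullets can be illustrated with a short Python sketch. It is not the Rust implementation in src/library.rs; the function names are hypothetical, and two details are assumptions: that __init__ files map to the package itself rather than counting as a private component, and that with __all__ defined the result keeps declaration order.

```python
from pathlib import Path


def discover_modules(package_dir):
    """Map dotted module name -> source file, preferring .pyi over .py."""
    package_dir = Path(package_dir)
    found = {}
    for path in sorted(package_dir.rglob("*")):
        if path.suffix not in (".py", ".pyi") or not path.is_file():
            continue
        parts = list(path.relative_to(package_dir).with_suffix("").parts)
        # Assumption: __init__ names the package itself, not a private module.
        if parts[-1] == "__init__":
            parts.pop()
        # Skip anything under (or named as) an underscore-private component.
        if any(part.startswith("_") for part in parts):
            continue
        name = ".".join([package_dir.name, *parts])
        # A stub replaces a .py entry; a .py never replaces a stub.
        if name not in found or path.suffix == ".pyi":
            found[name] = path
    return found


def public_names(top_level_names, dunder_all=None):
    """Apply star-import visibility to a module's top-level names."""
    if dunder_all is not None:
        # __all__ is authoritative, even for underscore-prefixed names.
        allowed = set(dunder_all)
        return [name for name in top_level_names if name in allowed]
    return [name for name in top_level_names if not name.startswith("_")]
```

With a package containing core.py, core.pyi, and _private.py, discover_modules would report mypkg.core backed by core.pyi and omit _private entirely.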

Test plan

  • cargo test — 38 integration tests pass, including:
    • getLibraryApi: lists modules + class symbols; excludes underscore-private modules/packages; prefers .pyi; __all__/underscore symbol filtering; in-package class → full classLiteral, typeshed int → classRef; cross-module in-package class (sibling import) stays a full classLiteral.
    • getStdlibApi: single requested module → its classes full, referenced builtins.str → classRef; multi-module local set (["string","builtins"]) → str full; whole-stdlib dump (no modules) includes os/sys/collections/builtins.
  • Existing getTypes/getTypeRegistry tests remain green (boundary unused there; the boundary generalization is a pure refactor).
  • cargo clippy --all-targets clean for this crate

Follow-up: first-party boundary on getTypes

initialize gains optional firstPartyRoot / firstPartyModules. When set, the session's getTypes registry emits classes defined outside the first-party boundary as classRef (identity only) instead of expanding their full member/supertype graph — keeping third-party/stdlib bodies off the parser RPC (they're built separately via getLibraryApi/getStdlibApi). With neither field, getTypes fully expands as before (backward compatible). Reuses the existing Boundary enum; per-node expression and callSignature attributions are unaffected (they just reference classRef ids for external types). Covered by test_gettypes_first_party_boundary_classref and test_gettypes_no_boundary_full_expansion.
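
The boundary decision described here can be modeled in a few lines of Python. This is a conceptual sketch, not the Rust Boundary enum in src/registry.rs: the class and function names mirror the PR's terms but the signatures are hypothetical, paths are treated as POSIX-style, and matching Modules by exact module name is an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class UnderRoot:
    """Boundary for package extraction: a class is local if its file is under root."""
    root: str

    def contains(self, module_name, file_path):
        root = PurePosixPath(self.root)
        path = PurePosixPath(file_path)
        return path == root or root in path.parents


@dataclass(frozen=True)
class Modules:
    """Boundary for stdlib extraction: a class is local if its module was requested."""
    names: frozenset

    def contains(self, module_name, file_path):
        # Assumption: exact module-name match, no submodule expansion.
        return module_name in self.names


def describe_class(class_name, module_name, file_path, members, boundary=None):
    """Emit a full classLiteral inside the boundary, a classRef outside it."""
    if boundary is None or boundary.contains(module_name, file_path):
        # No boundary (plain getTypes) or in-unit class: expand fully.
        return {"kind": "classLiteral", "className": class_name,
                "moduleName": module_name, "members": members}
    # Out-of-unit class: identity only.
    return {"kind": "classRef", "className": class_name, "moduleName": module_name}
```

Passing boundary=None models the backward-compatible case where neither firstPartyRoot nor firstPartyModules is set and every class is expanded.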

- Add test_stdlib_multi_module_local_set: verifies that when builtins is
  in the requested module set, str is a full classLiteral (not classRef)
- Add test_stdlib_all_modules_dump: verifies all-stdlib expansion (empty
  modules param) includes os/sys/collections/builtins with str as classLiteral
- Fix needless_lifetimes clippy warning in handle_get_stdlib_api
- Document getStdlibApi in CLAUDE.md wire protocol section
knutwannheden changed the title from "Add getLibraryApi: extract a Python library's public-API types" to "Add getLibraryApi and getStdlibApi: extract Python library & stdlib public-API types" on Jun 12, 2026
initialize gains firstPartyRoot / firstPartyModules; when set, the session
registry emits classes outside the boundary as classRef. Omitting both fields
keeps full-expansion behavior unchanged. Reuses the Boundary enum.