Description
Is your feature request related to a problem? Please describe.
Currently, CLDK's support for analyzing Python source code via codeanalyzer-python is limited in comparison to the Java capabilities available in codeanalyzer-java. While the Java backend supports deep semantic extraction including symbol tables, call graphs, control/data flow graphs, CRUD operation detection, and framework-specific analyses, the Python counterpart lacks many of these features. This asymmetry makes it hard to build language-agnostic tooling and limits reuse of the CLDK analysis stack for Python-heavy projects.
Describe the solution you'd like
I’d like CLDK to offer parity between its Java and Python analysis capabilities, ideally exposing the same abstraction surfaces. This includes:
- A fully-featured PyApplication schema with semantic depth matching JApplication
- Robust symbol resolution using jedi, LSP, and other tools to handle module/function/class-level scoping, type inference, imports, and dynamic features
- Generation of call graphs (direct + transitive), with support for async and dynamic callsites
- Control and data flow graphs at the function and module level
- Basic interprocedural analysis across modules and packages
- CRUD operation detection and annotation in frameworks like Django, SQLAlchemy, FastAPI, etc.
- Support for incremental/eager analysis and caching like the Java analyzer
- Compatibility with existing CLDK pipelines like test generation, transformation, and migration agents
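To make the call graph item concrete, here is a minimal sketch of direct and transitive call edges built with only the standard-library ast module. It is an illustration under stated assumptions, not the codeanalyzer-python implementation: the names SOURCE, direct_calls, and transitive_calls are hypothetical, only bare-name calls such as `foo()` are recorded, and calls inside nested functions are attributed to every enclosing function. Attribute calls, imports, and dynamic callsites would need real symbol resolution.

```python
import ast
from collections import defaultdict

# Hypothetical input; any Python source string works.
SOURCE = '''
def read():
    return 1

def load():
    return read()

async def main():
    await load()
'''

def direct_calls(source):
    """Map each function name to the bare names it calls directly."""
    graph = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[node.name]  # register functions that call nothing
            for inner in ast.walk(node):
                # Only `name(...)` calls; `obj.method(...)` is ignored here.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return dict(graph)

def transitive_calls(graph):
    """Compute, for each function, every callee reachable through the graph."""
    closure = {}
    for start in graph:
        seen = set()
        stack = list(graph[start])
        while stack:
            callee = stack.pop()
            if callee in seen:
                continue
            seen.add(callee)
            stack.extend(graph.get(callee, ()))
        closure[start] = seen
    return closure

direct = direct_calls(SOURCE)
reachable = transitive_calls(direct)
```

Here `direct["main"]` contains only `load`, while `reachable["main"]` also contains `read`; the async function is handled by matching `ast.AsyncFunctionDef` alongside `ast.FunctionDef`.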
Describe alternatives you've considered
- Wrapping third-party static analysis tools like Pyright, Pyre, or mypy: these often lack a consistent, unified schema for their output
- Rewriting analyses from scratch for Python: not ideal due to duplication and maintenance burden
- Falling back on AST-based pattern matching without semantic resolution: brittle and low precision
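The brittleness of the last alternative can be shown with a small sketch. The detector below is a deliberately naive, hypothetical example (naive_create_calls and SAMPLE are made-up names) that flags any call whose attribute name is `add`, as a stand-in for spotting a SQLAlchemy-style `session.add(...)` create operation without resolving what the receiver actually is.

```python
import ast

# Hypothetical snippet: one real create call, one unrelated set.add,
# and one create call hidden behind an alias.
SAMPLE = """\
seen = set()
seen.add(1)
session.add(user)
save = session.add
save(user)
"""

def naive_create_calls(source):
    """Return sorted line numbers of calls of the form `<anything>.add(...)`."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "add"
    )

flagged = naive_create_calls(SAMPLE)
```

The result flags line 2 (a plain `set.add`, a false positive) and line 3, but misses the aliased call on line 5. Telling these cases apart requires knowing the type or origin of the receiver, which is what symbol resolution provides.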
Additional context
Reference implementations and design can be found in:
- https://github.com/codellm-devkit/codeanalyzer-java
- https://github.com/codellm-devkit/codeanalyzer-python
The goal is to bring codeanalyzer-python up to parity with codeanalyzer-java in CLDK’s analysis infrastructure. This will enable unified workflows across Java and Python ecosystems and allow tool developers to target a shared intermediate representation for static analysis tasks.