Feature Request: Feature Parity between Java and Python Analysis #137

@rahlk

Is your feature request related to a problem? Please describe.

Currently, CLDK's support for analyzing Python source code via codeanalyzer-python is limited in comparison to the Java capabilities available in codeanalyzer-java. While the Java backend supports deep semantic extraction including symbol tables, call graphs, control/data flow graphs, CRUD operation detection, and framework-specific analyses, the Python counterpart lacks many of these features. This asymmetry makes it hard to build language-agnostic tooling and limits reuse of the CLDK analysis stack for Python-heavy projects.

Describe the solution you'd like

I’d like CLDK to offer parity between its Java and Python analysis capabilities, ideally exposing the same abstraction surfaces. This includes:

  • A fully-featured PyApplication schema with semantic depth matching JApplication
  • Robust symbol resolution using jedi, LSP, and other tools to handle module/function/class-level scoping, type inference, imports, and dynamic features
  • Generation of call graphs (direct + transitive), with support for async and dynamic callsites
  • Control and data flow graphs at the function and module level
  • Basic interprocedural analysis across modules and packages
  • CRUD operation detection and annotation in frameworks like Django, SQLAlchemy, FastAPI, etc.
  • Support for incremental/eager analysis and caching like the Java analyzer
  • Compatibility with existing CLDK pipelines like test generation, transformation, and migration agents
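To make the call graph item concrete, here is a minimal sketch of direct and transitive call edges built with only the standard-library `ast` module. It is not how codeanalyzer-python or codeanalyzer-java works; the helper names `direct_calls` and `transitive_calls` and the sample `SOURCE` are made up for illustration. It resolves calls by bare name only, so it ignores methods, imports, and dynamic dispatch, which is exactly the gap semantic resolution would need to close.

```python
from __future__ import annotations

import ast
from collections import defaultdict

# Hypothetical sample module used only for this illustration.
SOURCE = '''
def load():
    return parse()

def parse():
    return tokenize()

def tokenize():
    return []

async def main():
    return load()
'''


def direct_calls(source: str) -> dict[str, set[str]]:
    """Map each top-level (sync or async) function to the bare names it calls."""
    graph: dict[str, set[str]] = defaultdict(set)
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[node.name]  # register the function even if it calls nothing
            for child in ast.walk(node):
                # Only plain `name(...)` calls are handled; `obj.method(...)` is skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return dict(graph)


def transitive_calls(graph: dict[str, set[str]], start: str) -> set[str]:
    """Return every function reachable from `start` by following call edges."""
    seen: set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        callee = stack.pop()
        if callee not in seen:
            seen.add(callee)
            stack.extend(graph.get(callee, ()))
    return seen


graph = direct_calls(SOURCE)
reachable_from_main = transitive_calls(graph, "main")
```

Here `graph["main"]` contains only `load`, while `reachable_from_main` also includes `parse` and `tokenize`. A parity implementation would presumably need to emit these edges in the same schema the Java analyzer uses.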

Describe alternatives you've considered

  • Wrapping third-party static analysis tools like Pyright, Pyre, or mypy: these often lack a consistent, unified output schema
  • Rewriting analyses from scratch for Python: not ideal due to duplication and maintenance burden
  • Falling back on AST-based pattern matching without semantic resolution: brittle and low precision
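The brittleness of the last alternative can be shown with a small sketch. It assumes a hypothetical detector, `find_calls_named`, that flags any call whose attribute name is `add` as a candidate "create" operation; the `session` and `user` names in the snippets are illustrative and not taken from any real project.

```python
from __future__ import annotations

import ast


def find_calls_named(source: str, attr: str) -> list[int]:
    """Return line numbers of calls of the form `<anything>.<attr>(...)`."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == attr
    ]


# A database-style insert written the obvious way: the pattern matches.
DIRECT = "session.add(user)\n"

# The same operation through an alias: the pattern misses it (false negative).
ALIASED = "save = session.add\nsave(user)\n"

# A plain set insertion: the pattern matches it anyway (false positive).
UNRELATED = "seen = set()\nseen.add(1)\n"

direct_hits = find_calls_named(DIRECT, "add")
aliased_hits = find_calls_named(ALIASED, "add")
unrelated_hits = find_calls_named(UNRELATED, "add")
```

Without knowing what `session`, `save`, and `seen` resolve to, the matcher cannot tell these three cases apart. The symbol resolution requested above would supply that information.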

Additional context

Reference implementations and design can be found in:

The goal is to bring codeanalyzer-python up to parity with codeanalyzer-java in CLDK’s analysis infrastructure. This will enable unified workflows across Java and Python ecosystems and allow tool developers to target a shared intermediate representation for static analysis tasks.

Labels

enhancement: New feature or request
