Add Tree-Sitter-based Language Module for Python #2548
base: feature/tree-sitter-parser-integration
Convert to abstract class with handler maps and template method pattern. Reduces boilerplate and enforces consistent visitor implementations.
The "Adapter" suffix was misleading as these classes are direct parsers, not adapters between incompatible interfaces. Updated all references and documentation to reflect the cleaner, more accurate naming.
Removed redundant token list initialization from PythonTokenCollector and added a centralized token list in TreeSitterVisitor. Introduced a method to retrieve collected tokens.
Updated the language module documentation to clarify core components and added sections for ANTLR and Tree-sitter parsing technologies. Included detailed examples for setting up language modules with Tree-sitter, emphasizing its implementation specifics.
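The visitor refactoring described in the commit messages above (an abstract class with handler maps, a template method, and a centralized token list with a getter-style retrieval) could look roughly like the sketch below. It is an illustration under assumptions, not the PR's actual code: `SyntaxNode`, `NodeVisitor`, `PythonLikeCollector`, and the token names are hypothetical stand-ins, and no real Tree-sitter binding is used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a parsed syntax node; not a real Tree-sitter binding type.
record SyntaxNode(String type, List<SyntaxNode> children) {
    static SyntaxNode of(String type, SyntaxNode... children) {
        return new SyntaxNode(type, List.of(children));
    }
}

// Abstract base class: owns the traversal, the handler map, and the shared token list.
abstract class NodeVisitor {
    private final Map<String, Consumer<SyntaxNode>> handlers = new HashMap<>();
    private final List<String> tokens = new ArrayList<>();

    // Template method: the traversal skeleton is fixed, subclasses only supply handlers.
    public final List<String> collect(SyntaxNode root) {
        handlers.clear();
        tokens.clear();
        registerHandlers();
        visit(root);
        return List.copyOf(tokens);
    }

    // Hook that each language-specific subclass implements.
    protected abstract void registerHandlers();

    // Registers a handler for one node type.
    protected final void on(String nodeType, Consumer<SyntaxNode> handler) {
        handlers.put(nodeType, handler);
    }

    // Appends a token to the centralized list.
    protected final void emit(String token) {
        tokens.add(token);
    }

    // Pre-order traversal: handle the node first, then its children.
    private void visit(SyntaxNode node) {
        Consumer<SyntaxNode> handler = handlers.get(node.type());
        if (handler != null) {
            handler.accept(node);
        }
        for (SyntaxNode child : node.children()) {
            visit(child);
        }
    }
}

// Language-specific subclass: only declares the node-type-to-token mapping.
final class PythonLikeCollector extends NodeVisitor {
    @Override
    protected void registerHandlers() {
        // Token names are illustrative, not JPlag's actual token types.
        on("function_definition", node -> emit("METHOD_BEGIN"));
        on("if_statement", node -> emit("IF_BEGIN"));
        on("return_statement", node -> emit("RETURN"));
    }
}
```

With this shape, supporting another construct means adding one `on(...)` line in the subclass, while traversal order and token storage stay in a single place.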
languages/python/src/main/java/de/jplag/python/PythonTokenCollector.java
Pull Request Overview
This PR adds comprehensive Tree-sitter-based Python language support to JPlag, providing modern Python syntax support (3.10+ match statements, 3.11+ exception groups, 3.12+ type aliases) through native parsing performance and community-maintained grammars.
- Implements new `language-tree-sitter-utils` module with abstract base classes and native library management
- Adds Tree-sitter Python language module with token extraction for all Python constructs
- Updates Java compiler target from JDK 21 to JDK 22 for Foreign Function and Memory API support
Reviewed Changes
Copilot reviewed 52 out of 53 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `scripts/build-native-libraries.sh` | Cross-platform native library build script for Tree-sitter grammars |
| `pom.xml` | Updates Java version to 22 and adds Tree-sitter dependencies and build profile |
| `language-tree-sitter-utils/` | New module with abstract base classes for Tree-sitter language implementations |
| `languages/python/` | Complete Tree-sitter Python implementation with parser, token collector, and tests |
| CI workflows | Updates Java version and adds native library build steps |
languages/java/src/test/java/de/jplag/java/JavaLanguageTest.java
Co-authored-by: Copilot <[email protected]>
StringTemplates have been introduced in Java 21 but removed in 22.
I'm interested in this PR, so I tried building it. However, the build failed with errors along the lines of […] I was able to successfully build both […] It seems that the […]
@ruro Thanks for your interest in this PR. Unfortunately, I couldn't reproduce this when I cloned the fork again. However, Tree-sitter requires JDK 22 and jextract, as it uses Java's newly introduced Foreign Function & Memory API (JEP 454) to link the native functions of the Tree-sitter C libraries. Did you install both beforehand? Feel free to look at #2370 if you haven't already. You'll find short installation instructions for jextract in the comments. Let me know if that solves the build issue for you.
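For readers unfamiliar with the Foreign Function & Memory API mentioned above, the following minimal sketch shows what linking a native C function from Java looks like on JDK 22 or later. It calls `strlen` from the C standard library rather than anything from Tree-sitter, and the class name `FfmStrlenDemo` is made up for illustration. Bindings produced by jextract wrap the same kind of downcall in generated code.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Illustrative only: requires JDK 22+, where the FFM API (JEP 454) is final.
final class FfmStrlenDemo {
    private static final Linker LINKER = Linker.nativeLinker();

    // Downcall handle for C's size_t strlen(const char *), assuming a 64-bit platform.
    private static final MethodHandle STRLEN = LINKER.downcallHandle(
            LINKER.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    static long nativeStrlen(String text) {
        // The confined arena frees the native copy of the string when the block exits.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(text); // NUL-terminated UTF-8
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("native call failed", t);
        }
    }
}
```

Calling `FfmStrlenDemo.nativeStrlen("Tree-sitter")` goes through the native `strlen`. On an older JDK the `java.lang.foreign` calls used here are not available as a final API, so the class does not compile without preview flags, which is in line with the JDK 22 requirement stated above.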
JPlag's current Python support relies on ANTLR grammars that struggle with modern Python syntax (3.10+ `match` statements, 3.11+ exception groups, 3.12+ `type` aliases) and require maintaining custom grammars that lag behind language evolution.
This PR introduces Tree-sitter as JPlag's new parsing foundation, starting with Python. Tree-sitter provides native parsing performance, community-maintained grammars that stay current with language specs, and cross-platform native library distribution.
Technical Notes
- Java JDK 22 for Java's Foreign Function and Memory (FFM) API and `jextract` to generate Java bindings from Tree-sitter C headers
- Zig build support in Tree-sitter repositories for Windows library compilation
Architecture Changes
New `language-tree-sitter-utils` module
First Tree-sitter Python language module implementation
- `PythonParser`: Orchestrates parsing and delegates to token collector
- `PythonTokenCollector`: AST visitor that maps Tree-sitter nodes to JPlag tokens
- `TreeSitterPython`: Singleton grammar loader using FFM API
Native library build system
- `mvn -Pbuild-native-libraries generate-resources`
Testing
Related