@LundiNord LundiNord commented Dec 24, 2025

This PR builds upon #2767.

This PR adds abstract interpretation on Java source code to the java-cpg language module.
This is used to detect dead code that is hard or impossible to find with simpler, purely syntactic analysis. The detected dead code is then removed from the token sequence, which prevents attacks that insert dead code to mask plagiarism.
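
To illustrate the idea described above, here is a minimal sketch of abstract interpretation over an interval domain. It is not the implementation in this PR: the class `DeadBranchSketch`, the `Interval` and `Truth` types, and the token names are all hypothetical and chosen only for illustration. The sketch tracks the possible value range of each variable, decides whether a branch condition can ever hold, and drops the tokens of a branch that is provably unreachable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: none of these names come from JPlag or the java-cpg module. */
final class DeadBranchSketch {

    /** Abstract value of an int variable: every concrete value lies in [lo, hi]. */
    static final class Interval {
        final long lo;
        final long hi;

        Interval(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }

        static Interval of(long constant) {
            return new Interval(constant, constant);
        }

        /** Sound over-approximation of addition (overflow is ignored in this sketch). */
        Interval plus(Interval other) {
            return new Interval(lo + other.lo, hi + other.hi);
        }

        /** Sound over-approximation of multiplication: take min/max of the corner products. */
        Interval times(Interval other) {
            long a = lo * other.lo, b = lo * other.hi, c = hi * other.lo, d = hi * other.hi;
            return new Interval(Math.min(Math.min(a, b), Math.min(c, d)),
                                Math.max(Math.max(a, b), Math.max(c, d)));
        }
    }

    enum Truth { ALWAYS_TRUE, ALWAYS_FALSE, UNKNOWN }

    /** Abstract evaluation of "left < right". */
    static Truth lessThan(Interval left, Interval right) {
        if (left.hi < right.lo) return Truth.ALWAYS_TRUE;
        if (left.lo >= right.hi) return Truth.ALWAYS_FALSE;
        return Truth.UNKNOWN;
    }

    /** Keeps only the tokens of branches that may execute. */
    static List<String> reachableTokens(Truth condition, List<String> thenTokens, List<String> elseTokens) {
        List<String> kept = new ArrayList<>();
        if (condition != Truth.ALWAYS_FALSE) kept.addAll(thenTokens);
        if (condition != Truth.ALWAYS_TRUE) kept.addAll(elseTokens);
        return kept;
    }

    public static void main(String[] args) {
        // Models: x is an input in [0, 10]; int y = x * x + 1; if (y < 0) {...} else {...}
        Interval x = new Interval(0, 10);
        Interval y = x.times(x).plus(Interval.of(1)); // [1, 101]
        Truth condition = lessThan(y, Interval.of(0));
        System.out.println(condition);
        System.out.println(reachableTokens(condition, List.of("APPLY", "ASSIGN"), List.of("RETURN")));
    }
}
```

In this example the condition `y < 0` does not fold to a constant, yet the interval for `y` shows it can never hold, so the tokens of the then-branch are dropped. A real analysis would also need to handle loops, method calls, and overflow, which this sketch leaves out.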

robinmaisch and others added 30 commits December 29, 2023 11:55
Content:
 - A new CPG language frontend for JPlag
 - An interface to transform submissions into CPGs
 - An interface to transform CPGs into token lists
 - A Graph Transformation Engine (to be extended)
   . interfaces representing node and graph patterns, matches of these patterns, transformations
   . an isomorphism detector
   . a transformation algorithm
 - Some graph transformations (to be extended)
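
The pattern, match, and transformation concepts listed above can be sketched roughly as follows. This is a simplified illustration on a tree rather than a general graph, and it does not reflect the actual interfaces of this commit: `GraphPatternSketch`, `Node`, `NodePattern`, `match`, and `replaceMatches` are hypothetical names. The example pattern matches a block whose only child is another block, and the transformation replaces the outer block with the inner one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: class and method names are not taken from JPlag. */
final class GraphPatternSketch {

    /** A node with a type label and ordered children. */
    static final class Node {
        final String type;
        final List<Node> children = new ArrayList<>();

        Node(String type, Node... children) {
            this.type = type;
            this.children.addAll(Arrays.asList(children));
        }
    }

    /** A node pattern: a role name, a required type, and optional child patterns. */
    static final class NodePattern {
        final String role;
        final String type;
        final List<NodePattern> children; // null means "any children are accepted"

        private NodePattern(String role, String type, List<NodePattern> children) {
            this.role = role;
            this.type = type;
            this.children = children;
        }

        static NodePattern any(String role, String type) {
            return new NodePattern(role, type, null);
        }

        static NodePattern of(String role, String type, NodePattern... children) {
            return new NodePattern(role, type, Arrays.asList(children));
        }
    }

    /** Matches the pattern against the node; on success, bindings maps each role to its node. */
    static boolean match(NodePattern pattern, Node node, Map<String, Node> bindings) {
        if (!pattern.type.equals(node.type)) return false;
        if (pattern.children != null) {
            if (pattern.children.size() != node.children.size()) return false;
            for (int i = 0; i < pattern.children.size(); i++) {
                if (!match(pattern.children.get(i), node.children.get(i), bindings)) return false;
            }
        }
        bindings.put(pattern.role, node);
        return true;
    }

    /** Replaces every matched subtree below parent by the node bound to keepRole; returns the count. */
    static int replaceMatches(Node parent, NodePattern pattern, String keepRole) {
        int count = 0;
        for (int i = 0; i < parent.children.size(); i++) {
            Map<String, Node> bindings = new HashMap<>();
            // Re-match at the same position until the pattern no longer applies.
            while (match(pattern, parent.children.get(i), bindings)) {
                parent.children.set(i, bindings.get(keepRole));
                count++;
                bindings = new HashMap<>();
            }
            count += replaceMatches(parent.children.get(i), pattern, keepRole);
        }
        return count;
    }

    public static void main(String[] args) {
        Node method = new Node("Method",
                new Node("Block", new Node("Block", new Node("Assign"), new Node("Return"))));
        NodePattern nested = NodePattern.of("outer", "Block", NodePattern.any("inner", "Block"));
        System.out.println(replaceMatches(method, nested, "inner"));
    }
}
```

A general graph setting would additionally need to handle shared nodes, cycles, and multiple edge kinds, which is where an isomorphism detector comes in. The tree version only shows how roles in a pattern tie a match to the transformation that consumes it.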
- implemented multi-root graph patterns
- implemented searching for "all matches at once"
- GraphOperations should leave EOG intact
- implemented new kinds of edges and properties for graph patterns
- tokenization works well
- implemented DFG sort pass
-- this requires specialized treatment for all kinds of language features. Surely the considered feature set is incomplete.
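
As a rough illustration of what a data-flow-based sort can look like, the sketch below orders statements so that every definition precedes its uses and breaks ties deterministically, so two submissions that differ only in the order of independent statements end up in the same order. It assumes this is the intent of the DFG sort pass; the class `DfgSortSketch`, the `Statement` type, and the tie-break by statement text are hypothetical and not taken from the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical sketch: names and the tie-breaking rule are assumptions, not JPlag code. */
final class DfgSortSketch {

    /** A statement that defines at most one variable and uses some others. */
    static final class Statement {
        final String text;
        final String defines; // may be null
        final Set<String> uses;

        Statement(String text, String defines, String... uses) {
            this.text = text;
            this.defines = defines;
            this.uses = new HashSet<>(Arrays.asList(uses));
        }
    }

    /** Topological sort along data-flow edges; ready statements are taken in text order. */
    static List<String> canonicalOrder(List<Statement> statements) {
        // Assumes each variable is defined by exactly one statement.
        Map<String, Statement> definer = new HashMap<>();
        for (Statement s : statements) {
            if (s.defines != null) definer.put(s.defines, s);
        }
        Map<Statement, Integer> pending = new HashMap<>();          // unmet dependencies
        Map<Statement, List<Statement>> dependents = new HashMap<>();
        for (Statement s : statements) {
            int count = 0;
            for (String variable : s.uses) {
                Statement source = definer.get(variable);
                if (source != null && source != s) {
                    dependents.computeIfAbsent(source, k -> new ArrayList<>()).add(s);
                    count++;
                }
            }
            pending.put(s, count);
        }
        PriorityQueue<Statement> ready =
                new PriorityQueue<>(Comparator.comparing((Statement s) -> s.text));
        for (Statement s : statements) {
            if (pending.get(s) == 0) ready.add(s);
        }
        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            Statement next = ready.poll();
            ordered.add(next.text);
            for (Statement d : dependents.getOrDefault(next, Collections.emptyList())) {
                if (pending.merge(d, -1, Integer::sum) == 0) ready.add(d);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Statement> shuffled = Arrays.asList(
                new Statement("b = 2", "b"),
                new Statement("a = 1", "a"),
                new Statement("c = a + b", "c", "a", "b"));
        System.out.println(canonicalOrder(shuffled));
    }
}
```

The sketch only considers data dependencies between simple assignments. As the commit note says, a real pass needs specialized treatment for language features such as side effects, control flow, and reassignment, none of which are modeled here.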
It was designed for local use.
Add many comments and put a file in the 'passes' package. When JavaDoc cannot find a Java file in there, it quits.
@LundiNord LundiNord changed the title from 'BA: "Preventing Advanced Dead Code Attacks on Source Code Plagiarism Detection via Abstract Interpretation" Code Review' to 'Add a graph-based Java module with syntax normalization and dead code detection.' on Jan 8, 2026
3 participants