Detrace is a static analysis tool for measuring code similarity in C programs. It is designed for post-submission evaluation, particularly in environments where time constraints limit code variation. The tool identifies overlap by analyzing the abstract syntax tree of each submission. It outputs a similarity score reflecting the degree of structural resemblance between programs.
• AST-based Comparison for analyzing structural similarity between C programs.
• Code Abstraction through identifier removal, empty construct elimination, and function reordering tolerance.
• Similarity Scoring using subtree-level comparison to generate a percentage match.
• One-to-many Comparison to efficiently evaluate a single submission against multiple peers.
• Multithreaded Execution (experimental) for parallel similarity computation.
Detrace starts by tokenizing the input C code using a custom lexer implemented as a finite state machine.
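A finite-state lexer of the kind described above can be sketched as follows. This is a minimal illustration, not Detrace's actual lexer: the token kinds, the state names, and the `next_token` function are all hypothetical, and the sketch ignores comments, string literals, and multi-character operators.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; the real token set is not described in this README. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start; /* points into the source buffer */
    size_t len;        /* number of characters in the token */
} Token;

/* States of the machine: S_START skips whitespace and picks the next state. */
typedef enum { S_START, S_IDENT, S_NUMBER } State;

/* Reads one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    const char *start = p;
    State state = S_START;

    for (;;) {
        unsigned char c = (unsigned char)*p;
        switch (state) {
        case S_START:
            if (c == '\0') { *cursor = p; return (Token){ TOK_EOF, p, 0 }; }
            if (isspace(c)) { p++; break; }
            start = p;
            if (isalpha(c) || c == '_') { state = S_IDENT; p++; }
            else if (isdigit(c)) { state = S_NUMBER; p++; }
            else { p++; *cursor = p; return (Token){ TOK_PUNCT, start, 1 }; }
            break;
        case S_IDENT:
            /* Stay in this state while the identifier continues. */
            if (isalnum(c) || c == '_') { p++; }
            else { *cursor = p; return (Token){ TOK_IDENT, start, (size_t)(p - start) }; }
            break;
        case S_NUMBER:
            /* Stay in this state while digits continue. */
            if (isdigit(c)) { p++; }
            else { *cursor = p; return (Token){ TOK_NUMBER, start, (size_t)(p - start) }; }
            break;
        }
    }
}
```

Calling `next_token` repeatedly on a buffer yields one token per call until `TOK_EOF` is returned. Keywords are reported as identifiers here; a real lexer would separate them with a lookup table.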
The token stream is parsed using a YACC-based grammar with rich semantic actions to build a parse tree.
This tree is then simplified by abstracting away identifiers and function names, and by removing empty loops and blocks to generate the Abstract Syntax Tree (AST).
Functions are also compared independently of the order in which they appear, so rearranging them in a file does not affect the result.
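The simplification pass can be illustrated with a small recursive routine. The `Node` layout, the node kinds, and the `simplify` function below are assumptions made for this sketch. It also assumes that a loop node's children hold only its body, so a loop with no surviving children counts as empty. Function reordering is not covered here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical node kinds for illustration only. */
typedef enum { N_PROGRAM, N_FUNCTION, N_BLOCK, N_LOOP, N_STMT, N_IDENT } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;                 /* identifier or function name; NULL once abstracted */
    struct Node *child[MAX_CHILDREN];
    size_t nchild;
} Node;

/* Simplifies the subtree in place.
 * Returns 1 if the node should be kept, 0 if it is an empty block or loop. */
int simplify(Node *n) {
    assert(n != NULL);
    size_t kept = 0;

    /* Simplify children first and compact the survivors to the front. */
    for (size_t i = 0; i < n->nchild; i++) {
        if (simplify(n->child[i]))
            n->child[kept++] = n->child[i];
    }
    n->nchild = kept;

    /* Drop the spelling so that renaming does not affect comparison. */
    n->name = NULL;

    if (n->kind == N_BLOCK || n->kind == N_LOOP)
        return n->nchild > 0;
    return 1;
}
```

Because children are processed before their parent, a loop whose only content is an empty block is removed as well. The sketch does not free dropped nodes; ownership of node memory is left to the caller.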
Each subtree in the AST is hashed, and child hashes are propagated upward so that every node's hash summarizes its entire subtree.
Subtrees are grouped by size, allowing structurally significant matches to be weighted more heavily.
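Bottom-up subtree hashing can be sketched as below. The `HNode` layout and `hash_subtree` are hypothetical, and the mixing step borrows the FNV-1a constants but applies them to whole 64-bit words rather than bytes; the hash function Detrace actually uses is not stated in this README. The sketch also records each subtree's node count, which is the quantity a size-based grouping would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8

typedef struct HNode {
    int kind;                          /* node type tag */
    struct HNode *child[MAX_CHILDREN];
    size_t nchild;
    uint64_t hash;                     /* filled in by hash_subtree */
    size_t size;                       /* nodes in this subtree, filled in by hash_subtree */
} HNode;

/* FNV-1a style mixing step applied to a 64-bit value (an assumption of this sketch). */
static uint64_t mix(uint64_t h, uint64_t v) {
    h ^= v;
    h *= 0x100000001b3ULL;
    return h;
}

/* Hashes a subtree bottom-up: a node's hash combines its own kind
 * with the hashes of its children, in order. */
uint64_t hash_subtree(HNode *n) {
    assert(n != NULL);
    uint64_t h = mix(0xcbf29ce484222325ULL, (uint64_t)n->kind);
    n->size = 1;
    for (size_t i = 0; i < n->nchild; i++) {
        h = mix(h, hash_subtree(n->child[i]));
        n->size += n->child[i]->size;
    }
    n->hash = h;
    return h;
}
```

Two subtrees with the same shape and node kinds receive the same hash, so structural matches can be found by comparing hashes instead of walking both trees.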
A similarity score is computed based on the number and size of matching subtrees between submissions.
The score reflects how structurally alike two programs are, rather than relying on textual overlap.
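One way to turn matching subtrees into a percentage is sketched below. The exact formula Detrace uses is not given here, so this is an assumed definition: the score is the total size of source subtrees that find an unused match in the target, divided by the total size of all source subtrees. The `Subtree` record and `similarity_percent` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBTREES 1024

typedef struct {
    uint64_t hash; /* structural hash of the subtree */
    size_t size;   /* number of nodes in the subtree */
} Subtree;

/* Returns the size-weighted share of source subtrees that also occur in the target,
 * as a percentage in [0, 100]. Each target subtree can be matched at most once. */
double similarity_percent(const Subtree *src, size_t nsrc,
                          const Subtree *tgt, size_t ntgt) {
    assert(ntgt <= MAX_SUBTREES);
    unsigned char used[MAX_SUBTREES] = {0};
    size_t total = 0, matched = 0;

    for (size_t i = 0; i < nsrc; i++) {
        total += src[i].size;
        for (size_t j = 0; j < ntgt; j++) {
            if (!used[j] && tgt[j].hash == src[i].hash && tgt[j].size == src[i].size) {
                used[j] = 1;
                matched += src[i].size; /* larger subtrees contribute more */
                break;
            }
        }
    }
    return total == 0 ? 0.0 : 100.0 * (double)matched / (double)total;
}
```

Weighting by size means that a match on a large subtree, such as an entire function body, moves the score more than a match on a single expression. The quadratic scan keeps the sketch short; a hash table keyed on the subtree hash would scale better.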
The process supports one-to-many comparisons, and an experimental multithreaded mode runs these comparisons in parallel.
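A one-to-many comparison with one thread per target could look like the following sketch, which uses POSIX threads. How Detrace actually distributes work is not described here; the `Job` struct, `compare_one_to_many`, and the placeholder `toy_compare` scorer are all hypothetical, and the program must be built with thread support (for example `-pthread` on GCC or Clang).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_TARGETS 64

typedef double (*CompareFn)(const char *source, const char *target);

typedef struct {
    const char *source;
    const char *target;
    CompareFn compare;
    double score;      /* written by the worker thread */
} Job;

/* Placeholder scorer standing in for the real AST-based comparison. */
double toy_compare(const char *source, const char *target) {
    return strcmp(source, target) == 0 ? 100.0 : 0.0;
}

/* Thread entry point: runs one comparison and stores its score in the job. */
static void *run_job(void *arg) {
    Job *job = arg;
    job->score = job->compare(job->source, job->target);
    return NULL;
}

/* Compares one source against n targets, one thread per target.
 * Returns 0 on success, -1 if a thread could not be started. */
int compare_one_to_many(const char *source, const char *const *targets, size_t n,
                        CompareFn compare, double *scores) {
    assert(n <= MAX_TARGETS);
    pthread_t tid[MAX_TARGETS];
    Job job[MAX_TARGETS];
    size_t started = 0;
    int status = 0;

    for (; started < n; started++) {
        job[started] = (Job){ source, targets[started], compare, 0.0 };
        if (pthread_create(&tid[started], NULL, run_job, &job[started]) != 0) {
            status = -1;
            break;
        }
    }
    /* Wait for every thread that was started and collect its score. */
    for (size_t i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        scores[i] = job[i].score;
    }
    return status;
}
```

Each job writes only to its own `score` field, so no locking is needed. With many targets, a fixed-size worker pool would be a more economical design than one thread per target.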
- Clone repository and open directory.
- Run the make command.
- Run the command:
./detrace <source_file> <target_file_1> <target_file_2> ... <target_file_n> --m
The --m flag enables multithreading and is optional.