Open
80 commits
f480939
Initial Commit
robinmaisch Dec 29, 2023
6ae3fc5
Fixed ReplaceOperation to support swapping elements
robinmaisch Jan 1, 2024
0c6be35
Reorder passes; Fixed Token generation; Extended NodePattern relations
robinmaisch Jan 5, 2024
185b311
Bump CPG dependencies from 8.0.0-alpha-2 to 8.0.0
robinmaisch Jan 5, 2024
7ab9b0f
Add patterns to match one edge of a multi-edge; Introduce 2nd transfo…
robinmaisch Jan 6, 2024
100296a
Major new features:
robinmaisch Feb 29, 2024
4447224
Fixed old tests and some code problems indicated by the IDE
robinmaisch Feb 29, 2024
1b900a8
Merge branch 'jplag:main' into cpg-dev
robinmaisch Feb 29, 2024
722b256
Fix errors from merging JPlag v5.0.0
robinmaisch Feb 29, 2024
4ed18ea
Feature-complete version (before Sonar)
robinmaisch Mar 18, 2024
5536a3f
Apply Spotless
robinmaisch Mar 18, 2024
95fcb9e
Remove CompareFrontendsTest.java
robinmaisch Mar 18, 2024
640e1fa
Fix tests
robinmaisch Mar 18, 2024
eb592f0
Fix tests
robinmaisch Mar 18, 2024
88dd331
Fix JavaDoc
robinmaisch Mar 19, 2024
fc3d079
Add more documentation
robinmaisch Mar 19, 2024
83995bf
Merge remote-tracking branch 'origin/develop' into cpg-dev
robinmaisch Mar 19, 2024
9ad7388
Fix Sonar remarks
robinmaisch Mar 23, 2024
9c25f9f
Add missing files
robinmaisch Mar 23, 2024
b2fc2f1
Fix Regex
robinmaisch Mar 23, 2024
2e1e333
Improve documentation and consistency of type parameter naming
robinmaisch Mar 26, 2024
50b83ea
Revert changes to base code
robinmaisch Apr 2, 2024
1acd991
Revert changes to base code – No.2
robinmaisch Apr 2, 2024
7946823
Add support for Java-CPG module in report viewer
robinmaisch Apr 2, 2024
60427fd
Fix missing comma
robinmaisch Apr 3, 2024
e2b1af6
test branch setup
wsimonw Dec 2, 2025
883ad3b
set of files, initial commit,
wsimonw Dec 15, 2025
66d8068
added SemanticCpgTokenType, adapted TokenizationCpgNodeListener accor…
wsimonw Dec 17, 2025
fa2f377
introduced TokenEquivalenceModels to abstract when two tokens are def…
wsimonw Dec 20, 2025
700ed8a
adapted equivalence model, implemented it into GST and implemented a …
wsimonw Dec 20, 2025
9d9f638
minor bug fix
wsimonw Dec 20, 2025
eb26c1c
another minor bug fix
wsimonw Dec 20, 2025
835cd38
Update Java version from 21 to 25 in workflow
wsimonw Dec 20, 2025
a5e8109
another minor bug fix
wsimonw Dec 20, 2025
f70367c
Merge remote-tracking branch 'origin/cpg-semantics' into cpg-semantics
wsimonw Dec 20, 2025
b18df6f
adapted spotless style
wsimonw Dec 20, 2025
178e620
changed jacoco version
wsimonw Dec 20, 2025
2701af9
changed jacoco version again
wsimonw Dec 20, 2025
9c501ee
changed versions
wsimonw Dec 20, 2025
c9c4375
retry with java 25 and jacoco 08.14
wsimonw Dec 20, 2025
520f73b
changed java version to 25 in all workflows
wsimonw Dec 20, 2025
0713913
switched all versions back to old
wsimonw Dec 20, 2025
2c5571b
switched all versions back to old
wsimonw Dec 20, 2025
40f205f
removed unnamed patterns
wsimonw Dec 20, 2025
efc7585
minor bugfix
wsimonw Dec 20, 2025
5541160
another minor bugfix
wsimonw Dec 20, 2025
95c0cb0
another minor bugfix
wsimonw Dec 20, 2025
d9daf20
small refactoring, fixed spotless
wsimonw Dec 20, 2025
212446b
solved imports error
wsimonw Dec 20, 2025
bdfe157
moved NodeRegistry from kotlin to java
wsimonw Dec 20, 2025
177c702
fixed error in the language options for java cpg
wsimonw Dec 21, 2025
081f13d
fixed error in the equivalence model
wsimonw Dec 21, 2025
42c8a9e
bugfix for updating the nodes semantic vectors
wsimonw Dec 21, 2025
02114c2
added mappings to the SemanticDimensionsMapper, added tests
wsimonw Dec 21, 2025
cddacc5
bugfix in GreedyStringTiling caused by wrong adjusting with TokenEqui…
wsimonw Dec 21, 2025
7a06a91
added loop-dependencies for self dependencies
wsimonw Dec 22, 2025
2458fe4
improved tests to use equivalence model to determine equality
wsimonw Dec 22, 2025
cd98f61
quick fix for loops in the EOG graph
wsimonw Dec 24, 2025
fb21899
added variable dependency modeling, many bugfixes for tokenization an…
wsimonw Jan 10, 2026
58c8879
bugfix to create correct block ends
wsimonw Jan 10, 2026
57b9461
added tests, bugfix
wsimonw Jan 12, 2026
7a6434e
renamed semantic vector to characteristic to create clearly distingui…
wsimonw Jan 12, 2026
707db3f
bugfix to not create tokens for extra generated nodes, added JavaDoc
wsimonw Jan 15, 2026
74dda8e
more bugfixes
wsimonw Jan 15, 2026
f6dd945
minor style fix
wsimonw Jan 15, 2026
a2a3528
minor style improvement
wsimonw Jan 15, 2026
1464cc1
bugfixes in tokenization and NodeOrderStrategy to skip nodes added by…
wsimonw Jan 19, 2026
6262661
minor fix
wsimonw Jan 19, 2026
f563724
added check for references to variables from other files, style impro…
wsimonw Jan 21, 2026
684eace
adjustment to specify the NodeOrderStrategy for if statements
wsimonw Jan 26, 2026
c1f49de
merging, solved merge conflicts
wsimonw Jan 26, 2026
1d47613
spotless checkstyle
wsimonw Jan 26, 2026
37ecef5
fixed tests, especially refactored versions in aggregator pom xml
wsimonw Feb 6, 2026
ae3b9cc
fixed spotless style
wsimonw Feb 6, 2026
73c4dc8
solved dependency issue for coverage report
wsimonw Feb 7, 2026
4f5d165
style improvement to fulfill check style especially removed wildcard …
wsimonw Feb 9, 2026
cba7556
spotless improvement, added documentation
wsimonw Feb 9, 2026
a6c5aca
adapted Markdown
wsimonw Feb 9, 2026
2c6c0f5
specified comment
wsimonw Feb 10, 2026
9609f23
style improvement regarding SonarCube Finds
wsimonw Feb 10, 2026
6 changes: 3 additions & 3 deletions README.md
@@ -67,7 +67,7 @@ JPlag is released on [Maven Central](https://search.maven.org/search?q=de.jplag)
</dependency>
```

-### Building from sources
+### Building from sources
1. Download or clone the code from this repository.
2. Run `mvn clean package` from the repository root to compile and build all submodules.
Run `mvn clean package assembly:single` instead if you need the full jar, which includes all dependencies.
@@ -107,7 +107,7 @@ Parameter descriptions:
 Root-directories with submissions to check for
 plagiarism (same as root).
 --normalize Activate the normalization of tokens. Supported for
-languages: Java, C++.
+languages: Java, Java-CPG, C++.
 -old, --old=<oldDirectories>[,<oldDirectories>...]
 Root-directories with prior submissions to compare
 against.
@@ -238,4 +238,4 @@ Please consider our [guidelines for contributions](https://github.com/jplag/JPla
## Contact
If you encounter bugs or other issues, please report them [here](https://github.com/jplag/jplag/issues).
For other purposes, you can contact us at jplag@ipd.kit.edu.
-We would love to hear about your research related to JPlag. Feel free to contact us!
+We would love to hear about your research related to JPlag. Feel free to contact us!
4 changes: 4 additions & 0 deletions cli/pom.xml
@@ -30,6 +30,10 @@
<groupId>de.jplag</groupId>
<artifactId>java</artifactId>
</dependency>
+<dependency>
+<groupId>de.jplag</groupId>
+<artifactId>java-cpg</artifactId>
+</dependency>
<dependency>
<groupId>de.jplag</groupId>
<artifactId>python-3</artifactId>
2 changes: 1 addition & 1 deletion cli/src/main/java/de/jplag/cli/options/CliOptions.java
@@ -76,7 +76,7 @@ public class CliOptions implements Runnable {
public JPlagMode mode = JPlagMode.AUTO;

/** Enable token normalization (Java, C++). */
-@Option(names = {"--normalize"}, description = "Activate the normalization of tokens. Supported for languages: Java, C++.")
+@Option(names = {"--normalize"}, description = "Activate the normalization of tokens. Supported for languages: Java, Java-CPG, C++.")
public boolean normalize = false;

/** Advanced options group. */
23 changes: 17 additions & 6 deletions core/src/main/java/de/jplag/comparison/GreedyStringTiling.java
@@ -13,6 +13,7 @@
import de.jplag.Match;
import de.jplag.Submission;
import de.jplag.Token;
+import de.jplag.TokenEquivalenceModel;
import de.jplag.options.JPlagOptions;

/**
@@ -31,9 +32,10 @@ public class GreedyStringTiling {
private final Map<Submission, RollingTokenHashTable> cachedHashLookupTables = Collections.synchronizedMap(new IdentityHashMap<>());

private final TokenSequenceMapper tokenSequenceMapper;
+private final TokenEquivalenceModel tokenEquivalenceModel;

/**
-* Creates a instance of the Greedy String Tiling algorithm.
+* Creates an instance of the Greedy String Tiling algorithm.
* @param options are the options, controlling algorithm parameters like minimum token match.
* @param tokenValueMapper provides integer mappings for token sequences.
*/
@@ -43,6 +45,7 @@ public GreedyStringTiling(JPlagOptions options, TokenSequenceMapper tokenValueMa
int minimumNeighborLength = Math.clamp(options.mergingOptions().minimumNeighborLength(), 1, options.minimumTokenMatch());

this.minimumMatchLength = options.mergingOptions().enabled() ? minimumNeighborLength : options.minimumTokenMatch();
+this.tokenEquivalenceModel = options.language().getTokenEquivalenceModel();

this.tokenSequenceMapper = tokenValueMapper;
}
@@ -109,6 +112,10 @@ private JPlagComparison compareOrdered(Submission leftSubmission, Submission rig
int[] leftTokens = this.tokenSequenceMapper.getTokenSequenceFor(leftSubmission);
int[] rightTokens = this.tokenSequenceMapper.getTokenSequenceFor(rightSubmission);

+if (!tokenEquivalenceModel.ensureTokenType(leftSubmission.getTokenList())
+|| !tokenEquivalenceModel.ensureTokenType(rightSubmission.getTokenList())) {
+throw new IllegalStateException("Token equivalence model requires specific token types, but they are not given.");
+}
boolean[] leftExcludedTokens = calculateExcludedTokens(leftSubmission);
boolean[] rightExcludedTokens = calculateExcludedTokens(rightSubmission);

@@ -134,7 +141,7 @@ private JPlagComparison compareOrdered(Submission leftSubmission, Submission rig
}

int subsequenceMatchLength = findLongestUnmarkedMatch(leftTokens, leftStartIndex, leftExcludedTokens, rightTokens,
-rightStartIndex, rightExcludedTokens, maximumMatchLength);
+rightStartIndex, rightExcludedTokens, maximumMatchLength, leftSubmission.getTokenList(), rightSubmission.getTokenList());
if (subsequenceMatchLength >= maximumMatchLength) {
if (subsequenceMatchLength > maximumMatchLength) {
iterationMatches.clear();
@@ -177,17 +184,21 @@ private JPlagComparison compareOrdered(Submission leftSubmission, Submission rig
* length.
*/
private int findLongestUnmarkedMatch(int[] leftValues, int leftStartIndex, boolean[] leftMarked, int[] rightValues, int rightStartIndex,
-boolean[] rightMarked, int minimumSequenceLength) {
+boolean[] rightMarked, int minimumSequenceLength, List<Token> leftTokens, List<Token> rightTokens) {
for (int offset = minimumSequenceLength - 1; offset >= 0; offset--) {
int leftIndex = leftStartIndex + offset;
int rightIndex = rightStartIndex + offset;
-if (leftValues[leftIndex] != rightValues[rightIndex] || leftMarked[leftIndex] || rightMarked[rightIndex]) {
+if (!tokenEquivalenceModel.arePrimaryEquivalent(leftValues[leftIndex], rightValues[rightIndex])
+|| !tokenEquivalenceModel.areSecondaryEquivalent(leftTokens.get(leftIndex).getType(), rightTokens.get(rightIndex).getType())
+|| leftMarked[leftIndex] || rightMarked[rightIndex]) {
return 0;
}
}
int offset = minimumSequenceLength;
-while (leftValues[leftStartIndex + offset] == rightValues[rightStartIndex + offset] && !leftMarked[leftStartIndex + offset]
-&& !rightMarked[rightStartIndex + offset]) {
+while (tokenEquivalenceModel.arePrimaryEquivalent(leftValues[leftStartIndex + offset], rightValues[rightStartIndex + offset])
+&& tokenEquivalenceModel.areSecondaryEquivalent(leftTokens.get(leftStartIndex + offset).getType(),
+rightTokens.get(rightStartIndex + offset).getType())
+&& !leftMarked[leftStartIndex + offset] && !rightMarked[rightStartIndex + offset]) {
offset++;
}
return offset;
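The changed `findLongestUnmarkedMatch` accepts a token pair only if it is primary equivalent, secondary equivalent, and unmarked on both sides. The following is a minimal, self-contained sketch of that check, not JPlag's code: `Equivalence` and `Sequence` are hypothetical stand-ins for `TokenEquivalenceModel` and a submission's token data, and token types are plain strings. The real loop presumably relies on a marked end-of-file token to stop; the sketch bounds-checks both arrays explicitly instead.

```java
import java.util.List;

public class MatchExtensionSketch {

    /** Hypothetical stand-in for TokenEquivalenceModel; token types are plain strings here. */
    interface Equivalence {
        boolean arePrimaryEquivalent(int leftValue, int rightValue);

        /** Same default as in the diff: secondary equivalence holds unless overridden. */
        default boolean areSecondaryEquivalent(String leftType, String rightType) {
            return true;
        }
    }

    /** One submission: primary type IDs, marked flags, and the full token types. */
    record Sequence(int[] values, boolean[] marked, List<String> types) {
        int length() {
            return values.length;
        }
    }

    /** Primary IDs first, then secondary types, then the marks, in the order used by the diff. */
    static boolean matches(Equivalence model, Sequence left, int leftIndex, Sequence right, int rightIndex) {
        return model.arePrimaryEquivalent(left.values()[leftIndex], right.values()[rightIndex])
                && model.areSecondaryEquivalent(left.types().get(leftIndex), right.types().get(rightIndex))
                && !left.marked()[leftIndex] && !right.marked()[rightIndex];
    }

    /**
     * Returns 0 unless the first minimumLength positions all match (checked backwards, as in the diff);
     * otherwise extends the match forward and returns its length.
     */
    static int longestUnmarkedMatch(Equivalence model, Sequence left, int leftStart, Sequence right, int rightStart,
            int minimumLength) {
        if (leftStart + minimumLength > left.length() || rightStart + minimumLength > right.length()) {
            return 0; // not enough tokens left on one side
        }
        for (int offset = minimumLength - 1; offset >= 0; offset--) {
            if (!matches(model, left, leftStart + offset, right, rightStart + offset)) {
                return 0;
            }
        }
        int offset = minimumLength;
        // Explicit bounds checks replace the end-of-file sentinel the real code presumably relies on
        while (leftStart + offset < left.length() && rightStart + offset < right.length()
                && matches(model, left, leftStart + offset, right, rightStart + offset)) {
            offset++;
        }
        return offset;
    }

    public static void main(String[] args) {
        // Secondary types must be identical; primary IDs 0 and 1 stand for two coarse token kinds
        Equivalence strict = new Equivalence() {
            @Override
            public boolean arePrimaryEquivalent(int leftValue, int rightValue) {
                return leftValue == rightValue;
            }

            @Override
            public boolean areSecondaryEquivalent(String leftType, String rightType) {
                return leftType.equals(rightType);
            }
        };
        Sequence left = new Sequence(new int[] {0, 1, 0, 1}, new boolean[4], List.of("ASSIGN_INT", "CALL", "ASSIGN_INT", "CALL"));
        Sequence right = new Sequence(new int[] {0, 1, 0, 1}, new boolean[4], List.of("ASSIGN_INT", "CALL", "ASSIGN_STR", "CALL"));
        // Index 2 has the same primary ID but a different secondary type, so the match ends there
        System.out.println(longestUnmarkedMatch(strict, left, 0, right, 0, 2)); // prints 2
    }
}
```

With the default secondary check (always `true`), the same call would return 4. That is all `DefaultTokenEquivalenceModel` does: the new code path reduces to the old integer comparison.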
@@ -101,7 +101,7 @@ public JPlagResult compareSubmissions(SubmissionSet submissionSet) throws Compar
long startTimeMillis = System.currentTimeMillis();

// Set up data structures:
-TokenSequenceMapper tokenSequenceMapper = new TokenSequenceMapper(submissionSet);
+TokenSequenceMapper tokenSequenceMapper = new TokenSequenceMapper(options.language().getTokenEquivalenceModel(), submissionSet);
GreedyStringTiling coreAlgorithm = new GreedyStringTiling(options, tokenSequenceMapper);

// Prepare base code comparisons:
@@ -9,6 +9,7 @@
import de.jplag.Submission;
import de.jplag.SubmissionSet;
import de.jplag.Token;
+import de.jplag.TokenEquivalenceModel;
import de.jplag.TokenType;
import de.jplag.logging.ProgressBarLogger;
import de.jplag.logging.ProgressBarType;
@@ -21,13 +22,16 @@
public class TokenSequenceMapper {
private final Map<TokenType, Integer> tokenTypeToId;
private final Map<Submission, int[]> submissionToTokenSequence;
+private final TokenEquivalenceModel tokenEquivalenceModel;

/**
* Creates the submission to token ID mapping for a set of submissions. This will also show the progress to the user
* using the {@link ProgressBarLogger}.
+* @param tokenEquivalenceModel the model to use for token type equivalence.
* @param submissionSet is the set of submissions to process.
*/
-public TokenSequenceMapper(SubmissionSet submissionSet) {
+public TokenSequenceMapper(TokenEquivalenceModel tokenEquivalenceModel, SubmissionSet submissionSet) {
+this.tokenEquivalenceModel = tokenEquivalenceModel;
tokenTypeToId = new HashMap<>();
submissionToTokenSequence = new IdentityHashMap<>();

@@ -47,7 +51,7 @@ private void addSingleSubmission(Submission submission) {
List<Token> tokens = submission.getTokenList();
int[] tokenSequence = new int[tokens.size()];
for (int i = 0; i < tokens.size(); i++) {
-TokenType type = tokens.get(i).getType();
+TokenType type = tokenEquivalenceModel.getPrimaryType(tokens.get(i));
tokenTypeToId.putIfAbsent(type, tokenTypeToId.size());
tokenSequence[i] = tokenTypeToId.get(type);
}
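`TokenSequenceMapper` now derives each token's integer ID from `getPrimaryType` instead of `getType`, so tokens whose full types differ but share a primary type receive the same ID. A minimal sketch of that mapping, assuming string token types and a hypothetical rule that the primary type is the prefix before `_`; `toIdSequence` is an illustrative helper, not a JPlag method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PrimaryTypeMappingSketch {

    /**
     * Maps each token to the integer ID of its primary type. New primary types get the next free ID, mirroring the
     * putIfAbsent call in addSingleSubmission.
     */
    static <T, P> int[] toIdSequence(List<T> tokens, Function<T, P> primaryType, Map<P, Integer> typeToId) {
        int[] sequence = new int[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            P type = primaryType.apply(tokens.get(i));
            typeToId.putIfAbsent(type, typeToId.size());
            sequence[i] = typeToId.get(type);
        }
        return sequence;
    }

    public static void main(String[] args) {
        // Hypothetical rule: the primary type is the part of the token type before '_'
        Function<String, String> primary = type -> type.split("_")[0];
        Map<String, Integer> typeToId = new HashMap<>(); // shared by all submissions, like tokenTypeToId
        int[] left = toIdSequence(List.of("ASSIGN_INT", "CALL", "ASSIGN_INT"), primary, typeToId);
        int[] right = toIdSequence(List.of("ASSIGN_STR", "CALL", "RETURN"), primary, typeToId);
        System.out.println(Arrays.toString(left)); // [0, 1, 0]
        System.out.println(Arrays.toString(right)); // [0, 1, 2]
    }
}
```

Sharing one map across all submissions, as the `tokenTypeToId` field does, is what makes the IDs of the left and right sequences comparable in the first place.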
4 changes: 4 additions & 0 deletions coverage-report/pom.xml
@@ -119,6 +119,10 @@
<groupId>de.jplag</groupId>
<artifactId>multi-language</artifactId>
</dependency>
+<dependency>
+<groupId>de.jplag</groupId>
+<artifactId>java-cpg</artifactId>
+</dependency>
</dependencies>
<build>
<plugins>
12 changes: 6 additions & 6 deletions docs/1.-How-to-Use-JPlag.md
@@ -42,7 +42,7 @@ Parameter descriptions:
 Root-directories with submissions to check for
 plagiarism (same as root).
 --normalize Activate the normalization of tokens. Supported for
-languages: Java, C++.
+languages: Java, Java-CPG, C++.
 -old, --old=<oldDirectories>[,<oldDirectories>...]
 Root-directories with prior submissions to compare
 against.
@@ -251,7 +251,7 @@ The base code is a special kind of submission. It is the template on which all o
└── Solution.java
```

-In this example, students must solve a problem by implementing the `run` method in the template below. Because they are not supposed to modify the `main` function, it will be identical for each student.
+In this example, students must solve a problem by implementing the `run` method in the template below. Because they are not supposed to modify the `main` function, it will be identical for each student.

```java
// BaseCode/Solution.java
@@ -269,14 +269,14 @@ public class Solution {
}
```

-To prevent JPlag from detecting similarities in the `main` function (and other parts of the template), we can instruct JPlag to ignore matches with the given base code by providing the `-bc=<base-code-name>` option.
+To prevent JPlag from detecting similarities in the `main` function (and other parts of the template), we can instruct JPlag to ignore matches with the given base code by providing the `-bc=<base-code-name>` option.
The `<base-code-name>` in the example above is `BaseCode`.

### Multiple Root Directories
* You can run JPlag with multiple root directories; JPlag compares submissions from all of them
* JPlag distinguishes between old and new root directories
-** Submissions in new root directories are checked amongst themselves and against submissions from other root directories
-** Submissions in old root directories are only checked against submissions from other new root directories
+** Submissions in new root directories are checked amongst themselves and against submissions from other root directories
+** Submissions in old root directories are only checked against submissions from other new root directories
* You need at least one (new) root directory to run JPlag

This allows you to check submissions against those of previous years:
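The old/new rules listed above reduce to one condition: a pair of submissions is compared unless both come from old root directories. A small sketch of that rule, with hypothetical directory names where `2024` is a new root and `2023` an old one; the `Submission` record here is an illustrative stand-in, not `de.jplag.Submission`.

```java
import java.util.ArrayList;
import java.util.List;

public class RootDirectoryPairsSketch {

    /** A submission and whether it comes from an old (prior) root directory. */
    record Submission(String name, boolean old) {}

    /** Returns every unordered pair that is compared: all pairs except old-with-old. */
    static List<String> comparedPairs(List<Submission> submissions) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < submissions.size(); i++) {
            for (int j = i + 1; j < submissions.size(); j++) {
                Submission a = submissions.get(i);
                Submission b = submissions.get(j);
                if (!(a.old() && b.old())) {
                    pairs.add(a.name() + " <-> " + b.name());
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Submission> submissions = List.of(
                new Submission("2024/alice", false),
                new Submission("2024/bob", false),
                new Submission("2023/carol", true),
                new Submission("2023/dave", true));
        // Prints five pairs; 2023/carol and 2023/dave are never compared with each other
        comparedPairs(submissions).forEach(System.out::println);
    }
}
```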
@@ -307,4 +307,4 @@ classDiagram
Directory --> "1..*" File : contains
Submission <|-- File : is a
Submission <|-- Directory : is a
-```
+```
@@ -0,0 +1,18 @@
package de.jplag;

/**
* The default token equivalence model that can be used by most languages. It treats two tokens as equivalent exactly
* when they have the same type and does not consider any additional token data.
*/
public class DefaultTokenEquivalenceModel implements TokenEquivalenceModel {

@Override
public TokenType getPrimaryType(Token token) {
return token.getType();
}

@Override
public boolean arePrimaryEquivalent(int leftValue, int rightValue) {
return leftValue == rightValue;
}
}
8 changes: 8 additions & 0 deletions language-api/src/main/java/de/jplag/Language.java
@@ -147,6 +147,14 @@ default boolean requiresCoreNormalization() {
return true;
}

+/**
+* @return The token equivalence model to use for this language. Override this method if you need a custom token
+* equivalence model.
+*/
+default TokenEquivalenceModel getTokenEquivalenceModel() {
+return new DefaultTokenEquivalenceModel();
+}
+
/**
* @return True, if the language module can be used by the multi-language module
*/
3 changes: 2 additions & 1 deletion language-api/src/main/java/de/jplag/Token.java
@@ -28,7 +28,8 @@ public class Token {
private final int endColumn;
private final File file;
private final TokenType type;
-private CodeSemantics semantics; // value null if no semantics
+private CodeSemantics semantics; // value null if no semantics; maybe move into TokenType, since it describes the token itself, not its
+// position?

/**
* Creates a token with column and length information.
49 changes: 49 additions & 0 deletions language-api/src/main/java/de/jplag/TokenEquivalenceModel.java
@@ -0,0 +1,49 @@
package de.jplag;

import java.util.List;

/**
* Defines when two tokens are considered equivalent. Matches between tokens are determined in two steps: first, the
* primary types of the tokens are compared using {@link #arePrimaryEquivalent(int, int)}; only if they are equivalent
* are the secondary types compared using {@link #areSecondaryEquivalent(TokenType, TokenType)}.
*/
public interface TokenEquivalenceModel {

/**
* Gets the primary {@link TokenType} of a token.
* @param token The token
* @return The primary type
*/
TokenType getPrimaryType(Token token);

/**
* Ensures that the tokens have the correct type assigned. By default, this method does nothing and returns true.
* @param tokens The tokens
* @return True, if the types are ensured
*/
default boolean ensureTokenType(List<Token> tokens) {
return true;
}

/**
* Determines whether two tokens are primary equivalent, based on the integer IDs of their primary types. Integer IDs
* are used instead of {@link TokenType} objects for performance reasons.
* @param leftValue the left token value
* @param rightValue the right token value
* @return True, if the primary token values are equivalent
*/
boolean arePrimaryEquivalent(int leftValue, int rightValue);

/**
* Determines whether two tokens are secondary equivalent based on their TokenType representation. By default, this
* method returns true.
* @param leftType the left token type
* @param rightType the right token type
* @return True, if the secondary token types are equivalent
*/
default boolean areSecondaryEquivalent(TokenType leftType, TokenType rightType) {
return true;
}

}
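Because the default `areSecondaryEquivalent` returns `true` and `DefaultTokenEquivalenceModel` compares IDs with `==`, the default model behaves exactly like plain type equality. A custom model becomes useful when the secondary check is something integer IDs cannot express, for example a tolerance. The sketch below is hypothetical and not part of this PR: `DetailedType`, `KindType`, and `ToleranceModel` are invented names, and `TokenType`, `Token`, and `EquivalenceModel` are minimal stand-ins that mirror the interface above. Calls whose argument counts differ by at most one are treated as equivalent; that relation is not transitive, so it cannot be encoded as equality of IDs.

```java
import java.util.List;

public class ToleranceEquivalenceSketch {

    /** Hypothetical stand-ins for de.jplag.TokenType and de.jplag.Token. */
    interface TokenType {}

    record Token(TokenType type) {}

    /** Coarse kind, used as the primary type. */
    record KindType(String kind) implements TokenType {}

    /** Hypothetical detailed type: a kind plus an argument count. */
    record DetailedType(String kind, int argumentCount) implements TokenType {}

    /** Same shape as the TokenEquivalenceModel interface in the diff. */
    interface EquivalenceModel {
        TokenType getPrimaryType(Token token);

        default boolean ensureTokenType(List<Token> tokens) {
            return true;
        }

        boolean arePrimaryEquivalent(int leftValue, int rightValue);

        default boolean areSecondaryEquivalent(TokenType leftType, TokenType rightType) {
            return true;
        }
    }

    /** Primary: same kind. Secondary: argument counts differ by at most one. */
    static class ToleranceModel implements EquivalenceModel {
        @Override
        public TokenType getPrimaryType(Token token) {
            return new KindType(((DetailedType) token.type()).kind());
        }

        @Override
        public boolean ensureTokenType(List<Token> tokens) {
            // getPrimaryType and areSecondaryEquivalent cast, so every token must carry a DetailedType
            return tokens.stream().allMatch(token -> token.type() instanceof DetailedType);
        }

        @Override
        public boolean arePrimaryEquivalent(int leftValue, int rightValue) {
            return leftValue == rightValue; // equal KindType records receive equal IDs
        }

        @Override
        public boolean areSecondaryEquivalent(TokenType leftType, TokenType rightType) {
            int left = ((DetailedType) leftType).argumentCount();
            int right = ((DetailedType) rightType).argumentCount();
            return Math.abs(left - right) <= 1; // tolerant, hence not transitive
        }
    }

    public static void main(String[] args) {
        ToleranceModel model = new ToleranceModel();
        Token call1 = new Token(new DetailedType("CALL", 1));
        Token call2 = new Token(new DetailedType("CALL", 2));
        Token call3 = new Token(new DetailedType("CALL", 3));
        System.out.println(model.getPrimaryType(call1).equals(model.getPrimaryType(call3))); // true
        System.out.println(model.areSecondaryEquivalent(call1.type(), call2.type())); // true
        System.out.println(model.areSecondaryEquivalent(call2.type(), call3.type())); // true
        System.out.println(model.areSecondaryEquivalent(call1.type(), call3.type())); // false
        System.out.println(model.ensureTokenType(List.of(call1, new Token(new KindType("CALL"))))); // false
    }
}
```

A language module would return such a model from its `getTokenEquivalenceModel()` override. Since `TokenSequenceMapper` keys a `HashMap` by the primary type, `getPrimaryType` should presumably return values with value-based `equals`/`hashCode`, such as records or enum constants; the `KindType` record here satisfies that.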