-
-
Notifications
You must be signed in to change notification settings - Fork 3k
Refchecker Logic Functionality #14213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
In my implementation of DuplicateCheck, should I ignore any non-standard fields? Would that be preferable? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, good work so far. I am leaving some comments.
| * @return number [0,1] 1 representing the same (one potentially having more fields), 0 representing completely different | ||
| */ | ||
| public double degreeOfSimilarity(final BibEntry one, final BibEntry two) { | ||
| return one.getFields((f) -> two.getField(f).isPresent()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using abbreviations
| .stream().mapToDouble((field) -> { | ||
| String first = one.getField(field).get(); | ||
| String second = two.getField(field).get(); | ||
| return new StringSimilarity().similarity(first, second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't create a new StringSimilarity instance for each field
| public RefChecker( | ||
| DoiFetcher doiFetcher, | ||
| ArXivFetcher arXivFetcher) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only two parameters, so can be in a single line. Use one in each line when they are >=4
| public RefChecker( | ||
| DoiFetcher doiFetcher, | ||
| ArXivFetcher arXivFetcher) { | ||
| this(doiFetcher, arXivFetcher, new CrossRef(), new DuplicateCheck(new BibEntryTypesManager())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to inject bibentrytypesmanager via constructor. If you are unable to, use the injector (Injector.instantiateModelOrService(BibEntryTypesManager.class))
| } | ||
|
|
||
| Optional<BibEntry> other = fetcher.performSearchById(doi.get().asString()); | ||
| return other.map(o -> compareReferences(entry, o)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid abbreviations
| public static final class Fake extends ReferenceValidity { | ||
| public boolean equals(Object o) { | ||
| return o.getClass() == Fake.class; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Note: Don't take this suggestion right away, let another maintainer confirm)
Creating a new Fake every time seems a bit odd as unlike Real, it is a stateless marker. This should be a singleton.
Maybe do something like
public static final Fake INSTANCE = new Fake();
private Fake() {}
@Override
public boolean equals(Object o) {
return o instanceof Fake;
}and for the hashcode return an arbitrary constant.
This way, you can just do something like return Fake.INSTANCE instead of return new Fake().
@calixtus do you have a better idea? Something feels odd about the design.
| Set<BibEntry> matchingReferences = new HashSet<>(); | ||
| matchingReferences.add(matchingReference); | ||
| this.matchingReferences = matchingReferences; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Set<BibEntry> matchingReferences = new HashSet<>(); | |
| matchingReferences.add(matchingReference); | |
| this.matchingReferences = matchingReferences; | |
| this.matchingReferences = new HashSet<>(Set.of(matchingReference)); |
| return this; | ||
| } | ||
| if (other instanceof Unsure otherUnsure && this instanceof Unsure thisUnsure) { | ||
| otherUnsure.addAll(thisUnsure); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An or method, in principle, should return a new result and not mutate its inputs
| @Test | ||
| void validateListOfEntriesTest() throws FetcherException { | ||
| List<BibEntry> entries = List.of(realEntry, realEntryNoDoi, fakeEntry); | ||
| var e = refChecker.validateListOfEntries(entries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove abbreviations and var usages from this file.
(usually we don't fret so much about tests but in case we ever need to debug them, readability always helps)
| @Test | ||
| void closeToRealEntry() throws FetcherException { | ||
| RefChecker.ReferenceValidity rv = refChecker.referenceValidityOfEntry(closeToRealEntry); | ||
| System.out.println(((RefChecker.Unsure) rv).matchingReferences); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
… modifying the old one, small code quality changes
|
Should I ignore internal fields for degreeOfSimilarity, they are not fields that directly relate to the reference/bib so maybe they should be ignored? That makes sense at least from the RefChecker standpoint, but I could just make another method, that ignores internal fields and another that doesn't. |
|
This is a difficult question. Use case 1: Papers of a PDF and a fetcher: These fields probably won't appear. Use case 2: Paper by fetcher is added to library: JabRef fields can be ignored. Use case 3: Paper of library A is copied to library B: As user, I want the two entries being identified as similar even if "my" grouping, attached files, etc are different. Result: Yes, can be ignored. Has to be documented at JavaDoc. |
|
They can be ignored, a fetcher will never return a special field value, they are JabRef specific. |
| @Test | ||
| void degreeOfSimilarityOfSameEntryIsOne() { | ||
| assertEquals(1.0, duplicateChecker.degreeOfSimilarity(getSimpleArticle(), getSimpleArticle())); | ||
| assertEquals(1.0, duplicateChecker.degreeOfSimilarity(getSimpleInCollection(), getSimpleInCollection())); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please only a single assertion per method - see https://devdocs.jabref.org/code-howtos/testing.html
I think, you can rewrite using @ParamterizedTest.
I think, this can also be merged entriesWithNoMatchingFieldHaveNoSimilarity and more.
|
|
||
| public static abstract sealed class ReferenceValidity permits Real, Unsure, Fake { | ||
|
|
||
| public ReferenceValidity or(ReferenceValidity other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add JavaDoc (///) for this.
| assertEquals(t1, t1.or(t2)); | ||
| assertEquals(t1, t1.or(t3)); | ||
| assertEquals(t2, t3.or(t2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this... t1 is always equal to t1. Why is t1.or(t2) wirtten down there? 😅
| } | ||
|
|
||
| @Test | ||
| void findsRealEntry() throws FetcherException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, all tests can be rewritten to @ParamterizedTest
| StandardField.PDF, | ||
| StandardField.PS, | ||
| StandardField.URL, | ||
| StandardField.DOI, | ||
| StandardField.FILE, | ||
| StandardField.ISBN, | ||
| StandardField.ISSN))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we need all of them - PDF and PS are outdated.
| * If the result is one, it means that there was at least one common field | ||
| * and all the common fields were the same. | ||
| * <p> | ||
| * Similar entries have a score of above 0.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.8 should be constant in DuplicateCheck / linked from there.
| DoiFetcher doiFetcher; | ||
| ArXivFetcher arxivFetcher; | ||
| CrossRef crossRef; | ||
| DuplicateCheck duplicateCheck; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be private?
| /** | ||
| * Tries to find the best reference validity | ||
| * among current ways. If any of the methods signal | ||
| * that it is real, it returns early. | ||
| * <p> | ||
| * DoiFetcher -> CrossRef -> ArxivFetcher | ||
| * | ||
| * @param entry entry checking | ||
| * @return the reference validity | ||
| * @throws FetcherException any error from fetchers | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to use Markdown as JavaDoc format. Otherwise you need to escape >.
| public int hashCode() { | ||
| return Objects.hashCode(Fake.class); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No neeed for an implementation, JDK's default works well in most cases
Override is only necessary if equals is overridden: https://thefinestartist.com/effective-java/09
| ReferenceValidity get() throws FetcherException; | ||
| } | ||
|
|
||
| public static abstract sealed class ReferenceValidity permits Real, Unsure, Fake { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
record(Validity validtiy, @Nullable BibTeX otherEntry)
And Valildity being an enum - and otherEntry allowed for null in the fake entry.
I am not sure if there could be a group of other entries in some cases; but this design would be more future proof and more readable. @subhramit
Refs #13604
This PR includes all of the logic components of the RefCheck functionality.
With this we can implement refcheck in the CLI and GUI of jabref.
The added class, RefChecker, searches through different Fetchers to find matching or similar entries and returns either Fake, Real, or Unsure with related found entries.
To achieve this a new method in the DuplicateCheck class has been made, that finds the similarity of different Entries and returns a double between 0 and 1 depending on how similar the entries are.
Steps to test
Read the Unit Tests and run them to see how the classes are meant to work.
Mandatory checks
CHANGELOG.mdin a way that is understandable for the average user (if change is visible to the user)