-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Add support for automatic ICORE conference ranking lookup [#13476] #13699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
// both strings are zero length | ||
if (longerLength == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is redundant as it simply restates what the code clearly shows. The comment doesn't provide additional information or reasoning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was added there by the original author of the code, I've merely ported it with the necessary modifications. That said, I will contest this review as the lines
final int longerLength = longer.length();
// both strings are zero length
if (longerLength == 0) {
return 1.0;
}
do not explicitly state, on their own, that the two input strings are equal. Further, the comment is present in the original Stack Overflow post here. That being said, if you're adamant about it, I don't mind changing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take the bot meanings with care, they are not always that good
double exactMatch = 1.0; | ||
double similarity = similarityChecker.similarity(a, b); | ||
|
||
assertTrue(similarity >= EPSILON_SIMILARITY && similarity < exactMatch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using assertTrue with a boolean condition instead of asserting the actual contents. Should compare the actual similarity value with expected bounds using assertEquals.
For the last few days, I've just been browsing the code, reading the docs, and interacting with the application on my local machine. Since this is the first time I'm interacting with the JabRef ecosystem, and ICORE by extension, I have some questions regarding the app itself and the feature's use-case. I'll post each one as a separate comment. Apologies if some of these are too obvious. |
The issue post mentions: "When a BibTeX entry includes a conference title". What does "conference title" here refer to? A. Following the Getting Started guide on the app, when you add a new entry via I'm guessing the answer is any entry regardless of type, but I still have to ask to be sure. B. If I'm querying all entries regardless of type, do I have to search only the Title field? There are entries where the conference name isn't in the title field. Case in point: I used the Web Search feature in the app to lookup the conference mentioned in the issue post: "ACIS Conference on Software Engineering Research, Management and Applications". I selected and imported the following entry: https://ieeexplore.ieee.org/document/9509045. It gets imported as an I'm assuming that I should be looking for the conference title and its acronym in all of the fields. Is there some sort of a standard way here regarding how entries are imported into JabRef so that I only have to look for the title inside a subset of fields rather than all of them? |
The ICORE ranking data and its presentation. A. Since each BibTeX entry is annotated with a year, I'm assuming that the user wants to see the ICORE ranking for that particular year. Again, very obvious, but I want to confirm. This would also mean that if a conference was added to ICORE later on, any entries from previous years should have a "Not Available" or "N/A" in the ICORE rank field. Or do we use the mismatched ranking as a fallback? B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter. C. The exported ICORE ranking data provided from the website (https://portal.core.edu.au/conf-ranks/) does not contain a header row in the CSV file. This isn't a big deal as the important bits can be made out quite clearly (all except one, that is). Each line contains 9 columns: ID, Title, Acronym, Source, Rank, UNKNOWN, FoR-1, FoR-2, FoR-3. The UNKNOWN field in column 6 is a "Yes" or "No" for every line. It is always present, but I can't seem to figure what it corresponds to (it isn't the Note or DBLP column from the website). Consequently, I cannot determine whether it is important. If you know what it is or if it is relevant to the feature, please let me know. |
@koppor can you please help answer the questions I've posted above? |
Oh, did you ever read about bibtex? - it is |
|
Always use the latest ranking. We are not interested in historic data. - Only one CSV should be used. |
Web site has: ![]() Which is
Example CSV line
(NOTE: It would be good if this was included in your question to make it self-contained) I cannot quickly see it, but we need "Title", "Acronym" and "Rank" only. The other columns can be ommitted, can't they? |
Please make your question numbers unique. "A" is used double, isn't it?
No, always the latest year. |
Always use the latest CSV - there is one export. This CSV should be used. |
For |
I hope, I got all questions, I am a bit confused since the questions are all labeled with "A" and I could have missed something. |
@TheYorouzoya I wonder if you have seen the "Helpful resources" section at the issue description (#13476) ![]() It links to #13512 Did you know that one can click on "Files changed"? ![]() You are routed to https://github.com/JabRef/jabref/pull/13512/files You then might have seen ![]() I know that code reading is not easy; but it is an essential skill to produce maintainable code. |
Starting from the issue post ![]() I wanted to see for myself what the feature might look like inside of JabRef to the user. So I downloaded the current version and searched for the conference mentioned in the issue post. I imported one of the entries and saw that the conference name showed up inside the "Journal" field ![]() Even though the manual says that the optional field for a venue exists ![]() So I, then, booted up the build on my local repo, i.e., the ![]() even though there is a clearly indicated "Venue" field which is empty ![]() Do you see why I would ask such an obvious question after this? |
@TheYorouzoya Thank you for your patience. It's all voluntary work here. It needs time to explain the domain of scientific references. Maybe you can be a guest a little longer here and improve our documentation at https://docs.jabref.org. Currently we see guests being here just a short time, doing a task, and then leave. I always hope that a guest will make the place better as a whole; especially because all guests seem to be learning software engineering and not just programming. |
Data sourced from ICORE website here: https://portal.core.edu.au/conf-ranks/ to enable ICORE rank lookups. As discussed here: JabRef#13699 (comment), only the latest data from ICORE is to be used. At this time, it is the ICORE2023 ranking data. Part of JabRef#13476
Hey @TheYorouzoya - That is not needed. You can just use the labels to bundle questions under their respective contexts like you do, just add numbering to them (like A1, A2, etc.) so that they can be specifically and easily referred to when answering. |
I am more used to Gitter (Matrix) chat for a bulk of questions 😅. Sorry for that! I have to confess that I did not check the terms properly while writing. I used "venue" as a scientist indicating a conference. And I did not check whether BibLaTeX has some "definition" of venue. A "venue" meant in the issue is some I also meant journal articles, which are defined by I hope, I could answer your question now and you are unblocked to move forward. |
- Append a header row to resources/icore/ICORE2023.csv - Add ConferenceEntry record to represent ICORE conference data - Add ConferenceRepository to load conference data and allow conference lookups using an acronym or a bookTitle with fuzzy match as a fallback - Add utility class to extract an acronym from a bookTitle - Add tests Part of JabRef#13476
// A slight modification of: https://stackoverflow.com/a/17759264 | ||
private static final Pattern PATTERN = Pattern.compile("\\(([^()]*)\\)"); | ||
|
||
public static Optional<String> extract(String input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method lacks input validation for null parameter which could lead to NullPointerException. While Optional return is good, the input should be validated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NonNull
jspecify annotation is OK
Please add JavaDoc.
public Optional<ConferenceEntry> getConferenceFromBookTitle(String bookTitle) { | ||
String query = bookTitle.strip().toLowerCase(); | ||
|
||
// Lucky path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment does not add new information and can be plainly derived from the code. It should be removed as it doesn't provide additional context or reasoning.
String acronym, | ||
String rank | ||
) { | ||
private final static String URL_PREFIX = "https://portal.core.edu.au/conf-ranks/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect order of modifiers. According to Java conventions and effective Java principles, it should be 'private static final' instead of 'private final static'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix that in the next commit.
} | ||
|
||
@Test | ||
void extractReturnsEmptyforEmptyParentheses() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method name contains a typo: 'forEmptyParentheses' should be 'ForEmptyParentheses' to maintain consistent camelCase naming convention in test methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix that in the next commit.
|
||
@Test | ||
void getConferenceFromBookTitleReturnsConferenceForFuzzyMatchAboveThreshold() { | ||
// String similarity > 0.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment merely states what can be derived from the code and test name, not providing additional information about reasoning or implementation details.
Thank you! I'll work on the GUI side next. I do have some questions there, but I'll post those once I'm done looking around the code a bit more. |
Also, here is a link to our gitter chat. |
// Lucky path | ||
ConferenceEntry conference = titleToConference.get(query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment 'Lucky path' doesn't add any new information and can be derived from the code itself. Comments should provide additional context or reasoning.
} | ||
|
||
final int longerLength = longer.length(); | ||
// both strings are zero length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is trivial and can be derived directly from the code condition (longerLength == 0). It should be removed as it doesn't add new information.
/** | ||
* A Conference Entry built from a subset of fields in the ICORE Ranking data | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment merely restates what is obvious from the code and doesn't provide additional information about the purpose or constraints of the record.
@koppor please take a look at the progress so far and review. |
*/ | ||
public static List<Field> getDefaultGeneralFields() { | ||
List<Field> defaultGeneralFields = new ArrayList<>(Arrays.asList(StandardField.DOI, StandardField.CITATIONCOUNT, StandardField.CROSSREF, StandardField.KEYWORDS, StandardField.EPRINT, StandardField.URL, StandardField.FILE, StandardField.GROUPS, StandardField.OWNER, StandardField.TIMESTAMP)); | ||
List<Field> defaultGeneralFields = new ArrayList<>(List.of(StandardField.DOI, StandardField.ICORERANKING, StandardField.CITATIONCOUNT, StandardField.CROSSREF, StandardField.KEYWORDS, StandardField.EPRINT, StandardField.URL, StandardField.FILE, StandardField.GROUPS, StandardField.OWNER, StandardField.TIMESTAMP)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ArrayList with List.of() is inefficient. Since the list is immediately populated, using Set.of() or List.of() directly would be more appropriate and aligned with modern Java practices.
CITATIONCOUNT("citationcount"), | ||
TIMESTAMP("timestamp", FieldProperty.DATE), | ||
|
||
// Timestamp-realted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment contains a spelling error in the word 'realted' which should be 'related'. Variable and comment spelling accuracy is important for code maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good beginning!
Minor comments inside.
Please disalbe the button "openExternalLink" if there is no matched conference

undo is broken. I had a rank "B", clicked on lookup - replaced by "not found". Library modified, but I cannot undo.

Propossal: If there is a value in ICORE and lookup would replace it by "not found", do
dialogService.notify(Localization.long("not found"))
(or similar)
INSTEAD of replacing it.
* Calculates the similarity (a number within 0 and 1) between two strings. | ||
* http://stackoverflow.com/questions/955110/similarity-string-comparison-in-java | ||
*/ | ||
private static double similarity(final String first, final String second) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find :)
SpecialField.READ_STATUS, SpecialField.RELEVANCE | ||
); | ||
|
||
if (!currentGeneralPrefs.equals(expectedGeneralPrefs)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice check!
Map<String, Set<Field>> entryEditorPrefs = preferences.getEntryEditorPreferences().getEntryEditorTabs(); | ||
Set<Field> currentGeneralPrefs = entryEditorPrefs.get("General"); | ||
|
||
Set<Field> expectedGeneralPrefs = Set.of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think org.jabref.model.entry.field.FieldFactory#getDefaultGeneralFields
should be linked - to enable others to lookup things if they implement something similar.
@@ -558,6 +561,37 @@ static void moveApiKeysToKeyring(JabRefCliPreferences preferences) { | |||
} | |||
} | |||
|
|||
static void addICORERankingFieldToGeneralTab(GuiPreferences preferences) { | |||
Map<String, Set<Field>> entryEditorPrefs = preferences.getEntryEditorPreferences().getEntryEditorTabs(); | |||
Set<Field> currentGeneralPrefs = entryEditorPrefs.get("General"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, Localization.lang("General")
should be used - see org.jabref.logic.preferences.JabRefCliPreferences#setLanguageDependentDefaultValues
} | ||
|
||
entryEditorPrefs.put( | ||
"General", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also org.jabref.logic.preferences.JabRefCliPreferences#setLanguageDependentDefaultValues
import static org.junit.jupiter.api.Assertions.assertEquals; | ||
import static org.junit.jupiter.api.Assertions.assertTrue; | ||
|
||
public class ConferenceAcronymExtractorTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rewrite to a @ParamterizedTest
and @CsvCource
. Use modern Optional testing - see comments on other test
@FXML private Button visitICOREConferencePageButton; | ||
|
||
@Inject private DialogService dialogService; | ||
@Inject private TaskExecutor taskExecutor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variable - remove - also at constructor
@Inject private DialogService dialogService; | ||
@Inject private TaskExecutor taskExecutor; | ||
@Inject private GuiPreferences preferences; | ||
@Inject private UndoManager undoManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variable- remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UndoManager
is required by the constructor inside theAbstractEditorViewModel
superclass.GuiPreferences
is required to open the conference page via the call toNativeDesktop.openBrowser()
since it needs the user'sexternalApplicationPreferences
.DialogService
will be required to apply the fix suggested here.
I'll remove the TaskExecutor
since it is not being used.
// A slight modification of: https://stackoverflow.com/a/17759264 | ||
private static final Pattern PATTERN = Pattern.compile("\\(([^()]*)\\)"); | ||
|
||
public static Optional<String> extract(String input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NonNull
jspecify annotation is OK
Please add JavaDoc.
CREATIONDATE("creationdate", FieldProperty.DATE), | ||
MODIFICATIONDATE("modificationdate", FieldProperty.DATE); | ||
GROUPS("groups"), | ||
ICORERANKING("icoreranking"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be named icore
to be more short - WDYT?
(We also don't have ownername
or groupname
, because the prefix is unique enough)
I did it at b092aa9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am completely new to this domain, so I don't really have an input. My decision was based purely on the task in the issue post:
Create a new field: ICORANKING ('icoranking') and add it to org.jabref.model.entry.field.StandardField at the // JabRef-specific fields section.
The misspelling was later corrected in this comment here.
I'll update it with your suggestion.
Since 3 of 3 attemts failed to extract a title, here my test data set: test-cases.zip Maybe, it can be used to improve the matching; maybe not (future work ^^) |
JUnit tests of You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide. |
Data sourced from ICORE website here: https://portal.core.edu.au/conf-ranks/ to enable ICORE rank lookups. As discussed here: JabRef#13699 (comment), only the latest data from ICORE is to be used. At this time, it is the ICORE2023 ranking data. Part of JabRef#13476
- Append a header row to resources/icore/ICORE2023.csv - Add ConferenceEntry record to represent ICORE conference data - Add ConferenceRepository to load conference data and allow conference lookups using an acronym or a bookTitle with fuzzy match as a fallback - Add utility class to extract an acronym from a bookTitle - Add tests Part of JabRef#13476
- Out of the 4 missing keys in a failing test, two new keys were added to JabRef_en.properties while two were edited adapted to already present ones. - The performExportForSingleEntry test in OpenOfficeDocumentCreatorTest was failing because of the missing "Icoreranking" field. Further, since the ordering of JabRef-specific fields was changed in a previous commit to conform to alphabetical order, the hardcoded values in OldOpenOfficeCalcExportFormatContentSingleEntry.xml weren't matching with the exporter's output. This commit reorders the fields in the expected order and adds the "Icoreranking" field at its right place which fixes the broken test. Part of JabRef#13476
- Update ICORERankingEditorViewModel to display a notification when a conference ranking isn't found instead of displaying the "not found" text in the field. - Refactored PreferencesMigrations.addICORERankingFieldToGeneralTab to better align with JabRefCliPreferences.setLanguageDependentDefaultValues. - Minor refactor and remove redundant comment and whitespace in ConferenceRepository. - Update tests to Parameterized tests in ConferenceRepositoryTest and ConferenceAcronymExtractorTest. - Removed unused variables in ICORERankingEditor and ICORERankingEditorViewModel. - Update OldOpenOfficeCalcExportFormatContentSingleEntry.xml to align with the field ordering in StandardField. - Add documentation to PreferencesMigrations.addICORERankingFieldToGeneralTab and ConferenceAcronymExtractor.extract methods. Part of JabRef#13476
This reverts commit 75d2b47 which I accidentally pulled on my feature branch.
@trag-bot didn't find any issues in the code! ✅✨ |
Note that your PR will not be reviewed/accepted until you have gone through the mandatory checks in the description and marked each of them them exactly in the format of |
I left that test ambiguous because I was expecting the internal details of the class to change in the future. Currently, clients have to instantiate an object to use the So why not make this a "pure" utility class and expose only static methods? This also brings me to address your other comment:
A lot of those tests fail because Levenshtein Distance is a rather poor candidate for finding and matching conference titles, especially with such a high threshold of 0.9 (which was prescribed in the original issue). I will group the provided tests into categories so I can address them collectively: Group A: Input where an acronym is present in parentheses along with other text.Examples,
The current implementation can be adapted to address these cases by adding another step into the acronym matching process where the extracted string is first split on whitespace and then on special characters with acronym lookups for each step. The two-step process is necessary to account for outliers like Group B: Input where Levenshtein similarity matching is guaranteed to failExamples,
Cases where a conference title is present with extra text that is closer to the title's length or where the title is jumbled up will yield a very poor similarity rating. The edit distance for Levenshtein is simply too large in these cases. While it may seem, at first, that we can simply get a bunch of substrings by splitting on commas Group C: Input where the data is outdated/mismatchedExamples,
There's also quite a number of entries in the original ICORE data where information about a title/acronym change is included. A few examples:
Now, we can try to extract the previous title or acronym from them but, as you can see, there's no consistent pattern here. Sometimes, it indicates the previous title, sometimes it is just the acronym, the words used change frequently ('was', 'previously', 'changed in', 'merger of'), or both the acronym and title are included, and so on. Proposed Solution
I'll start working on implementing some of the more immediate changes. If you have any input on this, then I'd appreciate it. |
Thank you for the comments and ideas. Especially, the grouping is nice. It reminds me of the issue #12728. Naively if an abbreviator of booktitles is implemented, one could run this on both the Maybe, we should split work here to keep focused. Leave the current handling as is and weave-in an abbreviator later? To have some separation of concerns? :) Side track - the discussion somehow refs following issues
|
Closes #13476: Add support for automatic ICORE conference ranking lookup
This PR adds the required feature to enable ICORE conference ranking lookups whenever a BibTeX entry includes a conference title.
Task list mentioned in the original issue:
Steps to test
Icoreranking
field shows up in the General Tab under theDOI
field.InProceedings
and enter a conference acronym (in parentheses) in theBooktitle
field. Then, navigate to the General Tab again and click the lookup rank button to see the ICORE rank for the conference.Clicking the Open Conference Page button will open your default browser and take you to the ICORE conference page for the conference (for SIGCOMM in the screenshot, it would be here.
In case an acronym isn't present in the title, the tool will then try to lookup the entire
Booktitle
in the ranking data, with a fuzzy match fallback of 90% similarity.InProceedings
,InCollection
, andArticle
entry types and looks for conference titles inBooktitle
,Journaltitle
, orTitle
fields.Some caveats:
ConferenceAcronymExtractor
works (see related tests for details). Some examples to illustrate this:(This doesn't get pulled (this does))
->this does
(First) acronym is pulled, not the (second) one
. ->First
(This doesn't (I DO)) and (this won't (either))
. ->I DO
"ACM Conference on Economics and Computation (EC)"
with the acronym removed does not match"ACM Conference on Economics and Computation"
since the similarity rating is 0.896 (<0.9).Mandatory checks
CHANGELOG.md
described in a way that is understandable for the average user (if change is visible to the user)