Add support for automatic ICORE conference ranking lookup [#13476] #13699

TheYorouzoya · 2025-08-15T09:33:46Z

Closes #13476: Add support for automatic ICORE conference ranking lookup

This PR adds the required feature to enable ICORE conference ranking lookups whenever a BibTeX entry includes a conference title.

Task list mentioned in the original issue:

Move DuplicateCheck#similarity from org.jabref.logic.database to a new utility class: org.jabref.logic.util.strings.StringSimilarity.
Add ICORE.csv to src/main/resources/
Create a class that loads and indexes the ICORE data at instantiation.
Implement the logic to detect and return the ranking from a full conference name.
Create a new field: ICORANKING ('icoranking') and add it to org.jabref.model.entry.field.StandardField at the // JabRef-specific fields section.
Create a new field editor ICoreRankingEditor (inspired by IdentifierEditor) and integrate it into the UI by modifying FieldEditors#getForField(...). - The lookup button (like DOI lookup) to lookup the ranking in the CSV file
Write unit tests for acronym matching and similarity fallback.

Steps to test

By default, the Icoreranking field shows up in the General Tab under the DOI field.

Add a New Entry of type InProceedings and enter a conference acronym (in parentheses) in the Booktitle field. Then, navigate to the General Tab again and click the lookup rank button to see the ICORE rank for the conference.

Clicking the Open Conference Page button will open your default browser and take you to the ICORE conference page for the conference (for SIGCOMM in the screenshot, it would be here).
If the title contains no acronym, the tool then tries to look up the entire Booktitle in the ranking data, with a fuzzy-match fallback at 90% similarity.

The feature allows lookups for InProceedings, InCollection, and Article entry types and looks for conference titles in Booktitle, Journaltitle, or Title fields.

In case an acronym is present but it doesn't match anything, the feature will still fall back to searching for the entire title string. If a match is not found for the full title either, a notification with "not found" will be displayed and the Open Conference Page button will be disabled.
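The lookup order described in the steps above (acronym first, then the exact title, then a fuzzy match) can be sketched roughly as follows. This is a minimal illustration, not the PR's ConferenceRepository: the class name `LookupOrderSketch`, the two maps, and the pluggable similarity function are all hypothetical, and whether a score of exactly 0.9 counts as a match is an assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.ToDoubleBiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the lookup order; not the PR's actual class.
final class LookupOrderSketch {
    // Same pattern as quoted in the ConferenceAcronymExtractor review thread.
    private static final Pattern ACRONYM = Pattern.compile("\\(([^()]*)\\)");
    private static final double THRESHOLD = 0.9; // boundary handling is an assumption

    private final Map<String, String> rankByAcronym; // keys assumed lower-cased
    private final Map<String, String> rankByTitle;   // keys assumed lower-cased
    private final ToDoubleBiFunction<String, String> similarity;

    LookupOrderSketch(Map<String, String> rankByAcronym,
                      Map<String, String> rankByTitle,
                      ToDoubleBiFunction<String, String> similarity) {
        this.rankByAcronym = rankByAcronym;
        this.rankByTitle = rankByTitle;
        this.similarity = similarity;
    }

    Optional<String> findRank(String bookTitle) {
        String query = bookTitle.strip().toLowerCase(Locale.ROOT);

        // 1. Try the acronym in parentheses, if there is one.
        Matcher matcher = ACRONYM.matcher(query);
        if (matcher.find()) {
            String rank = rankByAcronym.get(matcher.group(1).strip());
            if (rank != null) {
                return Optional.of(rank);
            }
        }

        // 2. Try the whole title as an exact key.
        String exact = rankByTitle.get(query);
        if (exact != null) {
            return Optional.of(exact);
        }

        // 3. Fall back to the best fuzzy match at or above the threshold.
        String bestRank = null;
        double bestScore = 0.0;
        for (Map.Entry<String, String> entry : rankByTitle.entrySet()) {
            double score = similarity.applyAsDouble(query, entry.getKey());
            if (score >= THRESHOLD && score > bestScore) {
                bestScore = score;
                bestRank = entry.getValue();
            }
        }
        return Optional.ofNullable(bestRank); // empty means "not found"
    }
}
```

An empty result corresponds to the "not found" notification mentioned above.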

Some caveats:

The feature will always look for the acronym in the first, deepest set of parentheses it encounters. This is a direct consequence of how the regex in the ConferenceAcronymExtractor works (see related tests for details). Some examples to illustrate this:
- (This doesn't get pulled (this does)) -> this does
- (First) acronym is pulled, not the (second) one. -> First
- (This doesn't (I DO)) and (this won't (either)). -> I DO
The fuzzy match fallback uses the Levenshtein Distance as its similarity metric with a threshold of 0.9. This can lead to some "unexpected" cases. For example, "ACM Conference on Economics and Computation (EC)" with the acronym removed does not match "ACM Conference on Economics and Computation" since the similarity rating is 0.896 (<0.9).
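Both caveats can be reproduced with a small standalone sketch. The regex is the one shown in the review of ConferenceAcronymExtractor; the similarity formula, (longer length - edit distance) / longer length, is an assumption based on the Stack Overflow approach the PR ports, and it is consistent with the 0.896 figure quoted above (43/48). The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative, not the PR's API.
final class CaveatSketch {
    // Matches a parenthesised group that contains no further parentheses.
    private static final Pattern INNERMOST = Pattern.compile("\\(([^()]*)\\)");

    // Returns the content of the first innermost parenthesised group, if non-empty.
    static Optional<String> firstInnermost(String input) {
        Matcher matcher = INNERMOST.matcher(input);
        if (matcher.find()) {
            String inner = matcher.group(1).strip();
            return inner.isEmpty() ? Optional.empty() : Optional.of(inner);
        }
        return Optional.empty();
    }

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalisation: 1.0 for identical strings, 0.0 for nothing in common.
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0; // both strings are empty
        }
        return (longer - levenshtein(a, b)) / (double) longer;
    }
}
```

For the example above, the two titles differ only by the five trailing characters " (EC)", so the distance is 5 over a longer length of 48, which stays just below the 0.9 threshold.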

Mandatory checks

I own the copyright of the code submitted and I license it under the MIT license
Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (if change is visible to the user)
TODO Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.
[/] Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.

trag-bot · 2025-08-15T09:34:55Z

jablib/src/main/java/org/jabref/logic/util/strings/StringSimilarity.java

+        // both strings are zero length
+        if (longerLength == 0) {


Comment is redundant as it simply restates what the code clearly shows. The comment doesn't provide additional information or reasoning.

The comment was added there by the original author of the code, I've merely ported it with the necessary modifications. That said, I will contest this review as the lines

    final int longerLength = longer.length();
    // both strings are zero length
    if (longerLength == 0) {
        return 1.0;
    }

do not explicitly state, on their own, that the two input strings are equal. Further, the comment is present in the original Stack Overflow post here. That being said, if you're adamant about it, I don't mind changing this.

Take the bot's opinions with care; they are not always that good.

trag-bot · 2025-08-15T09:35:33Z

jablib/src/test/java/org/jabref/logic/util/strings/StringSimilarityTest.java

+        double exactMatch = 1.0;
+        double similarity = similarityChecker.similarity(a, b);
+
+        assertTrue(similarity >= EPSILON_SIMILARITY && similarity < exactMatch);


Using assertTrue with a boolean condition instead of asserting the actual contents. Should compare the actual similarity value with expected bounds using assertEquals.

TheYorouzoya · 2025-08-19T10:35:15Z

For the last few days, I've just been browsing the code, reading the docs, and interacting with the application on my local machine. Since this is the first time I'm interacting with the JabRef ecosystem, and ICORE by extension, I have some questions regarding the app itself and the feature's use-case. I'll post each one as a separate comment. Apologies if some of these are too obvious.

TheYorouzoya · 2025-08-19T10:40:12Z

The issue post mentions: "When a BibTeX entry includes a conference title". What does "conference title" here refer to?

A. Following the Getting Started guide on the app, when you add a new entry via Library->New Entry, the "Select entry type" dialogue box asks for an entry type. One of them is the "Conference" type. Hovering over it says that it is a legacy alias for "InProceedings". Both of these have a "Title" required field. My question is if the feature's exclusively for these types of BibTeX entries or for ALL entries regardless of their type.

I'm guessing the answer is any entry regardless of type, but I still have to ask to be sure.

B. If I'm querying all entries regardless of type, do I have to search only the Title field? There are entries where the conference name isn't in the title field. Case in point: I used the Web Search feature in the app to lookup the conference mentioned in the issue post: "ACIS Conference on Software Engineering Research, Management and Applications". I selected and imported the following entry: https://ieeexplore.ieee.org/document/9509045. It gets imported as an InProceedings, but the conference name is not in the Title field but in the Journal field under the Other Fields tab.

I'm assuming that I should be looking for the conference title and its acronym in all of the fields. Is there some sort of a standard way here regarding how entries are imported into JabRef so that I only have to look for the title inside a subset of fields rather than all of them?

TheYorouzoya · 2025-08-19T10:43:12Z

The ICORE ranking data and its presentation.

A. Since each BibTeX entry is annotated with a year, I'm assuming that the user wants to see the ICORE ranking for that particular year. Again, very obvious, but I want to confirm. This would also mean that if a conference was added to ICORE later on, any entries from previous years should have a "Not Available" or "N/A" in the ICORE rank field. Or do we use the mismatched ranking as a fallback?

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

C. The exported ICORE ranking data provided from the website (https://portal.core.edu.au/conf-ranks/) does not contain a header row in the CSV file. This isn't a big deal as the important bits can be made out quite clearly (all except one, that is). Each line contains 9 columns: ID, Title, Acronym, Source, Rank, UNKNOWN, FoR-1, FoR-2, FoR-3. The UNKNOWN field in column 6 is a "Yes" or "No" for every line. It is always present, but I can't seem to figure what it corresponds to (it isn't the Note or DBLP column from the website). Consequently, I cannot determine whether it is important. If you know what it is or if it is relevant to the feature, please let me know.

TheYorouzoya · 2025-08-19T10:49:52Z

@koppor can you please help answer the questions I've posted above?

koppor · 2025-08-19T14:47:11Z

The issue post mentions: "When a BibTeX entry includes a conference title". What does "conference title" here refer to?

Oh, did you ever read about bibtex? - it is @InProceedings plus @InCollection and booktitle.

koppor · 2025-08-19T14:48:43Z

A. Following the Getting Started guide on the app, when you add a new entry via Library->New Entry, the "Select entry type" dialogue box asks for an entry type. One of them is the "Conference" type. Hovering over it says that it is a legacy alias for "InProceedings". Both of these have a "Title" required field. My question is if the feature's exclusively for these types of BibTeX entries or for ALL entries regardless of their type.

I'm guessing the answer is any entry regardless of type, but I still have to ask to be sure.

title is context dependent. We refer to booktitle. You can read on at https://ctan.org/pkg/biblatex for more information on bibtex if you want.

koppor · 2025-08-19T14:49:10Z

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

Always use the latest ranking. We are not interested in historic data. - Only one CSV should be used.

koppor · 2025-08-19T14:53:36Z

C. The exported ICORE ranking data provided from the website (portal.core.edu.au/conf-ranks) does not contain a header row in the CSV file. This isn't a big deal as the important bits can be made out quite clearly (all except one, that is). Each line contains 9 columns: ID, Title, Acronym, Source, Rank, UNKNOWN, FoR-1, FoR-2, FoR-3. The UNKNOWN field in column 6 is a "Yes" or "No" for every line. It is always present, but I can't seem to figure what it corresponds to (it isn't the Note or DBLP column from the website). Consequently, I cannot determine whether it is important. If you know what it is or if it is relevant to the feature, please let me know.

The website has the following columns:

Title
Acronym
Source
Rank
Note
DBLP
Primary FoR
Comments
Average Rating

Example CSV line

9,"ACIS Conference on Software Engineering Research, Management and Applications",SERA,CORE2023,C,No,4612,,
1825,ACM International Joint Conference on Pervasive and Ubiquitous Computing (PERVASIVE and UbiComp combined from 2013),UbiComp,CORE2023,journal published,No,4608,,

(NOTE: It would be good if this was included in your question to make it self-contained)

I cannot quickly see it, but we need "Title", "Acronym" and "Rank" only. The other columns can be omitted, can't they?
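Pulling just those three columns out of an export line could look roughly like this. It is a sketch under assumptions: the column positions follow the order described in the question (ID, Title, Acronym, Source, Rank, ...), the splitter is hand-rolled to cope with the quoted title in the first example line, and the names `IcoreCsvSketch` and `Row` are hypothetical. A real implementation might prefer a CSV library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; assumes the column order ID, Title, Acronym, Source, Rank, ...
final class IcoreCsvSketch {
    record Row(String title, String acronym, String rank) {}

    // Splits one CSV line on commas, honouring double-quoted fields.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Keeps only Title (index 1), Acronym (index 2) and Rank (index 4).
    static Row parse(String line) {
        List<String> fields = splitCsvLine(line);
        return new Row(fields.get(1), fields.get(2), fields.get(4));
    }
}
```

Note that the second example line carries "journal published" in the rank column, so the rank is not always a letter grade.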

koppor · 2025-08-19T14:54:29Z

Please make your question numbers unique. "A" is used twice, isn't it?

A. Since each BibTeX entry is annotated with a year, I'm assuming that the user wants to see the ICORE ranking for that particular year. Again, very obvious, but I want to confirm. This would also mean that if a conference was added to ICORE later on, any entries from previous years should have a "Not Available" or "N/A" in the ICORE rank field. Or do we use the mismatched ranking as a fallback?

No, always the latest year.

koppor · 2025-08-19T14:54:49Z

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

Always use the latest CSV - there is one export. This CSV should be used.

koppor · 2025-08-19T14:55:51Z

B. If I'm querying all entries regardless of type, do I have to search only the Title field? There are entries where the conference name isn't in the title field. Case in point: I used the Web Search feature in the app to lookup the conference mentioned in the issue post: "ACIS Conference on Software Engineering Research, Management and Applications". I selected and imported the following entry: ieeexplore.ieee.org/document/9509045. It gets imported as an InProceedings, but the conference name is not in the Title field but in the Journal field under the Other Fields tab.

For @Article use getFieldOrAlias(StandardField.Title), which will use JournalTitle (for BibLaTeX) first and then check Title.

koppor · 2025-08-19T14:57:50Z

@koppor can you please help answer the questions I've posted above?

I hope, I got all questions, I am a bit confused since the questions are all labeled with "A" and I could have missed something.

koppor · 2025-08-19T15:24:59Z

@TheYorouzoya I wonder if you have seen the "Helpful resources" section at the issue description (#13476)

It links to #13512

Did you know that one can click on "Files changed"?

You are routed to https://github.com/JabRef/jabref/pull/13512/files

You then might have seen

I know that code reading is not easy; but it is an essential skill to produce maintainable code.

TheYorouzoya · 2025-08-20T14:32:41Z

I hope, I got all questions, I am a bit confused...

Thank you for your patience with answering my questions. I appreciate it.

Please make your question numbers unique...

I bundled my questions for a specific context under a singular comment so that it would be easier to reply to all of them like this

That said, this is my way of doing things, and I'm the guest here. Sorry about the confusion it led to. Moving forward, I will post one individual question per comment. No labels included.

TheYorouzoya · 2025-08-20T14:56:55Z

Oh, did you ever read about bibtex? - it is @InProceedings plus @InCollection and booktitle.

Starting from the issue post

I wanted to see for myself what the feature might look like inside of JabRef to the user. So I downloaded the current version and searched for the conference mentioned in the issue post. I imported one of the entries and saw that the conference name showed up inside the "Journal" field

Even though the manual says that the optional field for a venue exists

So I, then, booted up the build on my local repo, i.e., the jabgui:run task. I imported the same article again, but this time, the conference name shows up in the "Booktitle" field

even though there is a clearly indicated "Venue" field which is empty

Do you see why I would ask such an obvious question after this?

koppor · 2025-08-20T15:21:23Z

@TheYorouzoya Thank you for your patience. It's all voluntary work here. It needs time to explain the domain of scientific references. Maybe you can be a guest a little longer here and improve our documentation at https://docs.jabref.org. Currently we see guests being here just a short time, doing a task, and then leave. I always hope that a guest will make the place better as a whole; especially because all guests seem to be learning software engineering and not just programming.

Data sourced from ICORE website here: https://portal.core.edu.au/conf-ranks/ to enable ICORE rank lookups. As discussed here: JabRef#13699 (comment), only the latest data from ICORE is to be used. At this time, it is the ICORE2023 ranking data. Part of JabRef#13476

subhramit · 2025-08-21T16:42:51Z

I bundled my questions for a specific context under a singular comment so that it would be easier to reply to all of them like this

Moving forward, I will post one individual question per comment. No labels included.

Hey @TheYorouzoya - That is not needed. You can just use the labels to bundle questions under their respective contexts like you do, just add numbering to them (like A1, A2, etc.) so that they can be specifically and easily referred to when answering.

koppor · 2025-08-22T21:16:50Z

I am more used to Gitter (Matrix) chat for a bulk of questions 😅. Sorry for that!

I have to confess that I did not check the terms properly while writing. I used "venue" as a scientist indicating a conference. And I did not check whether BibLaTeX has some "definition" of venue. A "venue" meant in the issue is some @InCollection and @InProceedings entry. We identify it by the booktitle.

I also meant journal articles, which are defined by @Article having title or journaltitle. You can receive the value by getFieldOrAlias(StandardField.JOURNAL).

I hope, I could answer your question now and you are unblocked to move forward.

- Append a header row to resources/icore/ICORE2023.csv
- Add ConferenceEntry record to represent ICORE conference data
- Add ConferenceRepository to load conference data and allow conference lookups using an acronym or a bookTitle with fuzzy match as a fallback
- Add utility class to extract an acronym from a bookTitle
- Add tests

Part of JabRef#13476

trag-bot · 2025-08-23T10:25:24Z

jablib/src/main/java/org/jabref/logic/icore/ConferenceAcronymExtractor.java

+    // A slight modification of: https://stackoverflow.com/a/17759264
+    private static final Pattern PATTERN = Pattern.compile("\\(([^()]*)\\)");
+
+    public static Optional<String> extract(String input) {


Method lacks input validation for null parameter which could lead to NullPointerException. While Optional return is good, the input should be validated.

@NonNull jspecify annotation is OK

Please add JavaDoc.

trag-bot · 2025-08-23T10:26:12Z

jablib/src/main/java/org/jabref/logic/icore/ConferenceRepository.java

+    public Optional<ConferenceEntry> getConferenceFromBookTitle(String bookTitle) {
+        String query = bookTitle.strip().toLowerCase();
+
+        // Lucky path


Comment does not add new information and can be plainly derived from the code. It should be removed as it doesn't provide additional context or reasoning.

trag-bot · 2025-08-23T10:26:53Z

jablib/src/main/java/org/jabref/model/icore/ConferenceEntry.java

+        String acronym,
+        String rank
+) {
+    private final static String URL_PREFIX = "https://portal.core.edu.au/conf-ranks/";


Incorrect order of modifiers. According to Java conventions and effective Java principles, it should be 'private static final' instead of 'private final static'.

Will fix that in the next commit.

trag-bot · 2025-08-23T10:27:29Z

jablib/src/test/java/org/jabref/logic/icore/ConferenceAcronymExtractorTest.java

+    }
+
+    @Test
+    void extractReturnsEmptyforEmptyParentheses() {


Method name contains a typo: 'forEmptyParentheses' should be 'ForEmptyParentheses' to maintain consistent camelCase naming convention in test methods.

Will fix that in the next commit.

trag-bot · 2025-08-23T10:28:07Z

jablib/src/test/java/org/jabref/logic/icore/ConferenceRepositoryTest.java

+
+    @Test
+    void getConferenceFromBookTitleReturnsConferenceForFuzzyMatchAboveThreshold() {
+        // String similarity > 0.9


Comment merely states what can be derived from the code and test name, not providing additional information about reasoning or implementation details.

TheYorouzoya · 2025-08-23T10:41:20Z

I hope, I could answer your question now and you are unblocked to move forward.

Thank you! I'll work on the GUI side next. I do have some questions there, but I'll post those once I'm done looking around the code a bit more.

subhramit · 2025-08-23T10:44:16Z

I am more used to Gitter (Matrix) chat for a bulk of questions 😅.

Also, here is a link to our gitter chat.

trag-bot · 2025-08-27T08:53:54Z

jablib/src/main/java/org/jabref/logic/icore/ConferenceRepository.java

+        // Lucky path
+        ConferenceEntry conference = titleToConference.get(query);


Comment 'Lucky path' doesn't add any new information and can be derived from the code itself. Comments should provide additional context or reasoning.

trag-bot · 2025-08-27T08:54:37Z

jablib/src/main/java/org/jabref/logic/util/strings/StringSimilarity.java

+        }
+
+        final int longerLength = longer.length();
+        // both strings are zero length


This comment is trivial and can be derived directly from the code condition (longerLength == 0). It should be removed as it doesn't add new information.

trag-bot · 2025-08-27T08:55:53Z

jablib/src/main/java/org/jabref/model/icore/ConferenceEntry.java

+/**
+ * A Conference Entry built from a subset of fields in the ICORE Ranking data
+ */


The comment merely restates what is obvious from the code and doesn't provide additional information about the purpose or constraints of the record.

TheYorouzoya · 2025-08-27T15:04:07Z

@koppor please take a look at the progress so far and review.

trag-bot · 2025-08-28T13:23:54Z

jablib/src/main/java/org/jabref/model/entry/field/FieldFactory.java

     */
    public static List<Field> getDefaultGeneralFields() {
-        List<Field> defaultGeneralFields = new ArrayList<>(Arrays.asList(StandardField.DOI, StandardField.CITATIONCOUNT, StandardField.CROSSREF, StandardField.KEYWORDS, StandardField.EPRINT, StandardField.URL, StandardField.FILE, StandardField.GROUPS, StandardField.OWNER, StandardField.TIMESTAMP));
+        List<Field> defaultGeneralFields = new ArrayList<>(List.of(StandardField.DOI, StandardField.ICORERANKING, StandardField.CITATIONCOUNT, StandardField.CROSSREF, StandardField.KEYWORDS, StandardField.EPRINT, StandardField.URL, StandardField.FILE, StandardField.GROUPS, StandardField.OWNER, StandardField.TIMESTAMP));


Using ArrayList with List.of() is inefficient. Since the list is immediately populated, using Set.of() or List.of() directly would be more appropriate and aligned with modern Java practices.

trag-bot · 2025-08-28T13:29:46Z

jablib/src/main/java/org/jabref/model/entry/field/StandardField.java

-    CITATIONCOUNT("citationcount"),
-    TIMESTAMP("timestamp", FieldProperty.DATE),
+
+    // Timestamp-realted


The comment contains a spelling error in the word 'realted' which should be 'related'. Variable and comment spelling accuracy is important for code maintainability.

koppor

Good beginning!

Minor comments inside.

Please disable the button "openExternalLink" if there is no matched conference

undo is broken. I had a rank "B", clicked on lookup - replaced by "not found". Library modified, but I cannot undo.

Proposal: If there is a value in ICORE and lookup would replace it by "not found", do

dialogService.notify(Localization.lang("not found"))

(or similar)

INSTEAD of replacing it.
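The proposed behaviour can be sketched independently of JabRef's classes. In this minimal illustration, `LookupResultSketch` and `applyLookup` are hypothetical names, and a plain `Consumer<String>` stands in for the dialog service; it does not address the undo problem mentioned above.

```java
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical sketch; the Consumer stands in for a notification service.
final class LookupResultSketch {
    // Returns the value the field should hold after a lookup attempt.
    static String applyLookup(String currentValue,
                              Optional<String> lookedUpRank,
                              Consumer<String> notifier) {
        if (lookedUpRank.isPresent()) {
            return lookedUpRank.get(); // a rank was found: use it
        }
        notifier.accept("not found");  // tell the user instead of overwriting
        return currentValue;           // keep whatever was there before
    }
}
```

With this shape, a failed lookup leaves an existing rank such as "B" untouched, so the library is not modified.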

koppor · 2025-08-28T12:35:05Z

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java

-     * Calculates the similarity (a number within 0 and 1) between two strings.
-     * http://stackoverflow.com/questions/955110/similarity-string-comparison-in-java
-     */
-    private static double similarity(final String first, final String second) {


Nice find :)

koppor · 2025-08-28T12:35:19Z

jabgui/src/main/java/org/jabref/migrations/PreferencesMigrations.java

+            SpecialField.READ_STATUS, SpecialField.RELEVANCE
+        );
+
+        if (!currentGeneralPrefs.equals(expectedGeneralPrefs)) {


Nice check!

koppor · 2025-08-28T12:37:53Z

jabgui/src/main/java/org/jabref/migrations/PreferencesMigrations.java

+        Map<String, Set<Field>> entryEditorPrefs = preferences.getEntryEditorPreferences().getEntryEditorTabs();
+        Set<Field> currentGeneralPrefs = entryEditorPrefs.get("General");
+
+        Set<Field> expectedGeneralPrefs = Set.of(


I think org.jabref.model.entry.field.FieldFactory#getDefaultGeneralFields should be linked - to enable others to lookup things if they implement something similar.

koppor · 2025-08-28T12:39:48Z

jabgui/src/main/java/org/jabref/migrations/PreferencesMigrations.java

@@ -558,6 +561,37 @@ static void moveApiKeysToKeyring(JabRefCliPreferences preferences) {
        }
    }

+    static void addICORERankingFieldToGeneralTab(GuiPreferences preferences) {
+        Map<String, Set<Field>> entryEditorPrefs = preferences.getEntryEditorPreferences().getEntryEditorTabs();
+        Set<Field> currentGeneralPrefs = entryEditorPrefs.get("General");


I think, Localization.lang("General") should be used - see org.jabref.logic.preferences.JabRefCliPreferences#setLanguageDependentDefaultValues

koppor · 2025-08-28T12:39:57Z

jabgui/src/main/java/org/jabref/migrations/PreferencesMigrations.java

+        }
+
+        entryEditorPrefs.put(
+                "General",


Also org.jabref.logic.preferences.JabRefCliPreferences#setLanguageDependentDefaultValues

koppor · 2025-08-28T13:04:44Z

jablib/src/test/java/org/jabref/logic/icore/ConferenceAcronymExtractorTest.java

+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class ConferenceAcronymExtractorTest {


Please rewrite to a @ParameterizedTest and @CsvSource. Use modern Optional testing - see comments on other test

koppor · 2025-08-28T13:05:48Z

jabgui/src/main/java/org/jabref/gui/fieldeditors/ICORERankingEditor.java

+    @FXML private Button visitICOREConferencePageButton;
+
+    @Inject private DialogService dialogService;
+    @Inject private TaskExecutor taskExecutor;


Unused variable - remove - also at constructor

koppor · 2025-08-28T13:06:02Z

jabgui/src/main/java/org/jabref/gui/fieldeditors/ICORERankingEditor.java

+    @Inject private DialogService dialogService;
+    @Inject private TaskExecutor taskExecutor;
+    @Inject private GuiPreferences preferences;
+    @Inject private UndoManager undoManager;


Unused variable- remove

UndoManager is required by the constructor inside the AbstractEditorViewModel superclass.

GuiPreferences is required to open the conference page via the call to NativeDesktop.openBrowser() since it needs the user's externalApplicationPreferences.

DialogService will be required to apply the fix suggested here.

I'll remove the TaskExecutor since it is not being used.

koppor · 2025-08-28T13:15:06Z

jablib/src/main/java/org/jabref/logic/icore/ConferenceAcronymExtractor.java

+    // A slight modification of: https://stackoverflow.com/a/17759264
+    private static final Pattern PATTERN = Pattern.compile("\\(([^()]*)\\)");
+
+    public static Optional<String> extract(String input) {


@NonNull jspecify annotation is OK

Please add JavaDoc.

koppor · 2025-08-28T13:26:11Z

jablib/src/main/java/org/jabref/model/entry/field/StandardField.java

    CREATIONDATE("creationdate", FieldProperty.DATE),
-    MODIFICATIONDATE("modificationdate", FieldProperty.DATE);
+    GROUPS("groups"),
+    ICORERANKING("icoreranking"),


Should be named icore to be shorter - WDYT?

(We also don't have ownername or groupname, because the prefix is unique enough)

I did it at b092aa9

I am completely new to this domain, so I don't really have an input. My decision was based purely on the task in the issue post:

Create a new field: ICORANKING ('icoranking') and add it to org.jabref.model.entry.field.StandardField at the // JabRef-specific fields section.

The misspelling was later corrected in this comment here.

I'll update it with your suggestion.

koppor · 2025-08-28T14:05:29Z

Since 3 of 3 attempts failed to extract a title, here is my test data set: test-cases.zip

Maybe, it can be used to improve the matching; maybe not (future work ^^)

jabref-machine · 2025-08-28T14:14:49Z

JUnit tests of jablib are failing. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Source Code Tests / Unit tests (pull_request)" and click on it.

You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

- Out of the 4 missing keys in a failing test, two new keys were added to JabRef_en.properties while two were adapted to already present ones.
- The performExportForSingleEntry test in OpenOfficeDocumentCreatorTest was failing because of the missing "Icoreranking" field. Further, since the ordering of JabRef-specific fields was changed in a previous commit to conform to alphabetical order, the hardcoded values in OldOpenOfficeCalcExportFormatContentSingleEntry.xml weren't matching the exporter's output. This commit reorders the fields in the expected order and adds the "Icoreranking" field in its right place, which fixes the broken test.

Part of JabRef#13476

- Update ICORERankingEditorViewModel to display a notification when a conference ranking isn't found instead of displaying the "not found" text in the field.
- Refactored PreferencesMigrations.addICORERankingFieldToGeneralTab to better align with JabRefCliPreferences.setLanguageDependentDefaultValues.
- Minor refactor and remove redundant comment and whitespace in ConferenceRepository.
- Update tests to Parameterized tests in ConferenceRepositoryTest and ConferenceAcronymExtractorTest.
- Removed unused variables in ICORERankingEditor and ICORERankingEditorViewModel.
- Update OldOpenOfficeCalcExportFormatContentSingleEntry.xml to align with the field ordering in StandardField.
- Add documentation to PreferencesMigrations.addICORERankingFieldToGeneralTab and ConferenceAcronymExtractor.extract methods.

Part of JabRef#13476

This reverts commit 75d2b47 which I accidentally pulled on my feature branch.

Part of JabRef#13476

trag-bot · 2025-08-28T21:11:14Z

@trag-bot didn't find any issues in the code! ✅✨

jabref-machine · 2025-08-28T21:11:40Z

Note that your PR will not be reviewed/accepted until you have gone through the mandatory checks in the description and marked each of them exactly in the format of [x] (done), [ ] (not done yet) or [/] (not applicable).

TheYorouzoya · 2025-08-29T12:18:59Z

Not sure if this test is doing really the expected thing -- check for the expected value directly.

I left that test ambiguous because I was expecting the internal details of the class to change in the future. Currently, clients have to instantiate an object to use the isSimilar and similarity methods. This isn't wrong, but these objects do not differ at all in their internal state (hence, not needed).

So why not make this a "pure" utility class and expose only static methods?
The class hardcodes Levenshtein Distance as its similarity metric. Ideally, we should allow the client to pass a metric in the constructor to perform the comparison because Levenshtein might not be ideal for all cases. If, in the future, this metric is swapped for something else, a similarity test for an exact value might fail. So for now, I've simply removed the test.

This also brings me to address your other comment:

Since 3 of 3 attempts failed to extract a title, here my test data set: test-cases.zip

Maybe, it can be used to improve the matching; maybe not (future work ^^)

A lot of those tests fail because Levenshtein Distance is a rather poor candidate for finding and matching conference titles, especially with such a high threshold of 0.9 (which was prescribed in the original issue).

I will group the provided tests into categories so I can address them collectively:

Group A: Input where an acronym is present in parentheses along with other text.

Examples,

International Organization for Information Integration and Web-based Application and Services 2010 (iiWAS 2010)
2009 35th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2009)
Cloud Computing and Services Science: 6th International Conference (CLOSER 2016) - Revised Selected Papers
Proceedings of the 2nd International Conference on Cloud Computing and Service Science (CLOSER'12)

The current implementation can be adapted to address these cases by adding another step into the acronym matching process where the extracted string is first split on whitespace and then on special characters with acronym lookups for each step. The two-step process is necessary to account for outliers like IEEE CCNC (space in acronym), ACM_WiSec, EC-TEL, or CODES+ISSS (special characters in acronym).
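
The staged lookup described above could look roughly like the following sketch. `StagedAcronymLookup` and its map-based index are hypothetical and do not mirror the PR's `ConferenceRepository`; the initial whole-string lookup is an added assumption so that acronyms containing a space still match.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a staged acronym lookup on the text found in parentheses. */
final class StagedAcronymLookup {
    // Keys are lower-cased acronyms; values stand in for whatever entry type the index holds.
    private final Map<String, String> acronymIndex;

    StagedAcronymLookup(Map<String, String> acronymIndex) {
        this.acronymIndex = acronymIndex;
    }

    Optional<String> find(String candidate) {
        // Assumed first step: the whole candidate, which covers acronyms such as "IEEE CCNC".
        Optional<String> hit = lookup(candidate);
        if (hit.isPresent()) {
            return hit;
        }
        String[] tokens = candidate.trim().split("\\s+");
        // Split on whitespace only, so "EC-TEL" or "CODES+ISSS" stay intact.
        for (String token : tokens) {
            hit = lookup(token);
            if (hit.isPresent()) {
                return hit;
            }
        }
        // Split each token on special characters, e.g. "CLOSER'12" becomes "CLOSER" and "12".
        for (String token : tokens) {
            for (String part : token.split("[^\\p{L}\\p{N}]+")) {
                if (part.isEmpty()) {
                    continue;
                }
                hit = lookup(part);
                if (hit.isPresent()) {
                    return hit;
                }
            }
        }
        return Optional.empty();
    }

    private Optional<String> lookup(String key) {
        return Optional.ofNullable(acronymIndex.get(key.trim().toLowerCase(Locale.ROOT)));
    }
}
```

Doing the whitespace split before the special-character split matters: splitting "CODES+ISSS 2015" on special characters first would produce "CODES" and "ISSS", neither of which is the indexed acronym.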

Group B: Input where Levenshtein similarity matching is guaranteed to fail

Examples,

Proceedings of the 3rd International Conference on Cloud Computing and Service Science, CLOSER 2013, 8-10 May 2013, Aachen, Germany
22nd IEEE International Enterprise Distributed Object Computing Conference, EDOC 2018, Stockholm, Sweden, October 16-19, 2018
Web Engineering - 18th International Conference, ICWE 2018, Cáceres, Spain, June 5-8, 2018, Proceedings
Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14-18, 2019, Proceedings, Part II

Cases where a conference title is present with extra text that is closer to the title's length or where the title is jumbled up will yield a very poor similarity rating. The edit distance for Levenshtein is simply too large in these cases. While it may seem, at first, that we can simply get a bunch of substrings by splitting on commas (,), hyphens (-), or other special characters and do a fuzzy match, it might not work in our favor considering titles in our data often use those special characters themselves. For example, International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing.
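
Why these inputs cannot reach the threshold follows from length alone. Assuming the similarity is computed as (longer length minus edit distance) divided by the longer length, and given that the Levenshtein distance is never smaller than the difference in lengths, the score is bounded above by shorter length divided by longer length. The helper below is a hypothetical illustration of that bound, not code from the PR.

```java
/** Hypothetical helper: best similarity any two strings of these lengths could reach. */
final class LengthBound {
    private LengthBound() {
    }

    /**
     * Upper bound on (longer - distance) / longer.
     * Because distance >= longer - shorter, the score can be at most shorter / longer.
     */
    static double maxPossibleSimilarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        int shorter = Math.min(a.length(), b.length());
        return longer == 0 ? 1.0 : shorter / (double) longer;
    }
}
```

Under this assumption, a 0.9 threshold means the input may be at most about 11% longer than the stored title. The first Group B example is roughly twice as long as the conference title it contains, so it is rejected even though the title appears in it verbatim. The same bound could serve as a cheap pre-filter before running the full edit-distance computation.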

Group C: Input where the data is outdated/mismatched

Examples,

Proceedings of the International Conference on Services Computing, SCC 2008: Acronym changed to IEEE SSE.
2023 IEEE Intelligent Vehicles Symposium (IV): Title: Intelligent Vehicles Conference. Acronym: IEEE-IV.

There are also quite a number of entries in the original ICORE data where information about a title/acronym change is included. A few examples:

International Conference Abstract State Machines, Alloy, B, TLA, VDM, and Z (Previously International Conference of B and Z Users, ZB, changed in 2008)
International Conference for High Performance Computing, Networking, Storage and Analysis (was Supercomputing Conference)
International Conference on Advanced Information Networking and Applications (was ICOIN)
International Conference on Embedded Wireless Systems and Networks (wasEuropean Conference on Wireless Sensor Networks)
International Conference on Hardware/Software Codesign and System Synthesis (previously ISSN, changed in 2003)
International Conference on Interactive Digital Storytelling (2008 merger of 'ICVS International Conference on Virtual Storytelling' and 'TIDSE Technology for Interactive Digital Storytelling')
International Conference on Managed Programming Languages and Runtimes (was ManLang and previously Principles and Practice of Programming in Java: PPPJ)

Now, we can try to extract the previous title or acronym from them but, as you can see, there's no consistent pattern here. Sometimes the note gives the previous title, sometimes just the acronym, and sometimes both. The wording also changes frequently ('was', 'previously', 'changed in', 'merger of').

Proposed Solution

I've addressed the Group A problem right there, so that is an easy fix.
For Group B, we can do a substring match pass before our fuzzy fallback. Testing out other similarity metrics (like Jaro-Winkler, which is better suited for shorter strings) or relaxing the threshold might improve the matching. We can further do a normalization pass on the original data to strip away special characters so that we can match these kinds of input better.
Are there more common prefixes like "Proceedings of", "Advances in", etc., which can be present in real-world titles? If it is a common occurrence, we can have a list of them which can help normalize inputs for better matching.
The original ICORE data needs more pre-processing. First, to strip away extra special characters. Second, we can extract and compile a separate list of aliases for the old names, which can also catch some of the outdated titles.
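
The normalization and substring pass proposed above might be sketched as follows. `TitleMatcher`, its prefix list, and the "longest contained title wins" rule are all assumptions for illustration; a real prefix list would have to come from observed data.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical sketch of a normalize-then-substring pass that would run before the fuzzy fallback. */
final class TitleMatcher {
    // Illustrative prefixes only, taken from the examples in this discussion.
    private static final List<String> PREFIXES =
            List.of("proceedings of the ", "proceedings of ", "advances in ");

    private TitleMatcher() {
    }

    /** Lower-cases, replaces runs of special characters with one space, strips one known prefix. */
    static String normalize(String title) {
        String s = title.toLowerCase(Locale.ROOT)
                        .replaceAll("[^\\p{L}\\p{N}]+", " ")
                        .trim();
        for (String prefix : PREFIXES) {
            if (s.startsWith(prefix)) {
                return s.substring(prefix.length());
            }
        }
        return s;
    }

    /** Returns the longest known title whose normalized form occurs, on word boundaries, in the query. */
    static Optional<String> findContainedTitle(String query, List<String> knownTitles) {
        String paddedQuery = " " + normalize(query) + " ";
        String best = null;
        int bestLength = 0;
        for (String title : knownTitles) {
            String normalized = normalize(title);
            if (!normalized.isEmpty()
                    && normalized.length() > bestLength
                    && paddedQuery.contains(" " + normalized + " ")) {
                best = title;
                bestLength = normalized.length();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

This would catch inputs where the stored title appears intact inside a longer booktitle. It would not help with jumbled inputs such as "Web Engineering - 18th International Conference, ICWE 2018, ...", which would still depend on the acronym or alias handling.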

I'll start working on implementing some of the more immediate changes. If you have any input on this, then I'd appreciate it.

koppor · 2025-08-31T20:23:31Z

I'll start working on implementing some of the more immediate changes. If you have any input on this, then I'd appreciate it.

Thank you for the comments and ideas. Especially, the grouping is nice. It reminds me of the issue #12728.

Naively, if an abbreviator of booktitles is implemented, one could run it on both the booktitle field and the ICORE conference title field, and then do an exact match? Maybe too many matches then, but it could be better than the current solution.
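
As a rough sketch of that idea: the abbreviator is passed in as a function because no conference abbreviator exists yet, and `AbbreviatedTitleIndex` is a hypothetical name rather than an existing JabRef class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: exact matching on abbreviated forms of both the stored title and the query. */
final class AbbreviatedTitleIndex {
    private final UnaryOperator<String> abbreviator;
    private final Map<String, String> byAbbreviation = new HashMap<>();

    AbbreviatedTitleIndex(UnaryOperator<String> abbreviator, List<String> icoreTitles) {
        this.abbreviator = abbreviator;
        for (String title : icoreTitles) {
            // Two titles may collapse to the same abbreviation; this sketch keeps the first one.
            byAbbreviation.putIfAbsent(abbreviator.apply(title), title);
        }
    }

    /** Abbreviates the booktitle the same way as the indexed titles and looks it up exactly. */
    Optional<String> find(String booktitle) {
        return Optional.ofNullable(byAbbreviation.get(abbreviator.apply(booktitle)));
    }
}
```

The `putIfAbsent` call is where the "too many matches" concern shows up: how aggressively the abbreviator shortens titles determines how many distinct conferences end up sharing one key.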

Maybe, we should split work here to keep focused. Leave the current handling as is and weave-in an abbreviator later? To have some separation of concerns? :)

Side track: the discussion somehow refers to the following issues:

Make abbreviations also working for conferences #12728 (described above)
New integrity checker for booktitle #12271 ("force" the user to have clear titles)
Add cleanup operation for journal abbreviation #11791 (idea: if Make abbreviations also working for conferences #12728 is implemented, one could run it automatically)

Port DuplicateCheck.similarity to StringSimilarity and add tests

d578f69

trag-bot bot reviewed Aug 15, 2025

trag-bot bot reviewed Aug 23, 2025

trag-bot bot reviewed Aug 27, 2025

Use List.of() and fix grammar

33ed0db

koppor mentioned this pull request Aug 28, 2025

Do not report unchanged (next to changed statements) comments as changed tsantalis/RefactoringMiner#993

Open

Reorder fields

806309e

koppor mentioned this pull request Aug 28, 2025

Code not moved between files should not be shown as deleted tsantalis/RefactoringMiner#994

Closed

Fix unsupported operation exception

5330632

trag-bot bot reviewed Aug 28, 2025

Rename field

b092aa9

trag-bot bot reviewed Aug 28, 2025

koppor requested changes Aug 28, 2025

Merge branch 'main' into add-ICORE-ranking-support

52d3acd

koppor and others added 10 commits August 28, 2025 16:15

Hotfix: calling of publish.yml

75d2b47

Port DuplicateCheck.similarity to StringSimilarity and add tests

45a1d10

Fix Merge Conflict From Upstream Fetch

cbd4dda

Part of JabRef#13476

Merged changes to FieldFactory and StandardField

6747ad1

Revert "Hotfix: calling of publish.yml"

89c6600

This reverts commit 75d2b47 which I accidentally pulled on my feature branch.

Remove duplicate line in CHANGELOG

c63e2da

Part of JabRef#13476

		// Lucky path
		ConferenceEntry conference = titleToConference.get(query);
