@TheYorouzoya TheYorouzoya commented Aug 15, 2025

Closes #13476: Add support for automatic ICORE conference ranking lookup

This PR adds the required feature to enable ICORE conference ranking lookups whenever a BibTeX entry includes a conference title.

Task list mentioned in the original issue:

  • Move DuplicateCheck#similarity from org.jabref.logic.database to a new utility class: org.jabref.logic.util.strings.StringSimilarity.
  • Add ICORE.csv to src/main/resources/
  • Create a class that loads and indexes the ICORE data at instantiation.
  • Implement the logic to detect and return the ranking from a full conference name.
  • Create a new field: ICORANKING ('icoranking') and add it to org.jabref.model.entry.field.StandardField at the // JabRef-specific fields section.
  • Create a new field editor ICoreRankingEditor (inspired by IdentifierEditor) and integrate it into the UI by modifying FieldEditors#getForField(...).
    • Add a lookup button (like the DOI lookup) that looks up the ranking in the CSV file.
  • Write unit tests for acronym matching and similarity fallback.

Steps to test

  1. By default, the Icoreranking field shows up in the General tab under the DOI field.
image
  2. Add a new entry of type InProceedings and enter a conference acronym (in parentheses) in the Booktitle field. Then navigate to the General tab again and click the lookup rank button to see the ICORE rank for the conference.
image image
  3. Clicking the Open Conference Page button opens your default browser and takes you to the ICORE conference page for the conference (for SIGCOMM in the screenshot, it would be here).

  4. If an acronym isn't present in the title, the tool tries to look up the entire Booktitle in the ranking data, with a fuzzy-match fallback at 90% similarity.

image image image
  5. The feature allows lookups for InProceedings, InCollection, and Article entry types and looks for conference titles in the Booktitle, Journaltitle, or Title fields.
image image
  6. If an acronym is present but doesn't match anything, the feature still falls back to searching for the entire title string. If no match is found for the full title either, a "not found" notification is displayed and the Open Conference Page button is disabled.
image image
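
The lookup order described in the steps above (acronym first, then the exact title, then a fuzzy match) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PR's actual ConferenceRepository: the class name `ConferenceLookupSketch`, the nested `Conference` record, and the injected similarity function are hypothetical, the acronym is assumed to have been extracted beforehand, and treating the 0.9 threshold as inclusive is a guess.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of the lookup order: acronym, exact title, fuzzy title.
class ConferenceLookupSketch {
    record Conference(String title, String acronym, String rank) { }

    // Threshold taken from the PR description; inclusive comparison is an assumption
    private static final double THRESHOLD = 0.9;

    private final Map<String, Conference> byAcronym = new HashMap<>();
    private final Map<String, Conference> byTitle = new HashMap<>();
    // Assumed to return a similarity score in [0, 1]
    private final ToDoubleBiFunction<String, String> similarity;

    ConferenceLookupSketch(Iterable<Conference> conferences,
                           ToDoubleBiFunction<String, String> similarity) {
        this.similarity = similarity;
        for (Conference conference : conferences) {
            byAcronym.put(normalize(conference.acronym()), conference);
            byTitle.put(normalize(conference.title()), conference);
        }
    }

    private static String normalize(String text) {
        return text.strip().toLowerCase(Locale.ROOT);
    }

    Optional<Conference> lookup(Optional<String> acronym, String bookTitle) {
        // 1. Acronym match (Optional.map yields empty when the map has no entry)
        Optional<Conference> byAcronymHit = acronym.map(a -> byAcronym.get(normalize(a)));
        if (byAcronymHit.isPresent()) {
            return byAcronymHit;
        }
        // 2. Exact match on the normalized full title
        String query = normalize(bookTitle);
        Conference exact = byTitle.get(query);
        if (exact != null) {
            return Optional.of(exact);
        }
        // 3. Fuzzy fallback: best-scoring title that reaches the threshold
        Conference best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Conference> entry : byTitle.entrySet()) {
            double score = similarity.applyAsDouble(query, entry.getKey());
            if (score >= THRESHOLD && score > bestScore) {
                best = entry.getValue();
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

An empty result corresponds to the "not found" notification in step 6. Passing the metric in from outside keeps the sketch independent of any particular similarity implementation.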

Some caveats:

  • The feature will always look for the acronym in the first, deepest set of parentheses it encounters. This is a direct consequence of how the regex in the ConferenceAcronymExtractor works (see related tests for details). Some examples to illustrate this:
    • (This doesn't get pulled (this does)) -> this does
    • (First) acronym is pulled, not the (second) one. -> First
    • (This doesn't (I DO)) and (this won't (either)). -> I DO
  • The fuzzy match fallback uses the Levenshtein Distance as its similarity metric with a threshold of 0.9. This can lead to some "unexpected" cases. For example, "ACM Conference on Economics and Computation (EC)" with the acronym removed does not match "ACM Conference on Economics and Computation" since the similarity rating is 0.896 (<0.9).
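
The first caveat follows directly from the regular expression quoted in the review thread, which only matches a parenthesis pair with no other parentheses inside it. A minimal sketch of that behaviour, assuming a hypothetical class `AcronymExtractorSketch` standing in for ConferenceAcronymExtractor and assuming that blank content such as "()" is treated as "no acronym":

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the PR's ConferenceAcronymExtractor.
class AcronymExtractorSketch {
    // Matches "(" ... ")" where the content contains no further parentheses,
    // so only innermost pairs can match.
    private static final Pattern PATTERN = Pattern.compile("\\(([^()]*)\\)");

    static Optional<String> extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        // find() returns the leftmost match, i.e. the first innermost pair
        if (matcher.find()) {
            String candidate = matcher.group(1).strip();
            // Assumption: empty or blank parentheses yield no acronym
            return candidate.isEmpty() ? Optional.empty() : Optional.of(candidate);
        }
        return Optional.empty();
    }
}
```

Applied to the three example strings in the list above, the sketch returns "this does", "First", and "I DO" respectively.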

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (if change is visible to the user)
  • TODO Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.
  • [/] Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.

Comment on lines +49 to +50
// both strings are zero length
if (longerLength == 0) {

Comment is redundant as it simply restates what the code clearly shows. The comment doesn't provide additional information or reasoning.

Author

The comment was added there by the original author of the code, I've merely ported it with the necessary modifications. That said, I will contest this review as the lines

final int longerLength = longer.length();
// both strings are zero length
if (longerLength == 0) {
    return 1.0;
}

do not explicitly state, on their own, that the two input strings are equal. Further, the comment is present in the original Stack Overflow post here. That being said, if you're adamant about it, I don't mind changing this.
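
To make the point about the zero-length guard concrete, here is a minimal, self-contained sketch of a Levenshtein-based similarity in the style of the Stack Overflow answer. The class name `SimilaritySketch` and the hand-written `editDistance` helper are hypothetical; the PR's StringSimilarity class may compute the distance differently.

```java
// Hypothetical sketch of a Levenshtein-based similarity score in [0, 1].
class SimilaritySketch {
    static double similarity(String first, String second) {
        String longer = first.length() >= second.length() ? first : second;
        String shorter = first.length() >= second.length() ? second : first;
        int longerLength = longer.length();
        if (longerLength == 0) {
            // The longer string is empty, so both are empty and therefore identical
            return 1.0;
        }
        return (longerLength - editDistance(longer, shorter)) / (double) longerLength;
    }

    // Two-row dynamic-programming Levenshtein distance
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(Math.min(current[j - 1] + 1, previous[j] + 1),
                                      previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }
}
```

Assuming the two strings compared in the PR description's caveat are the title with and without the trailing " (EC)", the longer string has 48 characters and the edit distance is 5, so the score is 43/48 ≈ 0.896, just below the 0.9 threshold.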

Member

Take the bot's opinions with a grain of salt; they are not always that good.

double exactMatch = 1.0;
double similarity = similarityChecker.similarity(a, b);

assertTrue(similarity >= EPSILON_SIMILARITY && similarity < exactMatch);

Using assertTrue with a boolean condition instead of asserting the actual contents. Should compare the actual similarity value with expected bounds using assertEquals.

@TheYorouzoya
Author

For the last few days, I've just been browsing the code, reading the docs, and interacting with the application on my local machine. Since this is the first time I'm interacting with the JabRef ecosystem, and ICORE by extension, I have some questions regarding the app itself and the feature's use-case. I'll post each one as a separate comment. Apologies if some of these are too obvious.

@TheYorouzoya
Author

The issue post mentions: "When a BibTeX entry includes a conference title". What does "conference title" here refer to?

A. Following the Getting Started guide on the app, when you add a new entry via Library->New Entry, the "Select entry type" dialogue box asks for an entry type. One of them is the "Conference" type. Hovering over it says that it is a legacy alias for "InProceedings". Both of these have a "Title" required field. My question is if the feature's exclusively for these types of BibTeX entries or for ALL entries regardless of their type.

I'm guessing the answer is any entry regardless of type, but I still have to ask to be sure.

B. If I'm querying all entries regardless of type, do I have to search only the Title field? There are entries where the conference name isn't in the title field. Case in point: I used the Web Search feature in the app to lookup the conference mentioned in the issue post: "ACIS Conference on Software Engineering Research, Management and Applications". I selected and imported the following entry: https://ieeexplore.ieee.org/document/9509045. It gets imported as an InProceedings, but the conference name is not in the Title field but in the Journal field under the Other Fields tab.

I'm assuming that I should be looking for the conference title and its acronym in all of the fields. Is there some sort of a standard way here regarding how entries are imported into JabRef so that I only have to look for the title inside a subset of fields rather than all of them?

@TheYorouzoya
Author

The ICORE ranking data and its presentation.

A. Since each BibTeX entry is annotated with a year, I'm assuming that the user wants to see the ICORE ranking for that particular year. Again, very obvious, but I want to confirm. This would also mean that if a conference was added to ICORE later on, any entries from previous years should have a "Not Available" or "N/A" in the ICORE rank field. Or do we use the mismatched ranking as a fallback?

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

C. The exported ICORE ranking data provided from the website (https://portal.core.edu.au/conf-ranks/) does not contain a header row in the CSV file. This isn't a big deal as the important bits can be made out quite clearly (all except one, that is). Each line contains 9 columns: ID, Title, Acronym, Source, Rank, UNKNOWN, FoR-1, FoR-2, FoR-3. The UNKNOWN field in column 6 is a "Yes" or "No" for every line. It is always present, but I can't seem to figure what it corresponds to (it isn't the Note or DBLP column from the website). Consequently, I cannot determine whether it is important. If you know what it is or if it is relevant to the feature, please let me know.

@TheYorouzoya
Author

@koppor can you please help answer the questions I've posted above?

@koppor
Member

koppor commented Aug 19, 2025

The issue post mentions: "When a BibTeX entry includes a conference title". What does "conference title" here refer to?

Oh, did you ever read about bibtex? - it is @InProceedings plus @InCollection and booktitle.

@koppor
Member

koppor commented Aug 19, 2025

A. Following the Getting Started guide on the app, when you add a new entry via Library->New Entry, the "Select entry type" dialogue box asks for an entry type. One of them is the "Conference" type. Hovering over it says that it is a legacy alias for "InProceedings". Both of these have a "Title" required field. My question is if the feature's exclusively for these types of BibTeX entries or for ALL entries regardless of their type.

I'm guessing the answer is any entry regardless of type, but I still have to ask to be sure.

title is context-dependent. We refer to booktitle. You can read on at https://ctan.org/pkg/biblatex for more information on BibTeX if you want.

@koppor
Member

koppor commented Aug 19, 2025

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

Always use the latest ranking. We are not interested in historic data. - Only one CSV should be used.

@koppor
Member

koppor commented Aug 19, 2025

C. The exported ICORE ranking data provided from the website (portal.core.edu.au/conf-ranks) does not contain a header row in the CSV file. This isn't a big deal as the important bits can be made out quite clearly (all except one, that is). Each line contains 9 columns: ID, Title, Acronym, Source, Rank, UNKNOWN, FoR-1, FoR-2, FoR-3. The UNKNOWN field in column 6 is a "Yes" or "No" for every line. It is always present, but I can't seem to figure what it corresponds to (it isn't the Note or DBLP column from the website). Consequently, I cannot determine whether it is important. If you know what it is or if it is relevant to the feature, please let me know.

Web site has:

image

Which is

  • Title
  • Acronym
  • Source
  • Rank
  • Note
  • DBLP
  • Primary FoR
  • Comments
  • Average Rating

Example CSV line

9,"ACIS Conference on Software Engineering Research, Management and Applications",SERA,CORE2023,C,No,4612,,
1825,ACM International Joint Conference on Pervasive and Ubiquitous Computing (PERVASIVE and UbiComp combined from 2013),UbiComp,CORE2023,journal published,No,4608,,

(NOTE: It would be good if this was included in your question to make it self-contained)

I cannot quickly see it, but we need "Title", "Acronym" and "Rank" only. The other columns can be omitted, can't they?
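
Keeping only those three columns could look roughly like the sketch below. It is an illustration rather than the PR's loader: the class `IcoreCsvSketch`, its `Row` record, and the hand-written quote-aware splitter are hypothetical, and the column positions (ID, Title, Acronym, Source, Rank, ...) are inferred from the two example lines above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: extract Title, Acronym and Rank from one ICORE CSV line.
class IcoreCsvSketch {
    record Row(String title, String acronym, String rank) { }

    // Splits on commas, honouring double-quoted fields ("" is an escaped quote)
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');
                    i++; // skip the second quote of the escaped pair
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field, possibly empty
        return fields;
    }

    static Row parse(String line) {
        List<String> fields = splitLine(line);
        // Assumed positions: 1 = Title, 2 = Acronym, 4 = Rank
        return new Row(fields.get(1), fields.get(2), fields.get(4));
    }
}
```

The quoting matters because the first example title contains a comma; a plain `split(",")` would shift every later column by one.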

@koppor
Member

koppor commented Aug 19, 2025

Please make your question numbers unique. "A" is used twice, isn't it?

A. Since each BibTeX entry is annotated with a year, I'm assuming that the user wants to see the ICORE ranking for that particular year. Again, very obvious, but I want to confirm. This would also mean that if a conference was added to ICORE later on, any entries from previous years should have a "Not Available" or "N/A" in the ICORE rank field. Or do we use the mismatched ranking as a fallback?

No, always the latest year.

@koppor
Member

koppor commented Aug 19, 2025

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

Always use the latest CSV - there is one export. This CSV should be used.

@koppor
Member

koppor commented Aug 19, 2025

B. If I'm querying all entries regardless of type, do I have to search only the Title field? There are entries where the conference name isn't in the title field. Case in point: I used the Web Search feature in the app to lookup the conference mentioned in the issue post: "ACIS Conference on Software Engineering Research, Management and Applications". I selected and imported the following entry: ieeexplore.ieee.org/document/9509045. It gets imported as an InProceedings, but the conference name is not in the Title field but in the Journal field under the Other Fields tab.

For @Article use getFieldOrAlias(StandardField.Title), which will use JournalTitle (for BibLaTeX) first and then check Title.

@koppor
Member

koppor commented Aug 19, 2025

@koppor can you please help answer the questions I've posted above?

I hope, I got all questions, I am a bit confused since the questions are all labeled with "A" and I could have missed something.

@koppor
Member

koppor commented Aug 19, 2025

@TheYorouzoya I wonder if you have seen the "Helpful resources" section at the issue description (#13476)

image

It links to #13512

Did you know that one can click on "Files changed"?

image

You are routed to https://github.com/JabRef/jabref/pull/13512/files

You then might have seen

image

I know that code reading is not easy; but it is an essential skill to produce maintainable code.

@TheYorouzoya
Author

I hope, I got all questions, I am a bit confused...

Thank you for your patience with answering my questions. I appreciate it.

Please make your question numbers unique...

I bundled my questions for a specific context under a singular comment so that it would be easier to reply to all of them like this

image

That said, this is my way of doing things, and I'm the guest here. Sorry about the confusion it led to. Moving forward, I will post one individual question per comment. No labels included.

@TheYorouzoya
Author

Oh, did you ever read about bibtex? - it is @InProceedings plus @InCollection and booktitle.

Starting from the issue post

image

I wanted to see for myself what the feature might look like inside of JabRef to the user. So I downloaded the current version and searched for the conference mentioned in the issue post. I imported one of the entries and saw that the conference name showed up inside the "Journal" field

image

Even though the manual says that the optional field for a venue exists

image

So I, then, booted up the build on my local repo, i.e., the jabgui:run task. I imported the same article again, but this time, the conference name shows up in the "Booktitle" field

image

even though there is a clearly indicated "Venue" field which is empty

image

Do you see why I would ask such an obvious question after this?

@koppor
Member

koppor commented Aug 20, 2025

@TheYorouzoya Thank you for your patience. It's all voluntary work here. It needs time to explain the domain of scientific references. Maybe you can be a guest a little longer here and improve our documentation at https://docs.jabref.org. Currently we see guests being here just a short time, doing a task, and then leave. I always hope that a guest will make the place better as a whole; especially because all guests seem to be learning software engineering and not just programming.

Data sourced from ICORE website here: https://portal.core.edu.au/conf-ranks/ to enable ICORE rank lookups.

As discussed here: JabRef#13699 (comment), only the latest data from ICORE is to be used. At this time, it is the ICORE2023 ranking data.

Part of JabRef#13476
@subhramit
Member

I bundled my questions for a specific context under a singular comment so that it would be easier to reply to all of them like this

Moving forward, I will post one individual question per comment. No labels included.

Hey @TheYorouzoya - That is not needed. You can just use the labels to bundle questions under their respective contexts like you do, just add numbering to them (like A1, A2, etc.) so that they can be specifically and easily referred to when answering.

@koppor
Member

koppor commented Aug 22, 2025

I am more used to Gitter (Matrix) chat for a bulk of questions 😅. Sorry for that!

I have to confess that I did not check the terms properly while writing. I used "venue" as a scientist indicating a conference. And I did not check whether BibLaTeX has some "definition" of venue. A "venue" meant in the issue is some @InCollection and @InProceedings entry. We identify it by the booktitle.

I also meant journal articles, which are defined by @Article having title or journaltitle. You can receive the value by getFieldOrAlias(StandardField.JOURNAL).

I hope, I could answer your question now and you are unblocked to move forward.

- Append a header row to resources/icore/ICORE2023.csv
- Add ConferenceEntry record to represent ICORE conference data
- Add ConferenceRepository to load conference data and allow conference lookups using an acronym or a bookTitle with fuzzy match as a fallback
- Add utility class to extract an acronym from a bookTitle
- Add tests

Part of JabRef#13476
// A slight modification of: https://stackoverflow.com/a/17759264
private static final Pattern PATTERN = Pattern.compile("\\(([^()]*)\\)");

public static Optional<String> extract(String input) {

Method lacks input validation for null parameter which could lead to NullPointerException. While Optional return is good, the input should be validated.

Member

@NonNull jspecify annotation is OK

Please add JavaDoc.

public Optional<ConferenceEntry> getConferenceFromBookTitle(String bookTitle) {
String query = bookTitle.strip().toLowerCase();

// Lucky path

Comment does not add new information and can be plainly derived from the code. It should be removed as it doesn't provide additional context or reasoning.

String acronym,
String rank
) {
private final static String URL_PREFIX = "https://portal.core.edu.au/conf-ranks/";

Incorrect order of modifiers. According to Java conventions and effective Java principles, it should be 'private static final' instead of 'private final static'.

Author

Will fix that in the next commit.

}

@Test
void extractReturnsEmptyforEmptyParentheses() {

Method name contains a typo: 'forEmptyParentheses' should be 'ForEmptyParentheses' to maintain consistent camelCase naming convention in test methods.

Author

Will fix that in the next commit.


@Test
void getConferenceFromBookTitleReturnsConferenceForFuzzyMatchAboveThreshold() {
// String similarity > 0.9

Comment merely states what can be derived from the code and test name, not providing additional information about reasoning or implementation details.

@TheYorouzoya
Author

I hope, I could answer your question now and you are unblocked to move forward.

Thank you! I'll work on the GUI side next. I do have some questions there, but I'll post those once I'm done looking around the code a bit more.

@subhramit
Member

I am more used to Gitter (Matrix) chat for a bulk of questions 😅.

Also, here is a link to our gitter chat.

Comment on lines 96 to 97
// Lucky path
ConferenceEntry conference = titleToConference.get(query);

Comment 'Lucky path' doesn't add any new information and can be derived from the code itself. Comments should provide additional context or reasoning.

}

final int longerLength = longer.length();
// both strings are zero length

This comment is trivial and can be derived directly from the code condition (longerLength == 0). It should be removed as it doesn't add new information.

Comment on lines +3 to +5
/**
* A Conference Entry built from a subset of fields in the ICORE Ranking data
*/

The comment merely restates what is obvious from the code and doesn't provide additional information about the purpose or constraints of the record.

@TheYorouzoya
Author

@koppor please take a look at the progress so far and review.

*/
public static List<Field> getDefaultGeneralFields() {
List<Field> defaultGeneralFields = new ArrayList<>(Arrays.asList(StandardField.DOI, StandardField.CITATIONCOUNT, StandardField.CROSSREF, StandardField.KEYWORDS, StandardField.EPRINT, StandardField.URL, StandardField.FILE, StandardField.GROUPS, StandardField.OWNER, StandardField.TIMESTAMP));
List<Field> defaultGeneralFields = new ArrayList<>(List.of(StandardField.DOI, StandardField.ICORERANKING, StandardField.CITATIONCOUNT, StandardField.CROSSREF, StandardField.KEYWORDS, StandardField.EPRINT, StandardField.URL, StandardField.FILE, StandardField.GROUPS, StandardField.OWNER, StandardField.TIMESTAMP));

Using ArrayList with List.of() is inefficient. Since the list is immediately populated, using Set.of() or List.of() directly would be more appropriate and aligned with modern Java practices.

CITATIONCOUNT("citationcount"),
TIMESTAMP("timestamp", FieldProperty.DATE),

// Timestamp-realted

The comment contains a spelling error in the word 'realted' which should be 'related'. Variable and comment spelling accuracy is important for code maintainability.

Member

@koppor koppor left a comment

Good beginning!

Minor comments inside.

Please disable the button "openExternalLink" if there is no matched conference

image

undo is broken. I had a rank "B", clicked on lookup - replaced by "not found". Library modified, but I cannot undo.

image

Proposal: If there is a value in ICORE and lookup would replace it by "not found", do

dialogService.notify(Localization.lang("not found"))

(or similar)

INSTEAD of replacing it.
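
The proposed behaviour can be sketched as below. This is only an illustration: `RankLookupResultSketch` and its `apply` method are hypothetical names, and the `notifier` callback stands in for the call to `dialogService.notify(...)`.

```java
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical sketch: never overwrite the field with "not found".
class RankLookupResultSketch {
    // Returns the value the field should hold after a lookup attempt.
    static String apply(String currentValue, Optional<String> lookedUpRank, Consumer<String> notifier) {
        if (lookedUpRank.isPresent()) {
            // A rank was found: the field takes the new value
            return lookedUpRank.get();
        }
        // Nothing found: notify the user and leave the existing value untouched
        notifier.accept("not found");
        return currentValue;
    }
}
```

Because a failed lookup leaves the field as it was, the library is not modified in that case and there is nothing that would need to be undone.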

* Calculates the similarity (a number within 0 and 1) between two strings.
* http://stackoverflow.com/questions/955110/similarity-string-comparison-in-java
*/
private static double similarity(final String first, final String second) {
Member

Nice find :)

SpecialField.READ_STATUS, SpecialField.RELEVANCE
);

if (!currentGeneralPrefs.equals(expectedGeneralPrefs)) {
Member

Nice check!

Map<String, Set<Field>> entryEditorPrefs = preferences.getEntryEditorPreferences().getEntryEditorTabs();
Set<Field> currentGeneralPrefs = entryEditorPrefs.get("General");

Set<Field> expectedGeneralPrefs = Set.of(
Member

I think org.jabref.model.entry.field.FieldFactory#getDefaultGeneralFields should be linked - to enable others to lookup things if they implement something similar.

@@ -558,6 +561,37 @@ static void moveApiKeysToKeyring(JabRefCliPreferences preferences) {
}
}

static void addICORERankingFieldToGeneralTab(GuiPreferences preferences) {
Map<String, Set<Field>> entryEditorPrefs = preferences.getEntryEditorPreferences().getEntryEditorTabs();
Set<Field> currentGeneralPrefs = entryEditorPrefs.get("General");
Member

I think, Localization.lang("General") should be used - see org.jabref.logic.preferences.JabRefCliPreferences#setLanguageDependentDefaultValues

}

entryEditorPrefs.put(
"General",
Member

Also org.jabref.logic.preferences.JabRefCliPreferences#setLanguageDependentDefaultValues

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class ConferenceAcronymExtractorTest {
Member

Please rewrite to a @ParameterizedTest and @CsvSource. Use modern Optional testing - see comments on other test

@FXML private Button visitICOREConferencePageButton;

@Inject private DialogService dialogService;
@Inject private TaskExecutor taskExecutor;
Member

Unused variable - remove - also at constructor

@Inject private DialogService dialogService;
@Inject private TaskExecutor taskExecutor;
@Inject private GuiPreferences preferences;
@Inject private UndoManager undoManager;
Member

Unused variable- remove

Author

  • UndoManager is required by the constructor inside the AbstractEditorViewModel superclass.
  • GuiPreferences is required to open the conference page via the call to NativeDesktop.openBrowser() since it needs the user's externalApplicationPreferences.
  • DialogService will be required to apply the fix suggested here.

I'll remove the TaskExecutor since it is not being used.

CREATIONDATE("creationdate", FieldProperty.DATE),
MODIFICATIONDATE("modificationdate", FieldProperty.DATE);
GROUPS("groups"),
ICORERANKING("icoreranking"),
Member

Should be named icore to be shorter - WDYT?

(We also don't have ownername or groupname, because the prefix is unique enough)

I did it at b092aa9

Author

I am completely new to this domain, so I don't really have an input. My decision was based purely on the task in the issue post:

Create a new field: ICORANKING ('icoranking') and add it to org.jabref.model.entry.field.StandardField at the // JabRef-specific fields section.

The misspelling was later corrected in this comment here.

I'll update it with your suggestion.

@koppor
Copy link
Member

koppor commented Aug 28, 2025

Since 3 of 3 attempts failed to extract a title, here is my test data set: test-cases.zip

Maybe, it can be used to improve the matching; maybe not (future work ^^)

@jabref-machine
Collaborator

JUnit tests of jablib are failing. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Source Code Tests / Unit tests (pull_request)" and click on it.

You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

koppor and others added 10 commits August 28, 2025 16:15
- Out of the 4 missing keys in a failing test, two new keys were added to JabRef_en.properties while two were adapted to already present ones.
- The performExportForSingleEntry test in OpenOfficeDocumentCreatorTest was failing because of the missing "Icoreranking" field. Further, since the ordering of JabRef-specific fields was changed in a previous commit to conform to alphabetical order, the hardcoded values in OldOpenOfficeCalcExportFormatContentSingleEntry.xml weren't matching with the exporter's output. This commit reorders the fields in the expected order and adds the "Icoreranking" field at its right place which fixes the broken test.

Part of JabRef#13476
- Update ICORERankingEditorViewModel to display a notification when a conference ranking isn't found instead of displaying the "not found" text in the field.
- Refactored PreferencesMigrations.addICORERankingFieldToGeneralTab to better align with JabRefCliPreferences.setLanguageDependentDefaultValues.
- Minor refactor and remove redundant comment and whitespace in ConferenceRepository.
- Update tests to Parameterized tests in ConferenceRepositoryTest and ConferenceAcronymExtractorTest.
- Removed unused variables in ICORERankingEditor and ICORERankingEditorViewModel.
- Update OldOpenOfficeCalcExportFormatContentSingleEntry.xml to align with the field ordering in StandardField.
- Add documentation to PreferencesMigrations.addICORERankingFieldToGeneralTab and ConferenceAcronymExtractor.extract methods.

Part of JabRef#13476
This reverts commit 75d2b47 which I
accidentally pulled on my feature branch.

trag-bot bot commented Aug 28, 2025

@trag-bot didn't find any issues in the code! ✅✨

@jabref-machine
Collaborator

Note that your PR will not be reviewed/accepted until you have gone through the mandatory checks in the description and marked each of them exactly in the format of [x] (done), [ ] (not done yet) or [/] (not applicable).

@TheYorouzoya
Author

Not sure if this test is doing really the expected thing -- check for the expected value directly.

I left that test ambiguous because I was expecting the internal details of the class to change in the future. Currently, clients have to instantiate an object to use the isSimilar and similarity methods. This isn't wrong, but these objects do not differ at all in their internal state (hence, not needed).

So why not make this a "pure" utility class and expose only static methods?
The class hardcodes Levenshtein Distance as its similarity metric. Ideally, we should allow the client to pass a metric in the constructor to perform the comparison because Levenshtein might not be ideal for all cases. If, in the future, this metric is swapped for something else, a similarity test for an exact value might fail. So for now, I've simply removed the test.
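To illustrate what a constructor-injected metric could look like, here is a minimal sketch. The class name `ConfigurableSimilarity` and the `tokenJaccard` metric are hypothetical and exist only for illustration; they are not part of this PR, and the token-overlap metric is a placeholder rather than a recommendation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: the similarity metric is supplied by the client. */
final class ConfigurableSimilarity {
    private final ToDoubleBiFunction<String, String> metric;
    private final double threshold;

    ConfigurableSimilarity(ToDoubleBiFunction<String, String> metric, double threshold) {
        this.metric = metric;
        this.threshold = threshold;
    }

    /** Delegates to whatever metric was passed in; expected range is [0, 1]. */
    double similarity(String a, String b) {
        return metric.applyAsDouble(a, b);
    }

    boolean isSimilar(String a, String b) {
        return similarity(a, b) >= threshold;
    }

    /** Placeholder metric: Jaccard overlap of lower-cased, whitespace-separated tokens. */
    static double tokenJaccard(String a, String b) {
        Set<String> left = tokens(a);
        Set<String> right = tokens(b);
        Set<String> common = new HashSet<>(left);
        common.retainAll(right);
        Set<String> union = new HashSet<>(left);
        union.addAll(right);
        // split() always yields at least one element, so the union is never empty
        return (double) common.size() / union.size();
    }

    private static Set<String> tokens(String s) {
        return new HashSet<>(Arrays.asList(s.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }
}
```

A client would then write something like `new ConfigurableSimilarity(ConfigurableSimilarity::tokenJaccard, 0.9)`. Tests for the wrapper can pin the metric they pass in, so swapping the default metric later would not break them.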

This also brings me to address your other comment:

Since 3 of 3 attempts failed to extract a title, here is my test data set: test-cases.zip

Maybe, it can be used to improve the matching; maybe not (future work ^^)

A lot of those tests fail because Levenshtein Distance is a rather poor candidate for finding and matching conference titles, especially with such a high threshold of 0.9 (which was prescribed in the original issue).

I will group the provided tests into categories so I can address them collectively:

Group A: Input where an acronym is present in parentheses along with other text.

Examples,

  • International Organization for Information Integration and Web-based Application and Services 2010 (iiWAS 2010)
  • 2009 35th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2009)
  • Cloud Computing and Services Science: 6th International Conference (CLOSER 2016) - Revised Selected Papers
  • Proceedings of the 2nd International Conference on Cloud Computing and Service Science (CLOSER'12)

The current implementation can be adapted to address these cases by adding another step to the acronym matching process: the extracted string is first split on whitespace and then on special characters, with an acronym lookup at each step. The two-step process is necessary to account for outliers like IEEE CCNC (space in acronym), ACM_WiSec, EC-TEL, or CODES+ISSS (special characters in acronym).
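The stepwise lookup could be sketched roughly as follows. This is only an illustration under assumptions: `AcronymCandidates`, `candidates`, and `lookup` are hypothetical names, and the set of known acronyms stands in for whatever lookup ConferenceRepository actually offers.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the stepwise acronym candidate generation. */
final class AcronymCandidates {
    // Matches the content of each "(...)" group in the title.
    private static final Pattern PARENS = Pattern.compile("\\(([^()]+)\\)");

    /** Candidates in lookup order: whole string, whitespace tokens, then sub-tokens. */
    static Set<String> candidates(String bookTitle) {
        Set<String> result = new LinkedHashSet<>();
        Matcher matcher = PARENS.matcher(bookTitle);
        while (matcher.find()) {
            String inner = matcher.group(1).trim();
            if (inner.isEmpty()) {
                continue;
            }
            result.add(inner);                        // e.g. "IEEE CCNC"
            String[] tokens = inner.split("\\s+");
            for (String token : tokens) {
                result.add(token);                    // e.g. "EC-TEL", "CODES+ISSS"
            }
            for (String token : tokens) {
                // Split on anything that is neither a letter nor a digit.
                for (String piece : token.split("[^\\p{L}\\p{N}]+")) {
                    if (!piece.isEmpty()) {
                        result.add(piece);            // e.g. "CLOSER" from "CLOSER'12"
                    }
                }
            }
        }
        return result;
    }

    /** Returns the first candidate found in the (upper-cased) set of known acronyms. */
    static Optional<String> lookup(String bookTitle, Set<String> knownAcronymsUpperCase) {
        return candidates(bookTitle).stream()
                .map(candidate -> candidate.toUpperCase(Locale.ROOT))
                .filter(knownAcronymsUpperCase::contains)
                .findFirst();
    }
}
```

Because whitespace tokens are tried before they are split further, an acronym such as EC-TEL is matched intact, while CLOSER'12 only matches after the second split. A multi-word acronym followed by a year, such as "IEEE CCNC 2010", would still need extra handling that this sketch does not attempt.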

Group B: Input where Levenshtein similarity matching is guaranteed to fail

Examples,

  • Proceedings of the 3rd International Conference on Cloud Computing and Service Science, CLOSER 2013, 8-10 May 2013, Aachen, Germany
  • 22nd IEEE International Enterprise Distributed Object Computing Conference, EDOC 2018, Stockholm, Sweden, October 16-19, 2018
  • Web Engineering - 18th International Conference, ICWE 2018, Cáceres, Spain, June 5-8, 2018, Proceedings
  • Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14-18, 2019, Proceedings, Part II

Cases where a conference title is present with extra text that is closer to the title's length or where the title is jumbled up will yield a very poor similarity rating. The edit distance for Levenshtein is simply too large in these cases. While it may seem, at first, that we can simply get a bunch of substrings by splitting on commas (,), hyphens (-), or other special characters and do a fuzzy match, it might not work in our favor considering titles in our data often use those special characters themselves. For example, International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing.
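To make the size of the problem concrete, here is a small self-contained sketch that computes a normalized Levenshtein similarity (1 minus distance divided by the longer length) for the first Group B example. Two assumptions: the ICORE title string used below is assumed rather than copied from the CSV, and the normalization may differ from what StringSimilarity does internally. `LevenshteinDemo` is a hypothetical name.

```java
/** Hypothetical demo: normalized Levenshtein similarity for a Group B input. */
final class LevenshteinDemo {

    /** Classic two-row dynamic-programming edit distance. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means identical strings. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        // Assumed ICORE title for CLOSER; not taken verbatim from the CSV.
        String icoreTitle = "International Conference on Cloud Computing and Services Science";
        String input = "Proceedings of the 3rd International Conference on Cloud Computing "
                + "and Service Science, CLOSER 2013, 8-10 May 2013, Aachen, Germany";
        System.out.printf("similarity = %.2f (threshold 0.9)%n", similarity(input, icoreTitle));
    }
}
```

The edit distance can never be smaller than the difference in length between the two strings. With a 131-character input and the assumed 64-character title, the similarity is therefore capped at 1 - 67/131, roughly 0.49, no matter how well the title itself matches.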

Group C: Input where the data is outdated/mismatched

Examples,

  • Proceedings of the International Conference on Services Computing, SCC 2008: Acronym changed to IEEE SSE.
  • 2023 IEEE Intelligent Vehicles Symposium (IV): In the ICORE data, the title is Intelligent Vehicles Conference and the acronym is IEEE-IV.

There are also quite a few entries in the original ICORE data where information about a title/acronym change is included. A few examples:

  • International Conference Abstract State Machines, Alloy, B, TLA, VDM, and Z (Previously International Conference of B and Z Users, ZB, changed in 2008)
  • International Conference for High Performance Computing, Networking, Storage and Analysis (was Supercomputing Conference)
  • International Conference on Advanced Information Networking and Applications (was ICOIN)
  • International Conference on Embedded Wireless Systems and Networks (wasEuropean Conference on Wireless Sensor Networks)
  • International Conference on Hardware/Software Codesign and System Synthesis (previously ISSN, changed in 2003)
  • International Conference on Interactive Digital Storytelling (2008 merger of 'ICVS International Conference on Virtual Storytelling' and 'TIDSE Technology for Interactive Digital Storytelling')
  • International Conference on Managed Programming Languages and Runtimes (was ManLang and previously Principles and Practice of Programming in Java: PPPJ)

Now, we can try to extract the previous title or acronym from them but, as you can see, there's no consistent pattern here. Sometimes the note gives the previous title, sometimes just the acronym, and sometimes both; the wording also varies ('was', 'previously', 'changed in', 'merger of'), and so on.

Proposed Solution

  • I've already described a fix for the Group A problem above, so that is an easy one.
  • For Group B, we can do a substring match pass before our fuzzy fallback. Testing out other similarity metrics (like Jaro-Winkler, which is better suited to shorter strings) or relaxing the threshold might improve the matching. We can further do a normalization pass on the original data to strip away special characters so that we can match these kinds of input better.
  • Are there more common prefixes like "Proceedings of", "Advances in", etc., which can be present in real-world titles? If it is a common occurrence, we can have a list of them which can help normalize inputs for better matching.
  • The original ICORE data needs more pre-processing. First, to strip away extra special characters. Second, we can extract and compile a separate list of aliases for the old names, which can also catch some of the outdated titles.
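A rough sketch of the normalization and substring pass could look like this. Everything here is an assumption for illustration: `TitleNormalizer`, the prefix list, and the conference title used in the usage note are hypothetical, and the real ICORE strings may differ.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical sketch: normalize both sides, then try a containment match. */
final class TitleNormalizer {
    // Illustrative prefix list only; longer prefixes come first.
    private static final List<String> PREFIXES =
            List.of("proceedings of the", "proceedings of", "advances in");

    /** Lower-cases, replaces runs of non-alphanumeric characters with one space, strips a known prefix. */
    static String normalize(String text) {
        String normalized = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
        for (String prefix : PREFIXES) {
            if (normalized.startsWith(prefix + " ")) {
                return normalized.substring(prefix.length() + 1);
            }
        }
        return normalized;
    }

    /** Returns the longest known title that appears as a whole-word run inside the input. */
    static Optional<String> findContained(String bookTitle, List<String> knownTitles) {
        // Padding with spaces keeps matches aligned to word boundaries.
        String input = " " + normalize(bookTitle) + " ";
        return knownTitles.stream()
                .filter(title -> !normalize(title).isEmpty())
                .filter(title -> input.contains(" " + normalize(title) + " "))
                .max(Comparator.comparingInt(String::length));
    }
}
```

With a known title such as "IEEE International Enterprise Distributed Object Computing Conference" (assumed), the EDOC 2018 input from Group B would match on containment alone. Inputs with small spelling differences, such as "Service Science" versus "Services Science", would not, so the fuzzy fallback is still needed after this pass.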

I'll start working on implementing some of the more immediate changes. If you have any input on this, then I'd appreciate it.

@koppor
Member

koppor commented Aug 31, 2025

I'll start working on implementing some of the more immediate changes. If you have any input on this, then I'd appreciate it.

Thank you for the comments and ideas. Especially, the grouping is nice. It reminds me of the issue #12728.

Naively, if an abbreviator of booktitles is implemented, one could run it on both the booktitle field and the ICORE conference title field, and then do an exact match? Maybe there would be too many matches then, but it could be better than the current solution.

Maybe, we should split work here to keep focused. Leave the current handling as is and weave-in an abbreviator later? To have some separation of concerns? :)

Side track - the discussion somehow refs following issues
