Skip to content

Conversation

miguel-cordoba
Copy link
Contributor

@miguel-cordoba miguel-cordoba commented Jul 24, 2025

  • Addresses unresolved previous review comments

Closes #12810

Steps to test

see 12888

test

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (if change is visible to the user)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@miguel-cordoba miguel-cordoba marked this pull request as ready for review July 24, 2025 12:37
@miguel-cordoba miguel-cordoba marked this pull request as draft July 24, 2025 12:43
@miguel-cordoba
Copy link
Contributor Author

Hey there! I'm completely new to jabref and open source in general and I am exploring how things work.

I noticed that when I enter keywords for a TechReport entry for example: Ensenada, then Ensenada, Baja California, and then again Ensenada, the app seems to cut the word unexpectedly, like only showing "E". I see that we check if the keywordList already contains the current keyword (KeywordList:140), but I guess this might interfere with the desired behaviour described here.

Really looking forward for your input!

@miguel-cordoba miguel-cordoba marked this pull request as ready for review July 24, 2025 12:59
StringBuilder currentToken = new StringBuilder();
boolean isEscaping = false;

for (int i = 0; i < keywordString.length(); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this is #12888 (comment) :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review! As I dived into the project I saw that this loop had not been addressed. So I used @ungerts proposed comment, since it is readable and solves the issue. Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very OK. I like links to provide context.

Other reviewers might think: why a for loop etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh ok perfect :)

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests for org.jabref.model.entry.Keyword#getSubchainAsString missing

Should have different string representations: Keyword > Keyword vs. Keyword \> Keyword

Think of "round trip"

@@ -115,4 +115,41 @@ void mergeTwoDistinctKeywordsShouldReturnTheTwoKeywordsMerged() {
void mergeTwoListsOfKeywordsShouldReturnTheKeywordsMerged() {
assertEquals(new KeywordList("Figma", "Adobe", "JabRef", "Eclipse", "JetBrains"), KeywordList.merge("Figma, Adobe, JetBrains, Eclipse", "Adobe, JabRef", ','));
}

@Test
void parseKeywordWithEscapedDelimiterDoesNotSplitKeyword() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert to ParameterizedTest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gathered all the previous tests in a Parameterized one, I also added roundtrip tests on KeyworList and Keyword. Thank you so much for your review!

String serialized = parsed.toString();
KeywordList reparsed = KeywordList.parse(serialized, ',', '>');

assertEquals(parsed.toString(), reparsed.toString());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing string representations instead of actual objects may hide structural differences. Should directly compare KeywordList objects to ensure complete equality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doesnt apply here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment above the statement to indicate that the toSTring() fucntionality is tested.

Morevoer, test for

Suggested change
assertEquals(parsed.toString(), reparsed.toString());
assertEquals(original, parsed.toString());

No need for reparsed.


@ParameterizedTest
@ValueSource(strings = {
"Keyword > Keyword",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would excpect all cases from provideParseKeywordCases there.


@ParameterizedTest
@ValueSource(strings = {
"Keyword > Keyword",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test all of provideParseKeywordCases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When extending this test it was revealed that the current implementation does not handle escaping as desired at deserializing, which made me extend the logic on Keyword#getSubchainAsString

@koppor
Copy link
Member

koppor commented Jul 31, 2025

Thank you for your insights! I'll have more free time in the coming weeks and Im happy to continue working on this.

Glad to hear - and sorry that we did not see that it will escalate when looking at the issue at first 😅. Looking forward.

@ryan-carpenter
Copy link

In case it is not clear: Escaping should only apply to the keyword separator character. Keywords already work properly when there is no conflict between the keywords and the separator.

@ryan-carpenter
Copy link

There should NOT be any auto escaping.

I think this is right for supporting escaping. However, JabRef should check on import and auto-escape the separator character if contained in individual keyword/phrases. If the imported file type is one that puts each keyword on it's own line (RIS, PubMed nbib), then characters that match the designated separator can be auto-escaped.

@miguel-cordoba
Copy link
Contributor Author

In case it is not clear: Escaping should only apply to the keyword separator character. Keywords already work properly when there is no conflict between the keywords and the separator.

There should NOT be any auto escaping.

I think this is right for supporting escaping. However, JabRef should check on import and auto-escape the separator character if contained in individual keyword/phrases. If the imported file type is one that puts each keyword on it's own line (RIS, PubMed nbib), then characters that match the designated separator can be auto-escaped.

Thank you, I will keep this in mind

@@ -13,6 +13,7 @@
*/
public class Keyword extends ChainNode<Keyword> implements Comparable<Keyword> {

// Note: {@link org.jabref.model.entry.KeywordList#parse(java.lang.String, java.lang.Character, java.lang.Character) offers configuration, which is not available here
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment merely states what is visible from the code and doesn't provide additional value or reasoning. It should be removed or enhanced with actual implementation rationale.

Comment on lines +88 to +90
/*
* Used for BibTex export, where we need to escape the delimiter with \
*/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spelling 'BibTex' in the comment is incorrect according to project standards. It should be 'BibTeX' for consistency in documentation and comments.

@ryan-carpenter
Copy link

Thanks for this contribution!

Think of "round trip"

I noticed examples showing commas without spaces. This is perfect if it matches the input. If the keywords contain , (as in PubMed records) please keep the space.

@MethodSource("provideParseKeywordCases")
void roundTripPreservesStructure(String original) {
KeywordList parsed = KeywordList.oldParse(original, ',', '>');
// We need to test the toString() functionality
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment restates what is obvious from the code and doesn't provide additional value. Such comments should be removed as they create maintenance overhead.

@miguel-cordoba
Copy link
Contributor Author

Sorry for the delay in progressing on this. I started a new job recently and my bandwidth has been limited, but I really want to finish what I started here.

I’ve been experimenting with different approaches. Right now, the changes work reasonably well for UI input, but I have not successfully implemented complete round-trip fidelity for BibTeX serialization(with autoescaping) yet, some edgecases do not pass. Both UI and BibTeX being parsed with the same parsing method in KeyworList.

Now I have started considering using the old parse method for BibTeX parsing and I will be tweaking it further because I feel like that could be a good direction, keeping the internal model exaclty as it comes from .bib but then shown "nice" on the UI (UIparsed and toString()).

Feel free to course correct or give any observation, I am more familiar now with many JabRef concepts but I probably am missing details or specific things like different formats etc.

@jabref-machine
Copy link
Collaborator

Note that your PR will not be reviewed/accepted until you have gone through the mandatory checks in the description and marked each of them them exactly in the format of [x] (done), [ ] (not done yet) or [/] (not applicable).

Copy link

trag-bot bot commented Aug 31, 2025

@trag-bot didn't find any issues in the code! ✅✨

@jabref-machine
Copy link
Collaborator

JUnit tests of jablib are failing. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Source Code Tests / Unit tests (pull_request)" and click on it.

You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement escaping for keyword separators
5 participants