-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Add #12810: Implement escaping for keyword separators (FKA #12888) #13583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…bRef#12888) - Addresses unresolved previous reviews: JabRef#12888 (comment)
…bRef#12888) - Addresses unresolved previous reviews: JabRef#12888 (comment) - Adds CHANGELOG.md entry
…bRef#12888) - Addresses unresolved previous reviews: JabRef#12888 (comment) - Adds CHANGELOG.md entry
Hey there! I'm completely new to jabref and open source in general and I am exploring how things work. I noticed that when I enter keywords for a TechReport entry for example: Ensenada, then Ensenada, Baja California, and then again Ensenada, the app seems to cut the word unexpectedly, like only showing "E". I see that we check if the keywordList already contains the current keyword (KeywordList:140), but I guess this might interfere with the desired behaviour described here. Really looking forward for your input! |
StringBuilder currentToken = new StringBuilder(); | ||
boolean isEscaping = false; | ||
|
||
for (int i = 0; i < keywordString.length(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this is #12888 (comment) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review! As I dived into the project I saw that this loop had not been addressed. So I used @ungerts proposed comment, since it is readable and solves the issue. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very OK. I like links to provide context.
Other reviewers might think: why a for loop etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh ok perfect :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests for org.jabref.model.entry.Keyword#getSubchainAsString
missing
Should have different string representations: Keyword > Keyword
vs. Keyword \> Keyword
Think of "round trip"
@@ -115,4 +115,41 @@ void mergeTwoDistinctKeywordsShouldReturnTheTwoKeywordsMerged() { | |||
void mergeTwoListsOfKeywordsShouldReturnTheKeywordsMerged() { | |||
assertEquals(new KeywordList("Figma", "Adobe", "JabRef", "Eclipse", "JetBrains"), KeywordList.merge("Figma, Adobe, JetBrains, Eclipse", "Adobe, JabRef", ',')); | |||
} | |||
|
|||
@Test | |||
void parseKeywordWithEscapedDelimiterDoesNotSplitKeyword() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert to ParameterizedTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gathered all the previous tests in a Parameterized one, I also added roundtrip tests on KeyworList and Keyword. Thank you so much for your review!
String serialized = parsed.toString(); | ||
KeywordList reparsed = KeywordList.parse(serialized, ',', '>'); | ||
|
||
assertEquals(parsed.toString(), reparsed.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing string representations instead of actual objects may hide structural differences. Should directly compare KeywordList objects to ensure complete equality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesnt apply here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment above the statement to indicate that the toSTring()
fucntionality is tested.
Morevoer, test for
assertEquals(parsed.toString(), reparsed.toString()); | |
assertEquals(original, parsed.toString()); |
No need for reparsed.
|
||
@ParameterizedTest | ||
@ValueSource(strings = { | ||
"Keyword > Keyword", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would excpect all cases from provideParseKeywordCases there.
|
||
@ParameterizedTest | ||
@ValueSource(strings = { | ||
"Keyword > Keyword", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test all of provideParseKeywordCases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When extending this test it was revealed that the current implementation does not handle escaping as desired at deserializing, which made me extend the logic on Keyword#getSubchainAsString
- extends unitTests
Glad to hear - and sorry that we did not see that it will escalate when looking at the issue at first 😅. Looking forward. |
…pPreservesStructure fails
In case it is not clear: Escaping should only apply to the keyword separator character. Keywords already work properly when there is no conflict between the keywords and the separator. |
I think this is right for supporting escaping. However, JabRef should check on import and auto-escape the separator character if contained in individual keyword/phrases. If the imported file type is one that puts each keyword on it's own line (RIS, PubMed nbib), then characters that match the designated separator can be auto-escaped. |
Thank you, I will keep this in mind |
@@ -13,6 +13,7 @@ | |||
*/ | |||
public class Keyword extends ChainNode<Keyword> implements Comparable<Keyword> { | |||
|
|||
// Note: {@link org.jabref.model.entry.KeywordList#parse(java.lang.String, java.lang.Character, java.lang.Character) offers configuration, which is not available here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment merely states what is visible from the code and doesn't provide additional value or reasoning. It should be removed or enhanced with actual implementation rationale.
/* | ||
* Used for BibTex export, where we need to escape the delimiter with \ | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spelling 'BibTex' in the comment is incorrect according to project standards. It should be 'BibTeX' for consistency in documentation and comments.
Thanks for this contribution!
I noticed examples showing commas without spaces. This is perfect if it matches the input. If the keywords contain |
@MethodSource("provideParseKeywordCases") | ||
void roundTripPreservesStructure(String original) { | ||
KeywordList parsed = KeywordList.oldParse(original, ',', '>'); | ||
// We need to test the toString() functionality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment restates what is obvious from the code and doesn't provide additional value. Such comments should be removed as they create maintenance overhead.
Sorry for the delay in progressing on this. I started a new job recently and my bandwidth has been limited, but I really want to finish what I started here. I’ve been experimenting with different approaches. Right now, the changes work reasonably well for UI input, but I have not successfully implemented complete round-trip fidelity for BibTeX serialization(with autoescaping) yet, some edgecases do not pass. Both UI and BibTeX being parsed with the same parsing method in KeyworList. Now I have started considering using the old parse method for BibTeX parsing and I will be tweaking it further because I feel like that could be a good direction, keeping the internal model exaclty as it comes from .bib but then shown "nice" on the UI (UIparsed and toString()). Feel free to course correct or give any observation, I am more familiar now with many JabRef concepts but I probably am missing details or specific things like different formats etc. |
Note that your PR will not be reviewed/accepted until you have gone through the mandatory checks in the description and marked each of them them exactly in the format of |
@trag-bot didn't find any issues in the code! ✅✨ |
JUnit tests of You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide. |
Closes #12810
Steps to test
see 12888
Mandatory checks
CHANGELOG.md
described in a way that is understandable for the average user (if change is visible to the user)