Skip to content

Translation and Validation for EBSCOhost#19

Merged
geritwagner merged 57 commits intoCoLRev-Environment:mainfrom
ThomasFleischmann:main
Apr 9, 2025
Merged

Translation and Validation for EBSCOhost#19
geritwagner merged 57 commits intoCoLRev-Environment:mainfrom
ThomasFleischmann:main

Conversation

@ThomasFleischmann
Copy link
Contributor

This request entails:

  • the search_query/parser_ebsco.py (fundamental functionalities for parsing)
  • the search_query/parser_validation.py (fundamental functionalities for validating)
  • the search_query/proximity_query_ebsco.py (A new operator like or_query.py or and_query.py
  • the search_query/serializer_ebsco.py (translation from pre-notation to ebsco format)
  • the search_query/save_file.py (written for the ebsco_example to safe files to ebsco format)
  • the test/test_parser_ebsco.py (unit tests for tokenization, linter, and parsing)
  • the test/ebsco_example.py (an example on how to load and translate queries, as well as how to build your own queries and save it to a json file)

@ThomasFleischmann
Copy link
Contributor Author

I have added the artificial parentheses function and some small changes to the other files

Here is a short description:

add_artificial_parentheses:

  • as discussed adds parenthesis with pos(-1, -1)
  • takes care of simple value change: OR -> AND, AND -> OR
  • keeps original parentheses blocks
  • tracks opened artificial parentheses and closes them at the end of a search-block
  • depending on value difference, adds more than one open/close parenthesis (takes care of issue where there are several value changes at once: A OR B NOT C AND D -> A OR ((B NOT C) AND D)
  • deleted previous solution with tree restructuring
  • is being called between tokenization and parsing to keep original tokens for automated testing

other changes:

  • Changed SearchField DE to "DESCRIPTORS" and KW to "KEYWORD" (pylint showed error double keyword)
  • Changed searchRxiv-import "author" to "authors" to comply with EBSCO files

Gerit Wagner added 3 commits March 25, 2025 21:36
- add one pair of parentheses (regardless of how many levels of
  precedence are between operators)
- add unit tests
- fix parser_validation: skip quoted strings (not SEARCH_FIELDs)
@geritwagner
Copy link
Contributor

Thank you! I have added unit tests and made a few revisions. If you agree with the current implementation of add_artificial_parentheses_for_operator_precedence(...), we could share it with the others. We may even make it a method of QueryStringParser (which inheriting classes/parsers could use or override).

I also added a suggestion for handling linter messages. If you agree with the changes, we will need to add the validation messages to the constants (this will also make them available in the documentation).

@geritwagner
Copy link
Contributor

Note: We should also mention the add_artificial_parentheses_for_operator_precedence(...) in the developer documentation (as a method that can easily be reused).

@ThomasFleischmann
Copy link
Contributor Author

Thank you very much for the information.

I have looked through the changes and agree, it would also be a good idea to make inheritable through QueryStringParser.
Here i have a short question though, maybe it would also be a probable idea for a new parser to make it possible to change the priority? Meaning that if there is a search system that uses a different priority and a parser is being built for that, should that new parser also use the artificial parenthesis, but maybe with a different set of priority constants?

Regarding the validation, i will see to it until tomorrow.

Regarding mentioning add_artificial_parentheses in the documentation, is it enough to add the function to the parser_skeletton or is there another step i need to take?

@geritwagner
Copy link
Contributor

Thank you for having a look at the validation and for checking my suggestions. I would assume that the precedence could already be adapted by changing the OPERATOR_PRECEDENCE attribute (which is used in get_precedence()).

Adding the add_artificial_parentheses() to the skeleton in the docs would be great. If you like, you could also move the add_artificial_parentheses() to the QueryStringParser on the mainbranch (to make it available for the other parsers).

@ThomasFleischmann
Copy link
Contributor Author

These are the newest changes to the pull-request:

  • changed linter handling as per suggestion
  • removed unused linter list and messages in ebsco parser and linter
  • added an ebsco-specific linter message (wrong token sequence)
  • added some additional tests
  • moved add_artificial_parentheses_for_precedence to parser base to be used by all parsers

Regarding the add_artificial_parentheses, i also made some small changes. Here i had some queries in the automated tests that came back erroneous. The reason was i believe this:
natural* OR flood* OR earthquake* AND volcano* AND hurricane* NOT drought* OR tornado* OR climate*
The current version added the correct amount of closed-parentheses, but not enough open ones before the NOT part of the query.
I changed it so there are additional parentheses.
This however means, that in some instances there are redundant parentheses, however they do not collide with its logic.

To be honest, i am unsure as to what the issue is.

@geritwagner
Copy link
Contributor

geritwagner commented Apr 1, 2025

Thank you - I added the flatten_redundant_artificial_nesting(), which should resolve the issue. Considering that we have no "look-ahead" when adding artificial parentheses, adding them for each level of operator precedence (and removing redundant ones) seems to be the only viable option.

The last changes before we merge the pull request are to use the Token enums (see changes). I assume the test/ebsco_example.py could also be removed?

@ThomasFleischmann
Copy link
Contributor Author

Thank you for changing the enumeration. In the latest push i have only removed the test/ebsco_example.

@geritwagner geritwagner merged commit e845a66 into CoLRev-Environment:main Apr 9, 2025
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants