
Commit 072a705

Constructor query validation (#36)
* validate queries in constructor call
* updates
* update
* add_linter_messages: position+S
* search-term/operator restrictions
* update linters
* update
* update docs
* resolve comments
* rename: origin_platform > platform
* update platform handling
* activate ebsco validation
* update constant usage
* refactor wos: handle_year_search
* update docs
* pre-compile regex
* update database
* refactor/generalize checks
1 parent a40f3f4 commit 072a705

37 files changed (+1737 / -906 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -5,3 +5,4 @@ docs/build
 search_query/ebsco/__pycache__/*
 search_query/pubmed/__pycache__/*
 search_query/wos/__pycache__/*
+search_query/generic/__pycache__/*

docs/source/dev_docs/linter_development.rst

Lines changed: 2 additions & 1 deletion
@@ -13,7 +13,8 @@ Use the appropriate base class when developing a new linter:
 - `QueryStringLinter`: for single query strings
 - `QueryListLinter`: for list-based query formats
 
-Each linter must override the `validate_tokens()` method and optionally `validate_query_tree()` for deeper semantic checks.
+Each linter must override the `validate_tokens()` method and the `validate_query_tree()`.
+`validate_tokens()` is called when the query is parsed, and `validate_query_tree()` is called when the query tree is built (i.e., at the end of the parsing process **and** when the query is constructed programmatically).
 
 Best Practices
 --------------
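The calling contract in the updated guide can be sketched as follows. All names here (`LinterSketch`, `RecordingLinter`, `IncompleteLinter`, `parse`, `construct`) are hypothetical and only mirror the documented order: `validate_tokens()` runs during parsing, `validate_query_tree()` runs once a tree exists, on both the parsing path and the programmatic path. Declaring both hooks with `abc.abstractmethod` is one way to make "must override both" enforceable; this is an assumption, not necessarily how `QueryStringLinter` is defined.

```python
import abc
import typing


class LinterSketch(abc.ABC):
    """Hypothetical base class mirroring the documented contract (not the search_query API)."""

    @abc.abstractmethod
    def validate_tokens(self, *, tokens: typing.List[str], query_str: str) -> typing.List[str]:
        """Called while a query string is parsed."""

    @abc.abstractmethod
    def validate_query_tree(self, query: dict) -> None:
        """Called once a query tree exists, whether parsed or built in code."""


class RecordingLinter(LinterSketch):
    """Concrete linter that only records which hook ran."""

    def __init__(self) -> None:
        self.calls: typing.List[str] = []

    def validate_tokens(self, *, tokens: typing.List[str], query_str: str) -> typing.List[str]:
        self.calls.append("validate_tokens")
        return tokens

    def validate_query_tree(self, query: dict) -> None:
        self.calls.append("validate_query_tree")


class IncompleteLinter(LinterSketch):
    """Overrides only one hook, so instantiating it raises TypeError."""

    def validate_tokens(self, *, tokens: typing.List[str], query_str: str) -> typing.List[str]:
        return tokens


def parse(query_str: str, linter: LinterSketch) -> dict:
    # Parsing path: token-level checks first, then checks on the finished tree.
    tokens = linter.validate_tokens(tokens=query_str.split(), query_str=query_str)
    tree = {"children": tokens}
    linter.validate_query_tree(tree)
    return tree


def construct(children: typing.List[str], linter: LinterSketch) -> dict:
    # Programmatic path: there is no query string, so only the tree hook runs.
    tree = {"children": children}
    linter.validate_query_tree(tree)
    return tree
```

With this split, checks that must also cover programmatically built queries belong in `validate_query_tree()`, since `validate_tokens()` never sees those queries.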

docs/source/dev_docs/linter_skeleton.py

Lines changed: 30 additions & 12 deletions
@@ -1,7 +1,12 @@
+import typing
+
 from search_query.constants import QueryErrorCode
 from search_query.constants import TokenTypes
 from search_query.linter_base import QueryStringLinter
 
+if typing.TYPE_CHECKING:
+    from search_query.query import Query
+
 
 class XYQueryStringLinter(QueryStringLinter):
     """Linter for XY query strings"""
@@ -19,26 +24,27 @@ class XYQueryStringLinter(QueryStringLinter):
         # ...
     }
 
-    def validate_tokens(self) -> None:
+    def validate_tokens(
+        self,
+        *,
+        tokens: typing.List[Token],
+        query_str: str,
+        search_field_general: str = "",
+    ) -> typing.List[Token]:
         """Main validation routine"""
+
+        self.tokens = tokens
+        self.query_str = query_str
+        self.search_field_general = search_field_general
+
         self.check_unbalanced_parentheses()
         self.check_unknown_token_types()
         self.check_invalid_token_sequences()
         self.check_operator_capitalization()
-        self.check_invalid_characters_in_search_term("@&%$^~\\<>{}()[]#")
 
         # custom validation
-        self.check_unsupported_search_fields()
-        self.check_field_positioning()
 
-    def check_unsupported_search_fields(self) -> None:
-        for token in self.parser.tokens:
-            if token.type == TokenTypes.FIELD and token.value not in VALID_FIELDS:
-                self.add_linter_message(
-                    QueryErrorCode.SEARCH_FIELD_UNSUPPORTED,
-                    position=token.position,
-                    details=f"Field {token.value} is not supported",
-                )
+        return self.tokens
 
     def check_invalid_token_sequences(self) -> None:
         for i, token in enumerate(self.parser.tokens[:-1]):
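The new `validate_tokens()` signature takes its inputs as keyword-only arguments (the bare `*`), stores them on the instance, runs the checks, and returns the token list. A self-contained sketch of that shape, with a hypothetical `Token` named tuple, hypothetical token-type strings, and a toy parenthesis check standing in for the real checks:

```python
import typing


class Token(typing.NamedTuple):
    # Hypothetical token shape: type label, raw value, (start, end) position.
    type: str
    value: str
    position: typing.Tuple[int, int]


class ParenLinterSketch:
    """Hypothetical linter following the keyword-only validate_tokens() signature."""

    def __init__(self) -> None:
        self.messages: typing.List[str] = []

    def validate_tokens(
        self,
        *,
        tokens: typing.List[Token],
        query_str: str,
        search_field_general: str = "",
    ) -> typing.List[Token]:
        # Store the inputs so that individual checks can read them from self.
        self.tokens = tokens
        self.query_str = query_str
        self.search_field_general = search_field_general

        self.check_unbalanced_parentheses()
        # Return the (possibly adjusted) token list to the caller.
        return self.tokens

    def check_unbalanced_parentheses(self) -> None:
        depth = 0
        for token in self.tokens:
            if token.type == "PARENTHESIS_OPEN":
                depth += 1
            elif token.type == "PARENTHESIS_CLOSED":
                depth -= 1
                if depth < 0:
                    self.messages.append(f"Unmatched ')' at {token.position}")
                    depth = 0
        if depth > 0:
            self.messages.append(f"{depth} unclosed '('")
```

Calling `validate_tokens(toks, "(a")` positionally raises `TypeError`, which keeps call sites explicit as parameters such as `query_str` and `search_field_general` are added.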
@@ -49,3 +55,15 @@ def check_invalid_token_sequences(self) -> None:
                     position=self.parser.tokens[i + 1].position,
                     details=f"Unexpected token after {token.type}",
                 )
+
+    def validate_query_tree(self, query: Query) -> None:
+        """
+        Validate the query tree.
+        This method is called after the query tree has been built.
+        """
+
+        self.check_quoted_search_terms_query(query)
+        self.check_operator_capitalization_query(query)
+        self.check_invalid_characters_in_search_term_query(query, "@&%$^~\\<>{}()[]#")
+        self.check_unsupported_search_fields_in_query(query)
+        # term_field_query = self.get_query_with_fields_at_terms(query)
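The skeleton now calls `check_unsupported_search_fields_in_query(query)` instead of the token-based `check_unsupported_search_fields()` removed above. Running the check on the tree means it also covers programmatically constructed queries, which have no tokens. The helper's body is not part of this diff. One possible shape, written as a free function over a hypothetical `QueryNode` dataclass and a hypothetical `VALID_FIELDS` set:

```python
import dataclasses
import typing

# Hypothetical set of accepted field codes; the real list is platform-specific.
VALID_FIELDS = {"ab", "ti", "kw"}


@dataclasses.dataclass
class QueryNode:
    """Hypothetical stand-in for a query tree node."""

    value: str  # operator name or search term
    search_field: str = ""  # field applied at this node, if any
    children: typing.List["QueryNode"] = dataclasses.field(default_factory=list)


def check_unsupported_search_fields_in_query(query: QueryNode) -> typing.List[str]:
    """Walk the tree depth-first and report every unsupported field."""
    messages: typing.List[str] = []
    stack = [query]
    while stack:
        node = stack.pop()
        if node.search_field and node.search_field not in VALID_FIELDS:
            messages.append(f"Field {node.search_field} is not supported (at '{node.value}')")
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return messages
```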

docs/source/dev_docs/tests.rst

Lines changed: 1 addition & 1 deletion
@@ -124,5 +124,5 @@ Test Types
 
 .. note::
 
-- Use helper functions like `print_debug_tokens()` to ease debugging.
+- Use helper functions like `parser.print_tokens()` to ease debugging.
 - Use `assert ... == ...` with fallbacks for `print(...)` for inspection.

docs/source/index.rst

Lines changed: 3 additions & 3 deletions
@@ -47,9 +47,9 @@ Creating a query programmatically is simple:
 from search_query import OrQuery, AndQuery
 
 # Typical building-blocks approach
-digital_synonyms = OrQuery(["digital", "virtual", "online"], search_field="Abstract")
-work_synonyms = OrQuery(["work", "labor", "service"], search_field="Abstract")
-query = AndQuery([digital_synonyms, work_synonyms], search_field="Author Keywords")
+digital_synonyms = OrQuery(["digital", "virtual", "online"], search_field="ab")
+work_synonyms = OrQuery(["work", "labor", "service"], search_field="ab")
+query = AndQuery([digital_synonyms, work_synonyms])
 
 ..
 Parameters:
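The updated example uses the short field code `"ab"` on the OR blocks and no field on the AND. Since this commit validates queries in the constructor, an invalid field is meant to surface as soon as the object is built. The sketch below illustrates that idea with hypothetical classes (`QuerySketch`, `OrQuerySketch`, `AndQuerySketch`) and a hypothetical `ALLOWED_FIELDS` set. It raises `ValueError` purely for illustration; the library's actual reporting mechanism is not shown in this diff.

```python
import typing

# Hypothetical field whitelist; the accepted codes really depend on the platform.
ALLOWED_FIELDS = {"ab", "ti"}


class QuerySketch:
    """Hypothetical query node that validates itself in its constructor."""

    operator = ""

    def __init__(
        self,
        children: typing.List[typing.Union[str, "QuerySketch"]],
        search_field: str = "",
    ) -> None:
        # Fail fast: reject bad input before the object exists.
        if search_field and search_field not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported search field: {search_field!r}")
        if not children:
            raise ValueError(f"{self.operator} needs at least one operand")
        self.children = children
        self.search_field = search_field


class OrQuerySketch(QuerySketch):
    operator = "OR"


class AndQuerySketch(QuerySketch):
    operator = "AND"
```

Mirroring the documentation example, `AndQuerySketch([OrQuerySketch([...], search_field="ab"), ...])` succeeds, while `OrQuerySketch(["digital"], search_field="Abstract")` fails at construction time under this sketch's whitelist.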

docs/source/lint/F1011.rst

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+.. _F1011:
+
+F1011 — too-many-operators
+==========================
+
+**Error Code**: F1011
+
+**Message**: ``Too many operators in the query``
+
+**Scope**: PLATFORM.WOS
+
+**Description**: Too many operators in the query
+
+**Back to**: :ref:`lint`
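F1011 is scoped to Web of Science. Below is a minimal sketch of such a check. It assumes the query has already been split into whitespace-separated tokens and uses a hypothetical `MAX_OPERATORS` limit. The threshold the linter actually applies is not stated here.

```python
import typing

# Hypothetical threshold for illustration; the actual WOS limit is not given in this commit.
MAX_OPERATORS = 3
# Boolean operators only; proximity operators such as NEAR/x are left out for brevity.
OPERATORS = {"AND", "OR", "NOT"}


def count_operators(tokens: typing.List[str]) -> int:
    """Count tokens that are Boolean operators (case-insensitive)."""
    return sum(1 for token in tokens if token.upper() in OPERATORS)


def check_too_many_operators(
    tokens: typing.List[str], limit: int = MAX_OPERATORS
) -> typing.Optional[str]:
    """Return an F1011-style message if the operator count exceeds the limit."""
    n = count_operators(tokens)
    if n > limit:
        return f"F1011: Too many operators in the query ({n} > {limit})"
    return None
```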

docs/source/lint/F1012.rst

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+.. _F1012:
+
+F1012 — too-many-search-terms
+=============================
+
+**Error Code**: F1012
+
+**Message**: ``Too many search terms in the query``
+
+**Scope**: PLATFORM.WOS
+
+**Description**: Too many search terms in the query
+
+**Back to**: :ref:`lint`
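Counting search terms is naturally done on the query tree, because nested groups can hold any number of terms. This sketch represents a node as either a term string or an `(operator, children)` tuple. Both that representation and the `MAX_SEARCH_TERMS` value are hypothetical.

```python
import typing

# A node is either a search term (str) or an (operator, [children]) tuple.
Node = typing.Union[str, typing.Tuple[str, list]]

# Hypothetical limit for illustration only.
MAX_SEARCH_TERMS = 5


def count_search_terms(node: Node) -> int:
    """Recursively count leaf search terms in a nested query."""
    if isinstance(node, str):
        return 1
    _operator, children = node
    return sum(count_search_terms(child) for child in children)


def check_too_many_search_terms(
    node: Node, limit: int = MAX_SEARCH_TERMS
) -> typing.Optional[str]:
    """Return an F1012-style message if the term count exceeds the limit."""
    n = count_search_terms(node)
    if n > limit:
        return f"F1012: Too many search terms in the query ({n} > {limit})"
    return None
```

Applied to the building-blocks example from `index.rst` (two OR groups of three terms each), this counts six terms.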

docs/source/lint/F2012.rst

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 .. _F2012:
 
-F2012 — year-without-search-field
+F2012 — year-without-search-terms
 =================================
 
 **Error Code**: F2012
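After the rename, F2012 concerns a query that restricts the publication year but contains no actual search terms. The sketch below assumes the query is a flat list of `(field, value)` clauses and that the year field is tagged `PY`. The clause representation and the message text are hypothetical.

```python
import typing

# Assumed field tag for publication year (as in Web of Science's PY=).
YEAR_FIELDS = {"PY"}


def check_year_without_search_terms(
    clauses: typing.List[typing.Tuple[str, str]]
) -> typing.Optional[str]:
    """Flag queries whose clauses restrict only the publication year."""
    has_year = any(field in YEAR_FIELDS for field, _ in clauses)
    has_terms = any(field not in YEAR_FIELDS for field, _ in clauses)
    if has_year and not has_terms:
        return "F2012: year restriction without search terms"
    return None
```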

docs/source/lint/F2014.rst

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+.. _F2014:
+
+F2014 — year-format-invalid
+===========================
+
+**Error Code**: F2014
+
+**Message**: ``Invalid year format.``
+
+**Scope**: PLATFORM.WOS
+
+**Description**: Invalid year format.
+
+**Back to**: :ref:`lint`
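The commit message mentions pre-compiling regular expressions. A year-format check is a natural place for that, because the pattern is compiled once at import time and reused for every query. The sketch assumes a valid value is a four-digit year or a `YYYY-YYYY` range, optionally wrapped in parentheses. These format rules are an assumption and may differ from the linter's actual pattern.

```python
import re

# Compiled once at import time: a four-digit year or a "YYYY-YYYY" range.
# [0-9] is used instead of \d so that non-ASCII digits are rejected.
YEAR_PATTERN = re.compile(r"[0-9]{4}(?:-[0-9]{4})?")


def check_year_format(value: str) -> bool:
    """Return True if value looks like a valid year or year range."""
    value = value.strip().strip("()")
    if not YEAR_PATTERN.fullmatch(value):
        return False
    if "-" in value:
        start, end = (int(part) for part in value.split("-"))
        # A range must not run backwards.
        return start <= end
    return True
```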

docs/source/lint/W0013.rst

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+.. _W0013:
+
+W0013 — non-standard-quotes
+===========================
+
+**Error Code**: W0013
+
+**Message**: ``Non-standard quotes``
+
+**Scope**: all
+
+**Description**: Non-standard quotes
+
+**Back to**: :ref:`lint`
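W0013 applies to all platforms. Typographic quotes typically appear when a query is pasted from a word processor. The set of characters the linter treats as non-standard is not listed here. The sketch below assumes the common double-quote variants and leaves single curly quotes alone, since `’` is often a legitimate apostrophe. It reports each offending character with its position and offers a hypothetical normalization helper.

```python
import typing

# Assumed set of non-standard double quotes: “ ” „ « »
NON_STANDARD_DOUBLE_QUOTES = "\u201c\u201d\u201e\u00ab\u00bb"


def find_non_standard_quotes(query_str: str) -> typing.List[typing.Tuple[int, str]]:
    """Return (position, character) for each non-standard double quote."""
    return [(i, ch) for i, ch in enumerate(query_str) if ch in NON_STANDARD_DOUBLE_QUOTES]


def normalize_quotes(query_str: str) -> str:
    """Replace every non-standard double quote with a plain '"'."""
    table = str.maketrans({ch: '"' for ch in NON_STANDARD_DOUBLE_QUOTES})
    return query_str.translate(table)
```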
