Commit 6419e1d

author
Gerit Wagner
committed
revisions: overview and figure
1 parent 192234a commit 6419e1d

File tree

6 files changed

+36
-37
lines changed

figure_1.png

-492 KB

figure_1_backup.png

604 KB

jats/figure_1.png

-492 KB

jats/paper.jats

Lines changed: 21 additions & 27 deletions
@@ -161,9 +161,13 @@ a Creative Commons Attribution 4.0 International License (CC BY
161161
</sec>
162162
<sec id="overview-of-search-query-functionality">
163163
<title>Overview of search-query Functionality</title>
164-
<p><italic>search-query</italic> aims to support the entire process of
165-
managing academic search queries. Its core functionality is shown in
166-
Figure 1 and summarized in the following.</p>
164+
<p><italic>search-query</italic> treats academic search strategies as
165+
structured query objects rather than static strings. Query objects can
166+
be created programmatically or derived from search strings or JSON
167+
files, and are represented as object-oriented structures that capture
168+
Boolean logic, nesting, and field restrictions. Based on a query
169+
object, <italic>search-query</italic> supports the following
170+
operations, as illustrated in Figure 1:</p>
167171
<list list-type="bullet">
168172
<list-item>
169173
<p><bold>Load:</bold> <italic>search-query</italic> provides
@@ -177,6 +181,13 @@ a Creative Commons Attribution 4.0 International License (CC BY
177181
documentation outlines how to develop parsers for additional
178182
databases.</p>
179183
</list-item>
184+
<list-item>
185+
<p><bold>Save:</bold> Researchers can serialize the query object
186+
back into a standard string or file format for reporting and
187+
reuse. This facilitates transparency and reproducibility by
188+
allowing search strategies to be easily reported, shared or
189+
deposited.</p>
190+
</list-item>
180191
<list-item>
181192
<p><bold>Lint:</bold> <italic>search-query</italic> can apply
182193
linters to detect syntactical errors or inconsistencies that might
@@ -187,7 +198,7 @@ a Creative Commons Attribution 4.0 International License (CC BY
187198
registry, revealing that many published queries still contained
188199
errors even after peer review. By identifying such problems early,
189200
linters can help researchers validate and refine queries before
190-
execution. The linting component can be updated to cover more
201+
execution. The linting component can be extended to cover more
191202
databases and incorporate new messages, such as warnings for
192203
database-specific quirks.</p>
193204
</list-item>
@@ -203,17 +214,6 @@ a Creative Commons Attribution 4.0 International License (CC BY
203214
literature searches, future development will focus on adding more
204215
databases to the translation repertoire.</p>
205216
</list-item>
206-
<list-item>
207-
<p><bold>Save:</bold> After validation and refinement,
208-
<italic>search-query</italic> can serialize the query object back
209-
into a standard string or file format for reporting and reuse. In
210-
practice, this means that a query constructed or edited within the
211-
tool can be exported as a well-formatted search string that is
212-
ready to be executed in a database or included in the methods
213-
section of a paper. This facilitates transparency and
214-
reproducibility by allowing search strategies to be easily
215-
reported, shared or deposited.</p>
216-
</list-item>
217217
<list-item>
218218
<p><bold>Improve:</bold> Beyond basic syntax checking and
219219
translation, <italic>search-query</italic> aims to support
@@ -226,25 +226,19 @@ a Creative Commons Attribution 4.0 International License (CC BY
226226
suggestions and optimizations.</p>
227227
</list-item>
228228
<list-item>
229-
<p><bold>Automate:</bold> Finally, <italic>search-query</italic>
230-
is designed to support advanced automation efforts and to
229+
<p><bold>Automate:</bold> Automation primarily refers to the ability to
231230
integrate with systematic review management systems, such as
232231
CoLRev
233232
(<xref alt="Wagner &amp; Prester, 2025" rid="ref-WagnerPrester2025" ref-type="bibr">Wagner
234233
&amp; Prester, 2025</xref>). The library offers programmatic
235234
access via its Python API, which means it can be embedded in
236-
scripts and pipelines to run searches or process queries without
237-
manual intervention. It also provides a command-line interface and
238-
git pre-commit hooks, allowing researchers to incorporate query
239-
validation into version control and continuous integration setups.
240-
By representing queries in the form of objects,
241-
<italic>search-query</italic> further enables advanced use cases
242-
such as executing searches on platforms that lack native Boolean
243-
query support, for instance, by breaking a complex query into
244-
multiple API calls.</p>
235+
scripts and pipelines to run searches automatically. It also
236+
provides a command-line interface and git pre-commit hooks,
237+
allowing researchers to incorporate query validation into version
238+
control and continuous integration setups.</p>
245239
</list-item>
246240
</list>
247-
<fig id="figU003Asearch_query">
241+
<fig>
248242
<caption><p>Core functionality of the <italic>search-query</italic>
249243
library</p></caption>
250244
<graphic mimetype="image" mime-subtype="png" xlink:href="figure_1.png" />

paper.md

Lines changed: 15 additions & 10 deletions
@@ -93,45 +93,50 @@ Requirements include:
9393

9494
# Overview of search-query Functionality
9595

96-
*search-query* aims to support the entire process of managing academic search queries.
97-
Its core functionality is shown in Figure 1 and summarized in the following.
96+
*search-query* treats academic search strategies as structured query objects rather than static strings.
97+
Query objects can be created programmatically or derived from search strings or JSON files, and are represented as object-oriented structures that capture Boolean logic, nesting, and field restrictions.
98+
Based on a query object, *search-query* supports the following operations, as illustrated in Figure 1:
9899

99100
- **Load:** *search-query* provides parsing capabilities to ingest search queries from both raw strings and JSON files.
100101
It parses database-specific query strings into internal, object-oriented representations of the search strategy.
101102
This allows the tool to capture complex Boolean logic and field restrictions in a standardized form.
102103
Currently, parsers are available for Web of Science, PubMed, and EBSCOHost.
103104
The *load* functionality is extensible and the documentation outlines how to develop parsers for additional databases.
104105

106+
- **Save:** Researchers can serialize the query object back into a standard string or file format for reporting and reuse.
107+
<!-- In practice, this means that a query constructed or edited within the tool can be exported as a well-formatted search string that is ready to be executed in a database or included in the methods section of a paper. -->
108+
This facilitates transparency and reproducibility by allowing search strategies to be easily reported, shared or deposited.
109+
105110
- **Lint:** *search-query* can apply linters to detect syntactical errors or inconsistencies that might compromise the search.
106111
It can check for issues such as unbalanced parentheses, logical operator misuse, or database-specific syntax errors.
107112
The validation rules are based on an analysis of a large corpus of real-world search strategies from the searchRxiv registry, revealing that many published queries still contained errors even after peer review.
108113
By identifying such problems early, linters can help researchers validate and refine queries before execution.
109-
The linting component can be updated to cover more databases and incorporate new messages, such as warnings for database-specific quirks.
114+
The linting component can be extended to cover more databases and incorporate new messages, such as warnings for database-specific quirks.
110115

111116
- **Translate:** The library can convert a query from one database syntax into another, enabling cross-platform use of search strategies.
112117
Using a generic query object as an intermediate representation, *search-query* currently supports translations between Web of Science, PubMed, and EBSCOHost.
113118
Such query translation functionality can eliminate manual efforts for rewriting queries and reduce the risk of human error during translation.
114119
In line with the vision of seamless cross-database literature searches, future development will focus on adding more databases to the translation repertoire.
115120

116-
- **Save:** After validation and refinement, *search-query* can serialize the query object back into a standard string or file format for reporting and reuse.
117-
In practice, this means that a query constructed or edited within the tool can be exported as a well-formatted search string that is ready to be executed in a database or included in the methods section of a paper.
118-
This facilitates transparency and reproducibility by allowing search strategies to be easily reported, shared or deposited.
119-
120121
- **Improve:** Beyond basic syntax checking and translation, *search-query* aims to support semantic query improvement to enhance recall and precision.
121122
As queries are represented as manipulable objects, researchers can programmatically experiment with modifications — for example, adding synonyms or adjusting field scopes — to observe how these changes affect the search results.
122123
In future work, this improvement functionality may be augmented with more automated suggestions and optimizations.
123124

124-
- **Automate:** Finally, *search-query* is designed to support advanced automation efforts and to integrate with systematic review management systems, such as CoLRev [@WagnerPrester2025].
125-
The library offers programmatic access via its Python API, which means it can be embedded in scripts and pipelines to run searches or process queries without manual intervention.
125+
- **Automate:** Automation primarily refers to the ability to integrate with systematic review management systems, such as CoLRev [@WagnerPrester2025].
126+
The library offers programmatic access via its Python API, which means it can be embedded in scripts and pipelines to run searches automatically.
126127
It also provides a command-line interface and git pre-commit hooks, allowing researchers to incorporate query validation into version control and continuous integration setups.
127-
By representing queries in the form of objects, *search-query* further enables advanced use cases such as executing searches on platforms that lack native Boolean query support, for instance, by breaking a complex query into multiple API calls.
128+
<!-- By representing queries in the form of objects, *search-query* further enables advanced use cases such as executing searches on platforms that lack native Boolean query support, for instance, by breaking a complex query into multiple API calls. -->
128129

130+
![Core functionality of the \textit{search-query} library](figure_1.png){label="fig_overview" width="340pt"}
131+
132+
<!--
129133
\begin{figure}[ht]
130134
\centering
131135
\includegraphics[width=\textwidth]{figure_1.png}
132136
\caption{Core functionality of the \textit{search-query} library}
133137
\label{fig:search_query}
134138
\end{figure}
139+
-->
135140

136141
# Example Usage
137142

paper.pdf

-228 KB
Binary file not shown.
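
Note on the revised overview text: the new opening paragraph describes queries as objects that capture Boolean logic, nesting, and field restrictions, and the Save and Lint items describe serializing such an object and checking for unbalanced parentheses. The sketch below illustrates that idea only. The `Term` and `BooleanQuery` classes, the `parentheses_balanced` helper, and the bracketed field tags are hypothetical names chosen for this example; they are not the *search-query* API, and the output is not guaranteed to match any database's syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Term:
    """A single search term with an optional field restriction (hypothetical)."""

    value: str
    search_field: Optional[str] = None

    def to_string(self) -> str:
        quoted = f'"{self.value}"'
        if self.search_field:
            return f"{quoted}[{self.search_field}]"
        return quoted


@dataclass
class BooleanQuery:
    """A Boolean node that combines nested terms or sub-queries (hypothetical)."""

    operator: str  # "AND" or "OR" in this sketch
    children: List[Union["Term", "BooleanQuery"]] = field(default_factory=list)

    def to_string(self) -> str:
        # Serialize children recursively and wrap the group in parentheses.
        joined = f" {self.operator} ".join(c.to_string() for c in self.children)
        return f"({joined})"


def parentheses_balanced(query_string: str) -> bool:
    """Simplified lint-style check; it ignores parentheses inside quotes."""
    depth = 0
    for ch in query_string:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Build a nested query programmatically, then serialize it ("Save").
query = BooleanQuery(
    "AND",
    [
        BooleanQuery("OR", [Term("digital health", "ti"), Term("eHealth", "ti")]),
        Term("privacy", "ab"),
    ],
)
serialized = query.to_string()
print(serialized)
print(parentheses_balanced(serialized))
```

Because the structure is held in objects rather than in a string, an edit such as adding a synonym amounts to appending another `Term` to the `OR` node before serializing again.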
