Use multiple field/value terms in ES query by BartChris · Pull Request #6782 · kitodo/kitodo-production

BartChris · 2025-11-27T17:22:54Z

This PR addresses point 8) of #6743. The problem with the current implementation is that when you run a search like projectname:MDP_LIK to retrieve all processes that have the metadata value "MDP_LIK" in the project field, Kitodo first tokenizes the term into "MDP" and "LIK" and then issues two requests to Elasticsearch.
Both requests return independent sets of IDs. And when the two sets do not intersect, we might get fewer hits than we actually have.

For each token constructed at query time we inject the list of IDs (retrieved from the index) into SQL, so in the end we have something like WHERE id IN (x,y,z) AND id IN (a,b,c). The individual ID lists can be huge because they contain every result that merely contains the fragment "MDP".

My change ensures that only one query is sent to the search index and only one list of IDs is returned, so we no longer get multiple contradicting ID lists. In many cases it also makes the ID list much shorter, because we already intersect at the Elasticsearch level instead of passing huge ID lists that are mostly irrelevant to the database.
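To illustrate the difference described above, here is a minimal sketch that uses a toy in-memory map in place of the real search index. The class name, the sample data, and both method names are hypothetical and exist only for this illustration; this is not the Kitodo or Hibernate Search code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for the search index: process ID -> tokens of its project field (made-up data). */
final class TokenQuerySketch {

    static final Map<Integer, Set<String>> INDEX = Map.of(
            1, Set.of("MDP", "LIK"),
            2, Set.of("MDP", "ABC"),
            3, Set.of("MDP", "XYZ"),
            4, Set.of("LIK", "OTHER"));

    /** Old approach: one index request per token, each returning its own ID list. */
    static List<Set<Integer>> idsPerToken(List<String> tokens) {
        return tokens.stream()
                .map(token -> idsMatchingAll(List.of(token)))
                .toList();
    }

    /** New approach: a single request in which every token has to match. */
    static Set<Integer> idsMatchingAll(List<String> tokens) {
        Set<Integer> ids = new TreeSet<>();
        INDEX.forEach((id, indexedTokens) -> {
            if (indexedTokens.containsAll(tokens)) {
                ids.add(id);
            }
        });
        return ids;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("MDP", "LIK");
        System.out.println("per token: " + idsPerToken(tokens));
        System.out.println("combined:  " + idsMatchingAll(tokens));
    }
}
```

With this sample data the per-token requests return the ID lists {1, 2, 3} and {1, 4}, which would both have to be passed on to the database, while the combined request returns only {1}.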

This also frees us from having to construct complex SQL restrictions for multiple returned ID lists. Since we only have one list, we can just append a simple WHERE id IN (<list of ids>).
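The shape of the two restrictions can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the clause is rendered as a plain string for readability, whereas real code would normally bind the IDs as query parameters.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that renders the ID restrictions discussed above as plain strings. */
final class IdRestrictionSketch {

    /** Renders one "id IN (...)" clause; assumes the ID list is not empty. */
    static String inClause(Collection<Integer> ids) {
        return ids.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(",", "id IN (", ")"));
    }

    /** Before: one clause per token, all joined with AND. */
    static String perTokenRestriction(List<List<Integer>> idLists) {
        return idLists.stream()
                .map(IdRestrictionSketch::inClause)
                .collect(Collectors.joining(" AND ", "WHERE ", ""));
    }

    /** After: a single clause for the single ID list. */
    static String singleRestriction(Collection<Integer> ids) {
        return "WHERE " + inClause(ids);
    }

    public static void main(String[] args) {
        System.out.println(perTokenRestriction(List.of(List.of(1, 2, 3), List.of(1, 4))));
        System.out.println(singleRestriction(List.of(1)));
    }
}
```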

matthias-ronge · 2026-03-16T07:19:40Z

-                    value, ids.size());
+            var query = searchSession.search(beanClass)
+                    .select(idField)
+                    .where(f -> {


Please rename f to a more descriptive name, like field, filter, flag, or whatever you intend.

matthias-ronge · 2026-03-16T07:26:02Z

+            var query = searchSession.search(beanClass)
+                    .select(idField)
+                    .where(f -> {
+                        var bool = f.bool();


Is the type of bool very complex? If not, please write the type. I totally agree to use var in places where the type is half a line long, or obvious (or both) like in for loop openers, but here, when just reading the code, on GitHub—not in an IDE—I have no clue what bool is. Maybe also the name is unluckily chosen? I see bool.must(…) below. What must the boolean do? Maybe it should be named query? Maybe indexQuery?

You have a point, I will try to clarify. This structure is idiomatic Hibernate Search DSL:
https://docs.hibernate.org/search/6.2/reference/en-US/html_single/#query-predicate

I tried to achieve more clarity, but I do not want to depart too much from the way it is done in the Hibernate Search docs.
I also switched to a pure filter query because we are not doing any ranking; we just filter out the matching records.

matthias-ronge · 2026-03-16T07:31:04Z

+            String termSummary = String.join(", ",
+                    terms.stream()
+                            .distinct()
+                            .map(t -> t.getLeft() + "=\"***\"")


Why do you log *** here? Is this a password? I prefer to read the query parameters with their values in the debug log. If this is due to length, I'd prefer "...", or maybe + '"' + (t.getRight().length() < 50 ? t.getRight() : "...") + '"'

I will try to show the terms again. CI kept failing the last time I tried and complained that secrets were exposed.
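A minimal sketch of the reviewer's suggestion: log each value, but replace it with "..." once it reaches 50 characters. The class and method names are hypothetical, and Map.Entry from the JDK stands in for the pair type used in the PR (getKey()/getValue() instead of getLeft()/getRight()).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that builds a debug-log summary of field/value search terms. */
final class TermSummarySketch {

    /** Threshold taken from the review comment; values at or above it are replaced. */
    static final int MAX_VALUE_LENGTH = 50;

    /** Returns the value itself if it is short enough, otherwise "...". */
    static String abbreviate(String value) {
        return value.length() < MAX_VALUE_LENGTH ? value : "...";
    }

    /** Joins the distinct terms as field="value", separated by commas. */
    static String summarize(List<Map.Entry<String, String>> terms) {
        return terms.stream()
                .distinct()
                .map(term -> term.getKey() + "=\"" + abbreviate(term.getValue()) + "\"")
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(summarize(List.of(
                Map.entry("projectname", "MDP_LIK"),
                Map.entry("projectname", "MDP_LIK"),
                Map.entry("title", "x".repeat(60)))));
    }
}
```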

matthias-ronge

There is only one question left for me to clarify. If you can say that this isn’t a problem, then I approve the PR.

matthias-ronge · 2026-03-17T15:09:33Z

        query.setUnordered();
-        query.performIndexSearches();
+        Collection<Integer> queryIds = query.performIndexSearches();
+        if (!queryIds.isEmpty()) {


I wonder if this if check should be there, or if there would need to be an else case that puts a FALSE in the query, in other words: when the index search returns no hits, I would expect to get an empty result, not all hits, but I don’t know (and cannot currently test) if this happens. This is my only review remark and applies to all occurrences of this if check.

Good point, I will have to think about that.

BartChris · 2026-03-17T17:27:31Z

I wonder if this if check should be there, or if there would need to be an else case that puts a FALSE in the query, in other words: when the index search returns no hits, I would expect to get an empty result, not all hits, but I don’t know (and cannot currently test) if this happens. This is my only review remark and applies to all occurrences of this if check.

@matthias-ronge I checked the search. It works. But the if check for empty results lacks clarity. I therefore refactored the code and let BeanQuery deal with the complexities involved, so we do not need the confusing if check in the calling code.
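The concern raised in this thread can be sketched as follows: when the index search returns no IDs, the database restriction should match nothing rather than be skipped. This is a hypothetical illustration only; how BeanQuery actually handles the case is not shown in this conversation, and the class name, method name, and the "1 = 0" condition are assumptions.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: turn the IDs from an index search into a database restriction. */
final class EmptyIndexResultSketch {

    /**
     * An empty ID list becomes an always-false condition, so the overall
     * query yields no rows instead of silently dropping the restriction.
     */
    static String restrictionFor(Collection<Integer> idsFromIndex) {
        if (idsFromIndex.isEmpty()) {
            return "1 = 0";
        }
        return idsFromIndex.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(",", "id IN (", ")"));
    }

    public static void main(String[] args) {
        System.out.println(restrictionFor(List.of()));
        System.out.println(restrictionFor(List.of(7, 9)));
    }
}
```

Keeping this decision in one place means the calling code does not need its own check for empty results.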

codacy-production · 2026-04-02T15:17:38Z

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity -4 · duplication 0


Use multiple field/value terms in ES query

02b2768

BartChris mentioned this pull request Nov 27, 2025

[3.9] Use multiple field/value terms in ES query #6783

Draft

github-advanced-security AI found potential problems Nov 27, 2025

Comment thread Kitodo/src/main/java/org/kitodo/production/services/index/IndexingService.java Fixed

Fix tests

cbf2dd7

BartChris force-pushed the improve_index_requests branch from f123f51 to cbf2dd7 Compare November 28, 2025 09:30

github-advanced-security AI found potential problems Nov 28, 2025

Comment thread Kitodo/src/main/java/org/kitodo/production/services/index/IndexingService.java Fixed

BartChris force-pushed the improve_index_requests branch from f9c5f93 to de22770 Compare November 28, 2025 09:49

Fix security warning

72b1672

BartChris force-pushed the improve_index_requests branch from de22770 to 72b1672 Compare November 28, 2025 09:58

BartChris added 6 commits March 13, 2026 17:42

Merge main

8c7b4bb

Reduce coupling between database and search index

cbe065b

Fix filters in TaskService

8cc66c7

Autoclose Hibernate session

02763dc

Adapt to Java21

89d6fce

Adjust comment

33c2512

BartChris force-pushed the improve_index_requests branch from bd69284 to 8250141 Compare March 13, 2026 18:12

Further simplification

4bd2dca

BartChris force-pushed the improve_index_requests branch from 8250141 to 4bd2dca Compare March 13, 2026 18:20

matthias-ronge reviewed Mar 16, 2026

BartChris added 2 commits March 16, 2026 11:40

Refactor variables for more clarity and use filter query

1a48eee

Log search terms

13b126a

BartChris requested a review from matthias-ronge March 16, 2026 10:56

BartChris marked this pull request as ready for review March 16, 2026 10:57

matthias-ronge reviewed Mar 17, 2026

BartChris force-pushed the improve_index_requests branch from c3be130 to ca5f256 Compare March 17, 2026 17:33

Refactor for better encapsulation and clarity

68ee189

BartChris force-pushed the improve_index_requests branch from ca5f256 to 68ee189 Compare March 17, 2026 17:37

BartChris added 3 commits March 18, 2026 11:41

Merge branch 'main' into improve_index_requests

9a51840

Adapt comment

0164fc3

Merge branch 'main' into improve_index_requests

fcfd7e9

solth added the search search, filter label Apr 13, 2026

Merge branch 'main' into improve_index_requests

9279108

solth requested a review from matthias-ronge April 24, 2026 07:27

BartChris wants to merge 17 commits into kitodo:main from BartChris:improve_index_requests

4 participants
