Commit a7f909c

Post: quarkus.search.io series - rolling over
1 parent 0b9d4f8 commit a7f909c

5 files changed

+298
-0
lines changed

_data/authors.yaml

Lines changed: 7 additions & 0 deletions
@@ -541,3 +541,10 @@ iyanev:
541541
emailhash: "05c3ef31aeba2d12e2eb26282752e654"
542542
job_title: "Senior Software Engineer"
543543
twitter: "iqnev"
544+
markobekhta:
545+
name: "Marko Bekhta"
546+
547+
emailhash: "2934f00ba9190bc06cf03fde5b50c61d"
548+
job_title: "Engineer (Software)"
549+
twitter: "that_java_guy"
550+
bio: "Software Engineer at Red Hat and Hibernate team member."
Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,291 @@
1+
---
2+
layout: post
3+
title: 'Indexing rollover with Quarkus and Hibernate Search'
4+
date: 2024-04-25
5+
tags: hibernate search howto
6+
synopsis: 'This is the first post in the series that dives into the implementation details of the search.quarkus.io application. Are you interested in near zero-downtime reindexing? Then this one is for you!'
7+
author: markobekhta
8+
---
9+
:hibernate-search-version: 7.0
10+
:imagesdir: /assets/images/posts/search-indexing-rollover
11+
:hibernate-search-docs-url: https://docs.jboss.org/hibernate/search/{hibernate-search-version}/reference/en-US/html_single/
12+
:quarkus-hibernate-search-docs-url: https://quarkus.io/guides/hibernate-search-orm-elasticsearch
13+
14+
This is the first post in the series diving into the implementation details of the
15+
link:https://github.com/quarkusio/search.quarkus.io[application] backing the guide search of
16+
link:https://quarkus.io/guides/[quarkus.io].
17+
18+
Does your application need full-text search capabilities? Do you need to keep your application running
19+
and producing search results without any downtime, even when reindexing all your data?
20+
Look no further. In this post, we'll cover how you can approach this problem
21+
and solve it in practice with a few low-level APIs, provided you use Hibernate Search,
22+
be it link:{quarkus-hibernate-search-docs-url}[on top of Hibernate ORM]
23+
or https://quarkus.io/version/main/guides/hibernate-search-standalone-elasticsearch[in standalone mode].
24+
25+
The approach suggested in this post is based on the fact that Hibernate Search uses
26+
link:{hibernate-search-docs-url}#backend-elasticsearch-indexlayout[aliased indexes],
27+
and communicates with the actual index through a read/write alias, depending on the operation it needs to perform.
28+
For example, a search operation will be routed to a read index alias,
29+
while an indexing operation will be sent to a write index alias.
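To make the aliases more concrete, here is a small, dependency-free sketch of the names involved. It assumes the default `simple` index layout strategy of Hibernate Search, where (as far as we understand the index layout documentation linked above) an index named `guide` is created as `guide-000001` with a `guide-write` and a `guide-read` alias. The `IndexNames` class and the `guide` index name are made up for illustration and are not part of any Hibernate Search API.

```java
import java.util.Locale;

// Illustration only: models the naming convention assumed for the default "simple" layout.
// This class is hypothetical and does not exist in Hibernate Search.
final class IndexNames {
    final String primaryName; // the actual Elasticsearch index
    final String writeAlias;  // alias targeted by indexing operations
    final String readAlias;   // alias targeted by search operations

    private IndexNames(String primaryName, String writeAlias, String readAlias) {
        this.primaryName = primaryName;
        this.writeAlias = writeAlias;
        this.readAlias = readAlias;
    }

    // Derives the assumed default names from the Hibernate Search index name.
    static IndexNames forIndex(String hibernateSearchIndexName) {
        String base = hibernateSearchIndexName.toLowerCase(Locale.ROOT);
        return new IndexNames(base + "-000001", base + "-write", base + "-read");
    }
}
```

In a real application there is no need to compute these names by hand: the index descriptor used later in this post exposes them.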
30+
31+
image::initial-app.png[]
32+
33+
NOTE: This approach is implemented and successfully used in our Quarkus application that backs the guides'
34+
search of https://quarkus.io/guides/[quarkus.io/guides/].
35+
You can see the complete implementation here:
36+
link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/Rollover.java#L44-L331[rollover implementation]
37+
and link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/IndexingService.java#L226-L244[rollover usage].
38+
39+
Applications using Hibernate Search can keep their search indexes up-to-date by updating the index gradually,
40+
as the data on which the index documents are based is modified, providing a near real-time index synchronisation.
41+
42+
On the other hand, if the search requirements allow for a delay in synchronisation
43+
or the data is updated only at certain times of day, the option of mass indexing can effectively keep the indexes up-to-date.
44+
The link:{hibernate-search-docs-url}[Hibernate Search documentation] provides more information about these approaches
45+
and other Hibernate Search capabilities.
46+
47+
The application discussed in this post is using the mass indexing approach.
48+
This means that on certain events, e.g., when a new version of the application is deployed or a scheduled time is reached,
49+
the application has to process the documentation guides and create search index documents from them.
50+
51+
Now, since we want our application to keep serving search requests while we add or update documents in the indexes,
52+
we cannot perform a simple reindexing operation
53+
using a link:{hibernate-search-docs-url}#search-batchindex-massindexer[mass indexer],
54+
or the recently added link:{quarkus-hibernate-search-docs-url}#management[management endpoint in Quarkus],
55+
as these would drop all existing documents from the index before indexing them:
56+
search operations would not be able to match them anymore until reindexing finishes.
57+
58+
Instead, we can create a new index with the same schema and route any write operations to it.
59+
60+
image::write-app.png[]
61+
62+
Since Hibernate Search does not provide the rollover feature out of the box (https://hibernate.atlassian.net/browse/HSEARCH-3499[yet]),
63+
we will need to resort to using the lower-level APIs to access the Elasticsearch client and perform the required operations ourselves.
64+
To do so, we need to follow a few simple steps:
65+
66+
1. Get the mapping information for the index we want to reindex using the schema manager.
67+
+
68+
[source, java]
69+
====
70+
----
@Inject
SearchMapping searchMapping; // <1>
// ...

searchMapping.scope(MyIndexedEntity.class).schemaManager() // <2>
        .exportExpectedSchema((backendName, indexName, export) -> { // <3>
            var createIndexRequestBody = export.extension(ElasticsearchExtension.get())
                    .bodyParts().get(0); // <4>
            var mappings = createIndexRequestBody.getAsJsonObject("mappings"); // <5>
            var settings = createIndexRequestBody.getAsJsonObject("settings"); // <6>
        });
----
83+
1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager.
84+
2. Get a schema manager for the indexed entity we are interested in (`MyIndexedEntity`).
85+
If all entities should be targeted, then `Object.class` can be used to create the scope.
86+
3. Use the export schema API to access the mapping information.
87+
4. Use the extension to get access to the Elasticsearch-specific `.bodyParts()` method that returns
88+
a JSON object representing the HTTP request body needed to create the index.
89+
5. Get the mapping information for the particular index.
90+
6. Get the settings for the particular index.
91+
====
92+
+
93+
2. Get the reference to the Elasticsearch client, so we can perform API calls to the search backend cluster:
94+
+
95+
[source, java]
96+
====
97+
----
@Inject
SearchMapping searchMapping; // <1>
// ...
RestClient client = searchMapping.backend() // <2>
        .unwrap(ElasticsearchBackend.class) // <3>
        .client(RestClient.class); // <4>
----
105+
1. Inject `SearchMapping` somewhere in your app so that we can use it to access the backend.
106+
2. Access the backend from a search mapping instance.
107+
3. Unwrap the backend to the `ElasticsearchBackend`, so that we can access backend-specific APIs.
108+
4. Get a reference to the Elasticsearch REST client.
109+
====
110+
+
111+
3. Create a new index using the OpenSearch/Elasticsearch rollover API
112+
that would allow us to keep using the existing index for read operations,
113+
while write operations will be sent to the new index:
114+
+
115+
[source, java]
116+
====
117+
----
@Inject
SearchMapping searchMapping; // <1>
// ...

SearchIndexedEntity<?> entity = searchMapping.indexedEntity(MyIndexedEntity.class);
var index = entity.indexManager().unwrap(ElasticsearchIndexManager.class).descriptor(); // <2>

var request = new Request("POST", "/" + index.writeName() + "/_rollover"); // <3>
var body = new JsonObject();
body.add("mappings", mappings);
body.add("settings", settings);
body.add("aliases", new JsonObject()); // <4>
request.setEntity(new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON));

var response = client.performRequest(request); // <5>
//...
----
135+
1. Inject `SearchMapping` somewhere in your app so that we can use it to access the indexed entity and its index manager.
136+
2. Get the index descriptor to get the aliases from it.
137+
3. Start building the rollover request, targeting the write alias taken from the index descriptor.
138+
4. Note that we are including an empty "aliases" so that the aliases are not copied over to the new index,
139+
except for the write alias (which is implicitly updated since the rollover request is targeting it directly).
140+
We don't want the read alias to start pointing to the new index immediately.
141+
5. Perform the rollover API request using the Elasticsearch REST client obtained in the previous step.
142+
====
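The `_rollover` response is worth keeping around, since it names both indexes involved. Below is a minimal sketch of extracting them, assuming the response body has already been read into a string (for instance with Apache HttpCore's `EntityUtils.toString(response.getEntity())`) and contains the `old_index` and `new_index` fields described in the rollover API documentation. The `RolloverResult` class is hypothetical, and the regular expression is only there to keep the example free of dependencies; in an application that already uses Gson, parsing the body into a `JsonObject` would likely be the more robust choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the two index names reported by a `_rollover` response.
final class RolloverResult {
    final String oldIndex;
    final String newIndex;

    private RolloverResult(String oldIndex, String newIndex) {
        this.oldIndex = oldIndex;
        this.newIndex = newIndex;
    }

    // Extracts `old_index` and `new_index` from the raw JSON response body.
    static RolloverResult parse(String responseBody) {
        return new RolloverResult(
                stringField(responseBody, "old_index"),
                stringField(responseBody, "new_index"));
    }

    // Simplistic lookup of a top-level string field; assumes the value contains no escaped quotes,
    // which holds for index names.
    private static String stringField(String json, String name) {
        Matcher matcher = Pattern
                .compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"")
                .matcher(json);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Missing field '" + name + "' in rollover response");
        }
        return matcher.group(1);
    }
}
```

These two names are the ones needed later, when deciding which index to keep and which one to remove.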
143+
144+
With this successfully completed, the indexes are in the state we wanted:
145+
146+
image::write-app.png[]
147+
148+
We can start populating our write index without affecting search requests.
149+
150+
Once we are done with indexing, we can either commit or roll back depending on the results:
151+
152+
image::after-indexing.png[]
153+
154+
Committing the index rollover means that we are happy with the results and ready to switch to the new index
155+
for both reading and writing operations while removing the old one. To do that, we need to send a request to the cluster:
156+
157+
[source, java]
158+
====
159+
----
var client = ...; // <1>

var request = new Request("POST", "_aliases"); // <2>
request.setEntity(new StringEntity("""
        {
            "actions": [
                {
                    "add": { // <3>
                        "index": "%s",
                        "alias": "%s",
                        "is_write_index": false
                    }
                },
                {
                    "remove_index": { // <4>
                        "index": "%s"
                    }
                }
            ]
        }
        """.formatted( newIndexName, readAliasName, oldIndexName ), // <5>
        ContentType.APPLICATION_JSON));

var response = client.performRequest(request);
//...
----
184+
1. Get access to the Elasticsearch REST client as described above.
185+
2. Start creating an `_aliases` API request.
186+
3. Add an action to update the index aliases to use the new index for both read and write operations.
187+
Here, we must make the read alias point to the new index.
188+
4. Add an action to remove the old index.
189+
5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request,
190+
while the aliases can be retrieved from the index descriptor.
191+
====
192+
193+
Otherwise, if we have encountered an error or decided for any other reason to stop the rollover, we can roll back to using
194+
the initial index:
195+
196+
[source, java]
197+
====
198+
----
var client = ...; // <1>

var request = new Request("POST", "_aliases"); // <2>
request.setEntity(new StringEntity("""
        {
            "actions": [
                {
                    "add": { // <3>
                        "index": "%s",
                        "alias": "%s",
                        "is_write_index": true
                    }
                },
                {
                    "remove_index": { // <4>
                        "index": "%s"
                    }
                }
            ]
        }
        """.formatted( oldIndexName, writeAliasName, newIndexName ), // <5>
        ContentType.APPLICATION_JSON));

var response = client.performRequest(request);
//...
----
223+
1. Get access to the Elasticsearch REST client as described above.
224+
2. Start creating an `_aliases` API request.
225+
3. Add an action to update the index aliases to use the old index for both read and write operations.
226+
Here, we must make the write alias point back to the old index.
227+
4. Add an action to remove the new index.
228+
5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request,
229+
while the aliases can be retrieved from the index descriptor.
230+
====
231+
232+
NOTE: Keep in mind that in case of a rollback, your initial index may be out of sync if any write operations were performed
233+
while the write alias was pointing to the new index.
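Looking at the commit and rollback requests side by side, they differ only in which index is kept, which alias is repointed, and the value of `is_write_index`. As a sketch, that shared shape could be captured in a small helper. The `AliasSwitch` class and its method names are hypothetical, the index and alias names are assumed to need no JSON escaping, and each action is written as its own object in the `actions` array, which is the form used in the examples of the Elasticsearch aliases API documentation.

```java
// Hypothetical helper that builds the `_aliases` request bodies for commit and rollback.
final class AliasSwitch {

    private AliasSwitch() {
    }

    // Builds a body with two separate actions:
    // point `alias` at `indexToKeep`, then delete `indexToRemove`.
    static String body(String indexToKeep, String alias, boolean isWriteIndex, String indexToRemove) {
        return String.format(
                "{\"actions\":["
                        + "{\"add\":{\"index\":\"%s\",\"alias\":\"%s\",\"is_write_index\":%b}},"
                        + "{\"remove_index\":{\"index\":\"%s\"}}"
                        + "]}",
                indexToKeep, alias, isWriteIndex, indexToRemove);
    }

    // Commit: the read alias starts pointing to the new index and the old index is removed.
    static String commitBody(String newIndex, String readAlias, String oldIndex) {
        return body(newIndex, readAlias, false, oldIndex);
    }

    // Rollback: the write alias points back to the old index and the new index is removed.
    static String rollbackBody(String oldIndex, String writeAlias, String newIndex) {
        return body(oldIndex, writeAlias, true, newIndex);
    }
}
```

The resulting string can be wrapped in a `StringEntity` with `ContentType.APPLICATION_JSON`, exactly as in the snippets above.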
234+
235+
With this knowledge, we can organize the rollover process as follows:
236+
[source, java]
237+
====
238+
----
try (Rollover rollover = Rollover.start(searchMapping)) {
    // Perform the indexing operations ...
    rollover.commit();
}
----
244+
====
245+
246+
Where the `Rollover` class will look as follows:
247+
248+
[source, java]
249+
====
250+
----
class Rollover implements Closeable {
    private boolean done = false; // set once the rollover is committed or rolled back

    public static Rollover start(SearchMapping searchMapping) {
        // initiate the rollover process by sending the _rollover request ...
        // ...
        return new Rollover( client, rolloverResponse ); // <1>
    }

    @Override
    public void close() {
        if ( !done ) { // <2>
            rollback();
        }
    }

    public void commit() {
        // send the `_aliases` request to switch to the *new* index
        // ...
        done = true;
    }

    public void rollback() {
        // send the `_aliases` request to switch to the *old* index
        // ...
        done = true;
    }
}
----
278+
1. Keep the reference to the Elasticsearch REST client to perform API calls.
279+
2. If we haven't successfully committed the rollover, it'll be rolled back on close.
280+
====
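To see why `close()` makes the rollback automatic, here is a self-contained sketch of the same commit-or-rollback pattern that records what happens instead of calling Elasticsearch. `RecordingRollover` and `RolloverDemo` are stand-ins invented for this example; only the control flow mirrors the `Rollover` class outlined above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real Rollover: records events instead of sending requests.
final class RecordingRollover implements AutoCloseable {
    private final List<String> events;
    private boolean done;

    private RecordingRollover(List<String> events) {
        this.events = events;
    }

    static RecordingRollover start(List<String> events) {
        events.add("start");
        return new RecordingRollover(events);
    }

    void commit() {
        events.add("commit");
        done = true;
    }

    void rollback() {
        events.add("rollback");
        done = true;
    }

    @Override
    public void close() {
        // Anything that was not explicitly committed gets rolled back.
        if (!done) {
            rollback();
        }
    }
}

final class RolloverDemo {
    // Runs a fake indexing job and returns the recorded sequence of events.
    static List<String> run(boolean failDuringIndexing) {
        List<String> events = new ArrayList<>();
        try (RecordingRollover rollover = RecordingRollover.start(events)) {
            if (failDuringIndexing) {
                throw new IllegalStateException("indexing failed");
            }
            rollover.commit();
        } catch (IllegalStateException e) {
            // close() has already run by the time the catch block executes
            events.add("caught: " + e.getMessage());
        }
        return events;
    }
}
```

Because try-with-resources closes the resource before any `catch` block runs, a failure during indexing triggers the rollback before the exception is handled, while a successful run only commits.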
281+
282+
Once again, for a complete working example of this rollover implementation, check out
283+
link:https://github.com/quarkusio/search.quarkus.io[search.quarkus.io on GitHub].
284+
285+
If you find this feature useful and would like to have it built into your Hibernate Search and Quarkus apps,
286+
feel free to reach out to us on the https://hibernate.atlassian.net/browse/HSEARCH-3499[pending feature requests]
287+
to discuss your ideas and suggestions.
288+
289+
Stay tuned for more details in the coming weeks as we publish more blog posts
290+
diving into other interesting implementation aspects of this application.
291+
Happy searching and rolling over!