|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: 'Indexing rollover with Quarkus and Hibernate Search' |
| 4 | +date: 2024-04-25 |
| 5 | +tags: hibernate search howto |
| 6 | +synopsis: 'This is the first post in the series that dives into the implementation details of the search.quarkus.io application. Are you interested in near zero-downtime reindexing? Then this one is for you!' |
| 7 | +author: markobekhta |
| 8 | +--- |
| 9 | +:hibernate-search-version: 7.0 |
| 10 | +:imagesdir: /assets/images/posts/search-indexing-rollover |
| 11 | +:hibernate-search-docs-url: https://docs.jboss.org/hibernate/search/{hibernate-search-version}/reference/en-US/html_single/ |
| 12 | +:quarkus-hibernate-search-docs-url: https://quarkus.io/guides/hibernate-search-orm-elasticsearch |
| 13 | + |
| 14 | +This is the first post in the series diving into the implementation details of the |
| 15 | +link:https://github.com/quarkusio/search.quarkus.io[application] backing the guide search of |
| 16 | +link:https://quarkus.io/guides/[quarkus.io]. |
| 17 | + |
| 18 | +Does your application need full-text search capabilities? Do you need to keep your application running |
| 19 | +and producing search results without any downtime, even when reindexing all your data? |
| 20 | +Look no further. In this post, we'll cover how you can approach this problem |
| 21 | +and solve it in practice with a few low-level APIs, provided you use Hibernate Search, |
| 22 | +be it link:{quarkus-hibernate-search-docs-url}[on top of Hibernate ORM] |
| 23 | +or https://quarkus.io/version/main/guides/hibernate-search-standalone-elasticsearch[in standalone mode]. |
| 24 | + |
| 25 | +The approach suggested in this post is based on the fact that Hibernate Search uses |
| 26 | +link:{hibernate-search-docs-url}#backend-elasticsearch-indexlayout[aliased indexes], |
| 27 | +and communicates with the actual index through a read/write alias, depending on the operation it needs to perform. |
| 28 | +For example, a search operation will be routed to a read index alias, |
| 29 | +while an indexing operation will be sent to a write index alias. |
| 30 | + |
| 31 | +image::initial-app.png[] |
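
To make the alias routing concrete, here is a minimal sketch of the naming convention and of which alias each kind of operation goes through. It assumes the default index layout of Hibernate Search (`<name>-000001` for the index, `<name>-write` and `<name>-read` for the aliases); `AliasedIndex` and its `Operation` enum are hypothetical names used for illustration only and are not part of the Hibernate Search API.

```java
/** Hypothetical model of one index and its two aliases; not a Hibernate Search API. */
record AliasedIndex(String name) {

    enum Operation { SEARCH, INDEXING }

    // Names below assume the default layout: <name>-000001, <name>-write, <name>-read.
    String primaryIndex() {
        return name + "-000001";
    }

    String writeAlias() {
        return name + "-write";
    }

    String readAlias() {
        return name + "-read";
    }

    /** Searches go through the read alias, indexing goes through the write alias. */
    String aliasFor(Operation operation) {
        return switch (operation) {
            case SEARCH -> readAlias();
            case INDEXING -> writeAlias();
        };
    }
}
```

Because the application only ever talks to the two aliases, the index behind each alias can be swapped without the application noticing, which is what the rest of this post relies on.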
| 32 | + |
| 33 | +NOTE: This approach is implemented and successfully used in our Quarkus application that backs the guides' |
| 34 | +search of https://quarkus.io/guides/[quarkus.io/guides/]. |
| 35 | +You can see the complete implementation here: |
| 36 | +link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/Rollover.java#L44-L331[rollover implementation] |
| 37 | +and link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/IndexingService.java#L226-L244[rollover usage]. |
| 38 | + |
| 39 | +Applications using Hibernate Search can keep their search indexes up-to-date by updating the index gradually, |
| 40 | +as the data on which the index documents are based is modified, providing a near real-time index synchronisation. |
| 41 | + |
| 42 | +On the other hand, if the search requirements allow for a delay in synchronisation |
| 43 | +or the data is updated only at certain times of day, the option of mass indexing can effectively keep the indexes up-to-date. |
| 44 | +The link:{hibernate-search-docs-url}[Hibernate Search documentation] provides more information about these approaches |
| 45 | +and other Hibernate Search capabilities. |
| 46 | + |
| 47 | +The application discussed in this post is using the mass indexing approach. |
| 48 | +This means that at certain events, e.g., when a new version of the application is deployed or a scheduled time is reached, |
| 49 | +the application has to process the documentation guides and create search index documents from them. |
| 50 | + |
| 51 | +Now, since we want our application to keep providing results to any search requests while we add or update documents in the indexes, |
| 52 | +we cannot perform a simple reindexing operation |
| 53 | +using a link:{hibernate-search-docs-url}#search-batchindex-massindexer[mass indexer], |
| 54 | +or the recently added link:{quarkus-hibernate-search-docs-url}#management[management endpoint in Quarkus], |
| 55 | +as these would drop all existing documents from the index before indexing them: |
| 56 | +search operations would not be able to match them anymore until reindexing finishes. |
| 57 | + |
| 58 | +Instead, we can create a new index with the same schema and route any write operations to it. |
| 59 | + |
| 60 | +image::write-app.png[] |
| 61 | + |
| 62 | +Since Hibernate Search does not provide the rollover feature out of the box (https://hibernate.atlassian.net/browse/HSEARCH-3499[yet]), |
| 63 | +we will need to resort to using the lower-level APIs to access the Elasticsearch client and perform the required operations ourselves. |
| 64 | +To do so, we need to follow a few simple steps: |
| 65 | + |
| 66 | +1. Get the mapping information for the index we want to reindex using the schema manager. |
| 67 | ++ |
| 68 | +[source, java] |
| 69 | +==== |
| 70 | +---- |
| 71 | +@Inject |
| 72 | +SearchMapping searchMapping; // <1> |
| 73 | +// ... |
| 74 | +
|
| 75 | +searchMapping.scope(MyIndexedEntity.class).schemaManager() // <2> |
| 76 | + .exportExpectedSchema((backendName, indexName, export) -> { // <3> |
| 77 | + var createIndexRequestBody = export.extension(ElasticsearchExtension.get()) |
| 78 | + .bodyParts().get(0); // <4> |
| 79 | + var mappings = createIndexRequestBody.getAsJsonObject("mappings"); // <5> |
| 80 | + var settings = createIndexRequestBody.getAsJsonObject("settings"); // <6> |
| 81 | + }); |
| 82 | +---- |
| 83 | +1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager. |
| 84 | +2. Get a schema manager for the indexed entity we are interested in (`MyIndexedEntity`). |
| 85 | +If all entities should be targeted, then `Object.class` can be used to create the scope. |
| 86 | +3. Use the export schema API to access the mapping information. |
| 87 | +4. Use the extension to get access to the Elasticsearch-specific `.bodyParts()` method, which returns |
| 88 | +the JSON body of the HTTP request needed to create the index. |
| 89 | +5. Get the mapping information for the particular index. |
| 90 | +6. Get the settings for the particular index. |
| 91 | +==== |
| 92 | ++ |
| 93 | +2. Get the reference to the Elasticsearch client, so we can perform API calls to the search backend cluster: |
| 94 | ++ |
| 95 | +[source, java] |
| 96 | +==== |
| 97 | +---- |
| 98 | +@Inject |
| 99 | +SearchMapping searchMapping; // <1> |
| 100 | +// ... |
| 101 | +RestClient client = searchMapping.backend() // <2> |
| 102 | + .unwrap(ElasticsearchBackend.class) // <3> |
| 103 | + .client(RestClient.class); // <4> |
| 104 | +---- |
| 105 | +1. Inject `SearchMapping` somewhere in your app so that we can use it to access the backend. |
| 106 | +2. Access the backend from a search mapping instance. |
| 107 | +3. Unwrap the backend to the `ElasticsearchBackend`, so that we can access backend-specific APIs. |
| 108 | +4. Get a reference to the Elasticsearch REST client. |
| 109 | +==== |
| 110 | ++ |
| 111 | +3. Create a new index using the OpenSearch/Elasticsearch rollover API, |
| 112 | +which lets us keep using the existing index for read operations |
| 113 | +while write operations are sent to the new index: |
| 114 | ++ |
| 115 | +[source, java] |
| 116 | +==== |
| 117 | +---- |
| 118 | +@Inject |
| 119 | +SearchMapping searchMapping; // <1> |
| 120 | +// ... |
| 121 | +
|
| 122 | +SearchIndexedEntity<?> entity = searchMapping.indexedEntity(MyIndexedEntity.class); |
| 123 | +var index = entity.indexManager().unwrap(ElasticsearchIndexManager.class).descriptor(); // <2> |
| 124 | +
|
| 125 | +var request = new Request("POST", "/" + index.writeName() + "/_rollover"); // <3> |
| 126 | +var body = new JsonObject(); |
| 127 | +body.add("mappings", mappings); |
| 128 | +body.add("settings", settings); |
| 129 | +body.add("aliases", new JsonObject()); // <4> |
| 130 | +request.setEntity(new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON)); |
| 131 | +
|
| 132 | +var response = client.performRequest(request); // <5> |
| 133 | +//... |
| 134 | +---- |
| 135 | +1. Inject `SearchMapping` somewhere in your app so that we can use it to access the index manager of the indexed entity. |
| 136 | +2. Get the index descriptor to get the aliases from it. |
| 137 | +3. Start building the rollover request, targeting the write index alias from the index descriptor. |
| 138 | +4. Note that we are including an empty "aliases" so that the aliases are not copied over to the new index, |
| 139 | +except for the write alias (which is implicitly updated since the rollover request is targeting it directly). |
| 140 | +We don't want the read alias to start pointing to the new index immediately. |
| 141 | +5. Perform the rollover API request using the Elasticsearch REST client obtained in the previous step. |
| 142 | +==== |
| 143 | + |
| 144 | +With this successfully completed, indexes are in the state we wanted: |
| 145 | + |
| 146 | +image::write-app.png[] |
| 147 | + |
| 148 | +We can start populating our write index without affecting search requests. |
| 149 | + |
| 150 | +Once we are done with indexing, we can either commit or roll back depending on the results: |
| 151 | + |
| 152 | +image::after-indexing.png[] |
| 153 | + |
| 154 | +Committing the index rollover means that we are happy with the results and ready to switch to the new index |
| 155 | +for both reading and writing operations while removing the old one. To do that, we need to send a request to the cluster: |
| 156 | + |
| 157 | +[source, java] |
| 158 | +==== |
| 159 | +---- |
| 160 | +var client = ... // <1> |
| 161 | +
|
| 162 | +var request = new Request("POST", "_aliases"); // <2> |
| 163 | +request.setEntity(new StringEntity(""" |
| 164 | + { |
| 165 | + "actions": [ |
| 166 | + { |
| 167 | + "add": { // <3> |
| 168 | + "index": "%s", |
| 169 | + "alias": "%s", |
| 170 | + "is_write_index": false |
| 171 | + } }, |
| 172 | + { "remove_index": { // <4> |
| 173 | + "index": "%s" |
| 174 | + } |
| 175 | + } |
| 176 | + ] |
| 177 | + } |
| 178 | + """.formatted( newIndexName, readAliasName, oldIndexName ) // <5> |
| 179 | + , ContentType.APPLICATION_JSON)); |
| 180 | +
|
| 181 | +var response = client.performRequest(request); |
| 182 | +//... |
| 183 | +---- |
| 184 | +1. Get access to the Elasticsearch REST client as described above. |
| 185 | +2. Start creating an `_aliases` API request. |
| 186 | +3. Add an action to update the index aliases to use the new index for both read and write operations. |
| 187 | +Here, we must make the read alias point to the new index. |
| 188 | +4. Add an action to remove the old index. |
| 189 | +5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request, |
| 190 | +while the aliases can be retrieved from the index descriptor. |
| 191 | +==== |
| 192 | + |
| 193 | +Otherwise, if we have encountered an error or decided for any other reason to stop the rollover, we can roll back to using |
| 194 | +the initial index: |
| 195 | + |
| 196 | +[source, java] |
| 197 | +==== |
| 198 | +---- |
| 199 | +var client = ... // <1> |
| 200 | +
|
| 201 | +var request = new Request("POST", "_aliases"); // <2> |
| 202 | +request.setEntity(new StringEntity(""" |
| 203 | + { |
| 204 | + "actions": [ |
| 205 | + { |
| 206 | + "add": { // <3> |
| 207 | + "index": "%s", |
| 208 | + "alias": "%s", |
| 209 | + "is_write_index": true |
| 210 | + } }, |
| 211 | + { "remove_index": { // <4> |
| 212 | + "index": "%s" |
| 213 | + } |
| 214 | + } |
| 215 | + ] |
| 216 | + } |
| 217 | + """.formatted( oldIndexName, writeAliasName, newIndexName ) // <5> |
| 218 | + , ContentType.APPLICATION_JSON)); |
| 219 | +
|
| 220 | +var response = client.performRequest(request); |
| 221 | +//... |
| 222 | +---- |
| 223 | +1. Get access to the Elasticsearch REST client as described above. |
| 224 | +2. Start creating an `_aliases` API request. |
| 225 | +3. Add an action to update the index aliases to use the old index for both read and write operations. |
| 226 | +Here, we must make the write alias point back to the old index. |
| 227 | +4. Add an action to remove the new index. |
| 228 | +5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request, |
| 229 | +while the aliases can be retrieved from the index descriptor. |
| 230 | +==== |
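
The commit and rollback requests differ only in which index keeps the alias and which index is removed, so both bodies can come from one small helper. The sketch below is a hypothetical, dependency-free illustration (`AliasSwitchBodies` is not part of the application or of any library). It places each action in its own entry of the `actions` array, which is the shape the `_aliases` API documents, and it assumes that index and alias names contain no characters that would need JSON escaping.

```java
/** Hypothetical helper that builds JSON bodies for a POST _aliases request. */
final class AliasSwitchBodies {

    private AliasSwitchBodies() {
    }

    /** Points the alias at keptIndex and deletes droppedIndex, as two separate actions. */
    static String switchAndDrop(String keptIndex, String alias, boolean writeIndex, String droppedIndex) {
        return """
                {
                  "actions": [
                    { "add": { "index": "%s", "alias": "%s", "is_write_index": %b } },
                    { "remove_index": { "index": "%s" } }
                  ]
                }
                """.formatted(keptIndex, alias, writeIndex, droppedIndex);
    }

    /** Commit: the read alias moves to the new index, the old index is removed. */
    static String commit(String newIndex, String readAlias, String oldIndex) {
        return switchAndDrop(newIndex, readAlias, false, oldIndex);
    }

    /** Rollback: the write alias moves back to the old index, the new index is removed. */
    static String rollback(String oldIndex, String writeAlias, String newIndex) {
        return switchAndDrop(oldIndex, writeAlias, true, newIndex);
    }
}
```

The resulting string can be passed to `new StringEntity(body, ContentType.APPLICATION_JSON)` exactly as in the snippets above.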
| 231 | + |
| 232 | +NOTE: Keep in mind that in case of a rollback, your initial index may be out of sync if any write operations were performed |
| 233 | +while the write alias was pointing to the new index. |
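
To summarize the three transitions, here is a toy in-memory model of the alias table. It is only a sketch for reasoning about the states shown in the diagrams: `AliasTableModel` is a hypothetical class, it performs no HTTP calls, and the index and alias names passed to it are up to the caller.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the cluster's alias table. */
final class AliasTableModel {
    final Set<String> indexes = new HashSet<>();
    final Map<String, String> aliases = new HashMap<>(); // alias name -> index name

    /** Initial state: a single index, with both aliases pointing to it. */
    AliasTableModel(String index, String readAlias, String writeAlias) {
        indexes.add(index);
        aliases.put(readAlias, index);
        aliases.put(writeAlias, index);
    }

    /** Rollover on the write alias: a new index appears and only the write alias moves. */
    void rollover(String writeAlias, String newIndex) {
        indexes.add(newIndex);
        aliases.put(writeAlias, newIndex);
    }

    /** Commit: the read alias follows to the new index and the old index is deleted. */
    void commit(String readAlias, String newIndex, String oldIndex) {
        aliases.put(readAlias, newIndex);
        indexes.remove(oldIndex);
    }

    /** Rollback: the write alias returns to the old index and the new index is deleted. */
    void rollback(String writeAlias, String oldIndex, String newIndex) {
        aliases.put(writeAlias, oldIndex);
        indexes.remove(newIndex);
    }
}
```

Starting from one index, `rollover` leaves the read alias on the old index while the write alias points to the new one. From there, `commit` ends with both aliases on the new index, and `rollback` ends with both aliases on the old index.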
| 234 | + |
| 235 | +With this knowledge, we can organize the rollover process as follows: |
| 236 | +[source, java] |
| 237 | +==== |
| 238 | +---- |
| 239 | +try (Rollover rollover = Rollover.start(searchMapping)) { |
| 240 | + // Perform the indexing operations ... |
| 241 | + rollover.commit(); |
| 242 | +} |
| 243 | +---- |
| 244 | +==== |
| 245 | + |
| 246 | +Where the `Rollover` class will look as follows: |
| 247 | + |
| 248 | +[source, java] |
| 249 | +==== |
| 250 | +---- |
| 251 | +class Rollover implements Closeable { |
| 252 | + public static Rollover start(SearchMapping searchMapping) { |
| 253 | + // initiate the rollover process by sending the _rollover request ... |
| 254 | + // ... |
| 255 | + return new Rollover( client, rolloverResponse ); // <1> |
| 256 | + } |
| 257 | +
|
| 258 | + @Override |
| 259 | + public void close() { |
| 260 | + if ( !done ) { // <2> |
| 261 | + rollback(); |
| 262 | + } |
| 263 | + } |
| 264 | +
|
| 265 | + public void commit() { |
| 266 | + // send the `_aliases` request to switch to the *new* index |
| 267 | + // ... |
| 268 | + done = true; |
| 269 | + } |
| 270 | +
|
| 271 | + public void rollback() { |
| 272 | + // send the `_aliases` request to switch to the *old* index |
| 273 | + // ... |
| 274 | + done = true; |
| 275 | + } |
| 276 | +} |
| 277 | +---- |
| 278 | +1. Keep the reference to the Elasticsearch REST client to perform API calls. |
| 279 | +2. If we haven't successfully committed the rollover, it'll be rolled back on close. |
| 280 | +==== |
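
The commit-or-rollback behaviour of such a class can be tried out without a cluster. The following is a minimal, self-contained sketch under stated assumptions: `AliasSwitcher` is a hypothetical interface standing in for the two `_aliases` requests, `RolloverGuard` mirrors the shape of the `Rollover` class above, and `RecordingSwitcher` only records which switch was requested. None of these names come from the application itself.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical seam standing in for the two `_aliases` requests. */
interface AliasSwitcher {
    void switchToNewIndex();

    void switchToOldIndex();
}

/** Commit-or-rollback guard: close() rolls back unless commit() or rollback() completed first. */
final class RolloverGuard implements Closeable {
    private final AliasSwitcher switcher;
    private boolean done = false;

    RolloverGuard(AliasSwitcher switcher) {
        this.switcher = switcher;
    }

    void commit() {
        switcher.switchToNewIndex();
        done = true; // only reached if the switch did not throw
    }

    void rollback() {
        switcher.switchToOldIndex();
        done = true;
    }

    @Override
    public void close() {
        if (!done) {
            rollback();
        }
    }
}

/** Records which switch was requested so the behaviour can be observed. */
final class RecordingSwitcher implements AliasSwitcher {
    final List<String> calls = new ArrayList<>();

    @Override
    public void switchToNewIndex() {
        calls.add("commit");
    }

    @Override
    public void switchToOldIndex() {
        calls.add("rollback");
    }
}

final class RolloverGuardDemo {
    /** Runs one rollover and returns the switches that were requested. */
    static List<String> run(boolean failDuringIndexing) {
        RecordingSwitcher switcher = new RecordingSwitcher();
        try (RolloverGuard rollover = new RolloverGuard(switcher)) {
            if (failDuringIndexing) {
                throw new IllegalStateException("indexing failed");
            }
            rollover.commit();
        } catch (IllegalStateException e) {
            // close() has already run, and rolled back, before this handler executes
        }
        return switcher.calls;
    }
}
```

Setting `done` only after the switch succeeds means that a failed commit still triggers a rollback attempt on close, which seems the safer default for this sketch.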
| 281 | + |
| 282 | +Once again, for a complete working example of this rollover implementation, check out the |
| 283 | +link:https://github.com/quarkusio/search.quarkus.io[search.quarkus.io on GitHub]. |
| 284 | + |
| 285 | +If you find this feature useful and would like to have it built into your Hibernate Search and Quarkus apps, |
| 286 | +feel free to reach out to us on the https://hibernate.atlassian.net/browse/HSEARCH-3499[pending feature request] |
| 287 | +to discuss your ideas and suggestions. |
| 288 | + |
| 289 | +Stay tuned for more details in the coming weeks as we publish more blog posts |
| 290 | +diving into other interesting implementation aspects of this application. |
| 291 | +Happy searching and rolling over! |