Commit dae1767

add sharding guide
1 parent e1982e5 commit dae1767

2 files changed

+132
-0
lines changed

config/sidebar-learn.json

Lines changed: 5 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -272,6 +272,11 @@
272272
"label": "Using multi-search to perform a federated search",
273273
"slug": "performing_federated_search"
274274
},
275+
{
276+
"source": "learn/multi_search/implement_sharding.mdx",
277+
"label": "Implement sharding with remote federated search",
278+
"slug": "implement_sharding"
279+
},
275280
{
276281
"source": "learn/multi_search/multi_search_vs_federated_search.mdx",
277282
"label": "Differences between multi-search and federated search",
Lines changed: 127 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,127 @@
1+
---
2+
title: Implement sharding with remote federated search — Meilisearch documentation
3+
description: This guide walks you through implementing a sharding strategy by activating the `/network` route, configuring the network object, and performing remote federated searches.
4+
---
5+
6+
# Implement sharding with remote federated search
7+
8+
Sharding is the process of splitting an index containing many documents into multiple smaller indexes, often called shards. This horizontal scaling technique is useful when handling large databases. In Meilisearch, the best way to implement a sharding strategy is to use remote federated search.
9+
10+
This guide walks you through activating the `/network` route, configuring the network object, and performing remote federated searches.
11+
12+
The `/network` route is not currently available on Meilisearch Cloud.
13+
14+
<Capsule intent="tip" title="Configuring multiple instances">
15+
To minimize issues and limit unexpected behavior, instance, network, and index configuration should be identical for all shards. This guide describes the individual steps you must take on a single instance and assumes you will replicate them across all instances.
16+
</Capsule>
17+
18+
## Prerequisites
19+
20+
- Multiple Meilisearch projects (instances) running Meilisearch >=v1.13
21+
22+
## Activate the `/network` endpoint
23+
24+
First, use the `/experimental-features` route to enable `network`:
25+
26+
```sh
27+
curl \
28+
-X PATCH 'MEILISEARCH_URL/experimental-features/' \
29+
-H 'Content-Type: application/json' \
30+
--data-binary '{
31+
"network": true
32+
}'
33+
```
34+
35+
Meilisearch should respond immediately, confirming the route is now accessible.
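
To double-check the response, you can read the feature flags back. The sketch below is one possible check rather than an official procedure: `network_enabled` is a hypothetical helper, `MEILISEARCH_KEY` is a placeholder for a key allowed to read experimental features, and the check assumes `GET /experimental-features` returns a JSON object with a `network` boolean.

```sh
# Hypothetical helper: succeeds when the JSON body on stdin contains "network": true.
network_enabled() {
  # Strip whitespace so the match does not depend on how the JSON is formatted
  tr -d ' \n\t' | grep -q '"network":true'
}

# Usage, assuming MEILISEARCH_URL and MEILISEARCH_KEY are set in your shell:
# curl -s "$MEILISEARCH_URL/experimental-features" \
#   -H "Authorization: Bearer $MEILISEARCH_KEY" | network_enabled && echo 'network is enabled'
```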
36+
37+
## Configuring the network object
38+
39+
The network object consists of the following fields:
40+
41+
- `remotes`: a list with the required information to access each remote instance
42+
- `self`: specifies which of the configured remotes corresponds to the current instance
43+
44+
### Setting up remote instances
45+
46+
Next, configure the `remotes` field of the network object:
47+
48+
```sh
49+
curl \
50+
-X PATCH 'MEILISEARCH_URL/network' \
51+
-H 'Content-Type: application/json' \
52+
--data-binary '{
53+
"remotes": {
54+
"REMOTE_NAME_1": {
55+
"url": "INSTANCE_URL_1",
56+
"searchApiKey": "SEARCH_API_KEY_1"
57+
},
58+
"REMOTE_NAME_2": {
59+
"url": "INSTANCE_URL_2",
60+
"searchApiKey": "SEARCH_API_KEY_2"
61+
},
62+
"REMOTE_NAME_3": {
63+
"url": "INSTANCE_URL_3",
64+
"searchApiKey": "SEARCH_API_KEY_3"
65+
}
66+
67+
}
68+
}'
69+
```
70+
71+
Each entry in `remotes` maps the name of an instance to an object containing that instance's URL and an API key with search permission.
72+
73+
Configure the entire set of remote instances in your sharded database, making sure to send the same remotes to each instance.
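
Because every instance must receive the same `remotes` object, it can help to generate the payload once and reuse it. The following is a minimal sketch under stated assumptions: `build_remotes_payload` is a hypothetical helper, `remotes.txt` and `ADMIN_KEY` are illustrative names, and the helper does no JSON escaping, so it only suits names, URLs, and keys without quotes or backslashes.

```sh
# Hypothetical helper: reads "NAME URL SEARCH_KEY" lines on stdin and
# prints the matching {"remotes": {...}} JSON payload on one line.
build_remotes_payload() {
  printf '{"remotes":{'
  sep=''
  while read -r name url key; do
    [ -n "$name" ] || continue   # skip blank lines
    printf '%s"%s":{"url":"%s","searchApiKey":"%s"}' "$sep" "$name" "$url" "$key"
    sep=','
  done
  printf '}}\n'
}

# Usage, assuming remotes.txt holds one "NAME URL SEARCH_KEY" line per instance
# and ADMIN_KEY may update the network object on every instance:
# payload=$(build_remotes_payload < remotes.txt)
# while read -r name url key; do
#   curl -X PATCH "$url/network" -H 'Content-Type: application/json' \
#     -H "Authorization: Bearer $ADMIN_KEY" --data-binary "$payload"
# done < remotes.txt
```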
74+
75+
### Specify the name of the current instance
76+
77+
Now that all instances share the same list of remotes, set the `self` field to specify which of the remotes corresponds to the current instance:
78+
79+
```sh
80+
curl \
81+
-X PATCH 'MEILISEARCH_URL/network' \
82+
-H 'Content-Type: application/json' \
83+
--data-binary '{
84+
"self": "REMOTE_NAME_1"
85+
}'
86+
```
87+
88+
Meilisearch processes searches on the remote that corresponds to `self` locally instead of making a remote request.
89+
90+
### Adding or removing an instance
91+
92+
Changing the topology of the network involves moving some documents from one instance to another, depending on your hashing scheme.
93+
94+
As Meilisearch does not provide atomicity across multiple instances, you will need to choose one of the following trade-offs:
95+
96+
1. accept search downtime while migrating documents
97+
2. accept some documents will not appear in search results during the migration
98+
3. accept some duplicate documents may appear in search results during the migration
99+
100+
#### Reducing downtime
101+
102+
If your disk space allows, you can reduce the downtime by applying the following algorithm:
103+
104+
1. Create a new temporary index in each remote instance
105+
2. Compute the new instance for each document
106+
3. Send the documents to the temporary index of their new instance
107+
4. Once Meilisearch has copied all documents to their destination instance, swap the new index with the previously used index
108+
5. Delete the temporary index after the swap
109+
6. Update network configuration and search queries across all instances
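
Steps 4 and 5 above correspond to an index swap followed by an index deletion. The sketch below assumes a hypothetical live index named `movies` and a temporary index named `movies_tmp`; `swap_payload` is an illustrative helper, and the commented requests would need to be repeated on every instance.

```sh
# Hypothetical helper: prints the body for a swap between two index names.
swap_payload() {
  printf '[{"indexes":["%s","%s"]}]\n' "$1" "$2"
}

# Usage, assuming MEILISEARCH_URL points at one instance of the network:
# curl -X POST "$MEILISEARCH_URL/swap-indexes" \
#   -H 'Content-Type: application/json' \
#   --data-binary "$(swap_payload movies movies_tmp)"
#
# After the swap task succeeds, movies_tmp holds the old documents and can go:
# curl -X DELETE "$MEILISEARCH_URL/indexes/movies_tmp"
```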
110+
111+
## Create indexes and add documents
112+
113+
Create the same empty indexes with the same settings on all instances. Keeping the settings and indexes in sync is important to avoid errors and unexpected behavior, though not strictly required.
114+
115+
Distribute your documents across all instances. Do not send the same document to multiple instances, as this may lead to duplicate search results. Similarly, you should ensure all future versions of a document are sent to the same instance. Meilisearch recommends you hash each document's primary key using [rendezvous hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing).
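
As an illustration of rendezvous hashing, the sketch below scores every remote against a document's primary key and picks the remote with the highest score. It is one possible implementation, not a method prescribed by Meilisearch: `pick_remote` is a hypothetical helper, the remote names are examples, and it assumes `sha256sum` is available (on macOS, `shasum -a 256` is a likely substitute) and that remote names contain no spaces.

```sh
# Hypothetical helper: pick_remote PRIMARY_KEY REMOTE...
# Prints the remote with the highest SHA-256 score for the given key.
pick_remote() {
  key=$1
  shift
  for remote in "$@"; do
    # Score each (remote, key) pair with a fixed-width hex digest
    score=$(printf '%s:%s' "$remote" "$key" | sha256sum | cut -c1-64)
    printf '%s %s\n' "$score" "$remote"
  done | LC_ALL=C sort | tail -n 1 | cut -d' ' -f2
}

# Usage: route a document whose primary key is "movie-42"
pick_remote 'movie-42' ms-00 ms-01 ms-02
```

With this scheme, removing a remote only reassigns the documents that were stored on it, because the scores of the remaining remotes do not change.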
116+
117+
### Updating index settings
118+
119+
Changing settings in a sharded database is not fundamentally different from changing settings on a single Meilisearch instance. If the update enables a feature, such as setting filterable attributes, wait until all changes have been processed before using the `filter` search parameter in a query. Likewise, if an update disables a feature, first remove it from your search requests, then update your settings.
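
One way to wait for a settings update is to poll its task on each instance until it finishes. This is a minimal sketch, assuming the task object returned by `GET /tasks/{taskUid}` exposes a `status` field with values such as `enqueued`, `processing`, `succeeded`, and `failed`; `task_status` and `TASK_UID` are illustrative names.

```sh
# Hypothetical helper: prints the value of the "status" field
# found in the JSON task object read from stdin.
task_status() {
  tr -d ' \n\t' | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Usage, assuming TASK_UID holds the taskUid returned by the settings update.
# A real script should also stop when the status is failed or canceled:
# until [ "$(curl -s "$MEILISEARCH_URL/tasks/$TASK_UID" | task_status)" = succeeded ]; do
#   sleep 1
# done
```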
120+
121+
## Perform a search
122+
123+
Send your federated search request containing one query per instance:
124+
125+
<CodeSamples id="multi_search_remote_federated_1" />
126+
127+
If all instances share the same network configuration, you can send the search request to any instance. Having `"remote": "ms-00"` appear in the list of queries on the instance of that name will not cause an actual proxy search thanks to `network.self`.
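
Since the request needs one query per instance, the body can be generated from the list of remote names. The sketch below assumes each query selects its instance through `federationOptions.remote`, and uses a hypothetical `movies` index and `ms-00`/`ms-01` remotes; `build_federated_query` is an illustrative helper that does no JSON escaping of the search terms.

```sh
# Hypothetical helper: build_federated_query INDEX_UID QUERY REMOTE...
# Prints a federated multi-search body with one query per remote.
build_federated_query() {
  index=$1
  q=$2
  shift 2
  printf '{"federation":{},"queries":['
  sep=''
  for remote in "$@"; do
    printf '%s{"indexUid":"%s","q":"%s","federationOptions":{"remote":"%s"}}' \
      "$sep" "$index" "$q" "$remote"
    sep=','
  done
  printf ']}\n'
}

# Usage, assuming MEILISEARCH_URL points at any instance of the network:
# curl -X POST "$MEILISEARCH_URL/multi-search" \
#   -H 'Content-Type: application/json' \
#   --data-binary "$(build_federated_query movies batman ms-00 ms-01)"
```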
