Commit 991e87a

v1.14: Experimental feature: Composite embedders (#3210)
1 parent b04f242 commit 991e87a

3 files changed

+68
-19
lines changed

learn/resources/experimental_features_overview.mdx

Lines changed: 1 addition & 0 deletions
@@ -54,3 +54,4 @@ Activating or deactivating experimental features this way does not require you t
5454
| [Edit documents with function](/reference/api/documents#update-documents-with-function) | Use a RHAI function to edit documents directly in the Meilisearch database | API route |
5555
| [`/network` route](/reference/api/network) | Enable `/network` route | API route |
5656
| [Dumpless upgrade](/learn/self_hosted/configure_meilisearch_at_launch#dumpless-upgrade) | Upgrade Meilisearch without generating a dump | API route |
57+
| [Composite embedders](/reference/api/settings#composite-embedders) | Enable composite embedders | API route |

reference/api/settings.mdx

Lines changed: 63 additions & 19 deletions
@@ -2423,20 +2423,22 @@ The embedders object may contain up to 256 embedder objects. Each embedder objec
24232423

24242424
These embedder objects may contain the following fields:
24252425

2426-
| Name | Type | Default Value | Description |
2427-
| :---------------------| :---------------| :-----------------------------------------------------------------------| :-------------------------------------------------------------------------------------------------------------------------------------------------------------|
2428-
| **`source`** | String | Empty | The third-party tool that will generate embeddings from documents. Must be `openAi`, `huggingFace`, `ollama`, `rest`, or `userProvided` |
2429-
| **`url`** | String | `http://localhost:11434/api/embeddings` | The URL Meilisearch contacts when querying the embedder |
2430-
| **`apiKey`** | String | Empty | Authentication token Meilisearch should send with each request to the embedder. If not present, Meilisearch will attempt to read it from environment variables |
2431-
| **`model`** | String | Empty | The model your embedder uses when generating vectors |
2432-
| **`documentTemplate`** | String | `{% for field in fields %} {% if field.is_searchable and not field.value == nil %}{{ field.name }}: {{ field.value }} {% endif %} {% endfor %}` | Template defining the data Meilisearch sends to the embedder |
2433-
| **`documentTemplateMaxBytes`** | Integer | `400` | Maximum allowed size of rendered document template |
2434-
| **`dimensions`** | Integer | Empty | Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value |
2435-
| **`revision`** | String | Empty | Model revision hash |
2436-
| **`distribution`** | Object | Empty | Describes the natural distribution of search results. Must contain two fields, `mean` and `sigma`, each containing a numeric value between `0` and `1` |
2437-
| **`request`** | Object | Empty | A JSON value representing the request Meilisearch makes to the remote embedder |
2438-
| **`response`** | Object | Empty | A JSON value representing the request Meilisearch expects from the remote embedder |
2439-
| **`binaryQuantized`** | Boolean | Empty | Once set to `true`, irreversibly converts all vector dimensions to 1-bit values |
2426+
| Name | Type | Default Value | Description |
2427+
| ------------------------------ | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
2428+
| **`source`** | String | Empty | The third-party tool that will generate embeddings from documents. Must be `openAi`, `huggingFace`, `ollama`, `rest`, or `userProvided` |
2429+
| **`url`** | String | `http://localhost:11434/api/embeddings` | The URL Meilisearch contacts when querying the embedder |
2430+
| **`apiKey`** | String | Empty | Authentication token Meilisearch should send with each request to the embedder. If not present, Meilisearch will attempt to read it from environment variables |
2431+
| **`model`** | String | Empty | The model your embedder uses when generating vectors |
2432+
| **`documentTemplate`** | String | `{% for field in fields %} {% if field.is_searchable and not field.value == nil %}{{ field.name }}: {{ field.value }} {% endif %} {% endfor %}` | Template defining the data Meilisearch sends to the embedder |
2433+
| **`documentTemplateMaxBytes`** | Integer | `400` | Maximum allowed size of rendered document template |
2434+
| **`dimensions`** | Integer | Empty | Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value |
2435+
| **`revision`** | String | Empty | Model revision hash |
2436+
| **`distribution`** | Object | Empty | Describes the natural distribution of search results. Must contain two fields, `mean` and `sigma`, each containing a numeric value between `0` and `1` |
2437+
| **`request`** | Object | Empty | A JSON value representing the request Meilisearch makes to the remote embedder |
2438+
| **`response`** | Object | Empty | A JSON value representing the response Meilisearch expects from the remote embedder |
2439+
| **`binaryQuantized`** | Boolean | Empty | Once set to `true`, irreversibly converts all vector dimensions to 1-bit values |
2440+
| **`indexingEmbedder`** | Object | Empty | Configures embedder to vectorize documents during indexing |
2441+
| **`searchEmbedder`** | Object | Empty | Configures embedder to vectorize search queries |
24402442

24412443
### Get embedder settings
24422444

@@ -2500,7 +2502,9 @@ Partially update the embedder settings for an index. When this setting is update
25002502
"request": { },
25012503
"response": { },
25022504
"headers": { },
2503-
"binaryQuantized": <Boolean>
2505+
"binaryQuantized": <Boolean>,
2506+
"indexingEmbedder": { },
2507+
"searchEmbedder": { }
25042508
}
25052509
}
25062510
```
@@ -2509,18 +2513,39 @@ Set an embedder to `null` to remove it from the embedders list.
25092513

25102514
##### `source`
25112515

2512-
Use `source` to configure an embedder's source. The following embedders can auto-generate vectors for documents and queries:
2516+
Use `source` to configure an embedder's source. The source corresponds to a service that generates embeddings from your documents.
25132517

2518+
Meilisearch supports the following sources:
25142519
- `openAi`
25152520
- `huggingFace`
25162521
- `ollama`
2522+
- `rest`
2523+
- `userProvided`
2524+
- `composite` <NoticeTag type="experimental" label="experimental" />
25172525

2518-
Additionally, use `rest` to auto-generate embeddings with any embedder offering a REST API.
2526+
`rest` is a generic source compatible with any embeddings provider offering a REST API.
25192527

2520-
You may also configure a `userProvided` embedder. In this case, you must manually include vector data in your documents' `_vectors` field. You must also manually generate vectors for search queries.
2528+
Use `userProvided` when you want to generate embeddings manually. In this case, you must include vector data in your documents' `_vectors` field. You must also generate vectors for search queries.
25212529

25222530
This field is mandatory.
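
As a rough sketch of the `userProvided` case, the requests below are wrapped in shell functions so nothing is sent until you call them. The index name `movies`, the embedder name `manual`, the three-dimensional vectors, and the default value of `MEILISEARCH_URL` are placeholders chosen for illustration and do not come from this commit. The sketch also assumes `dimensions` must be declared for `userProvided` embedders and that a vector search names its embedder in `hybrid.embedder`.

```sh
# Placeholder instance URL; override it in your environment.
MEILISEARCH_URL="${MEILISEARCH_URL:-http://localhost:7700}"

# Embedder settings: Meilisearch generates no vectors here, so the size is declared.
embedder_settings='{
  "manual": {
    "source": "userProvided",
    "dimensions": 3
  }
}'

# A hypothetical document carrying its own vector under _vectors.<embedder name>.
documents='[
  {
    "id": 1,
    "title": "Example document",
    "_vectors": { "manual": [0.1, 0.2, 0.3] }
  }
]'

# A search query whose vector was generated outside Meilisearch.
search_query='{
  "vector": [0.1, 0.2, 0.3],
  "hybrid": { "embedder": "manual" }
}'

configure_embedder() {
  curl \
    -X PATCH "$MEILISEARCH_URL/indexes/movies/settings/embedders" \
    -H 'Content-Type: application/json' \
    --data-binary "$embedder_settings"
}

add_documents() {
  curl \
    -X POST "$MEILISEARCH_URL/indexes/movies/documents" \
    -H 'Content-Type: application/json' \
    --data-binary "$documents"
}

search_with_vector() {
  curl \
    -X POST "$MEILISEARCH_URL/indexes/movies/search" \
    -H 'Content-Type: application/json' \
    --data-binary "$search_query"
}
```

Call `configure_embedder`, then `add_documents`, then `search_with_vector`. The vectors in documents and queries must come from the same model and have the declared number of dimensions.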
25232531

2532+
###### Composite embedders <NoticeTag type="experimental" label="experimental" />
2533+
2534+
Choose `composite` to use one embedder at indexing time and another at search time. `composite` must be used together with [`indexingEmbedder` and `searchEmbedder`](#indexingembedder-and-searchembedder).
2535+
2536+
<Capsule intent="note" title="Activating composite embedders">
2537+
This is an experimental feature. Use the experimental features endpoint to activate it:
2538+
2539+
```sh
2540+
curl \
2541+
-X PATCH 'MEILISEARCH_URL/experimental-features/' \
2542+
-H 'Content-Type: application/json' \
2543+
--data-binary '{
2544+
"compositeEmbedders": true
2545+
}'
2546+
```
2547+
</Capsule>
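
To confirm the feature is active, one option is to read the experimental features back and look for the flag. The sketch below assumes the same `/experimental-features/` route also answers `GET` requests with a JSON object of feature flags. The helper names and the default value of `MEILISEARCH_URL` are illustrative only.

```sh
# Placeholder instance URL; override it in your environment.
MEILISEARCH_URL="${MEILISEARCH_URL:-http://localhost:7700}"

# Fetch the current state of all experimental features.
get_experimental_features() {
  curl -s -X GET "$MEILISEARCH_URL/experimental-features/"
}

# Read a JSON body on stdin; succeed only if compositeEmbedders is true.
has_composite_embedders() {
  grep -Eq '"compositeEmbedders"[[:space:]]*:[[:space:]]*true'
}
```

For example, `get_experimental_features | has_composite_embedders && echo "composite embedders active"` prints the message only when the flag is set.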
2548+
25242549
##### `url`
25252550

25262551
Meilisearch queries `url` to generate vector embeddings for queries and documents. `url` must point to a REST-compatible embedder. You may also use `url` to work with proxies, such as when targeting `openAi` from behind a proxy.
@@ -2586,7 +2611,6 @@ This field is incompatible with `userProvided` embedders.
25862611

25872612
This field is optional for all other embedders.
25882613

2589-
25902614
##### `dimensions`
25912615

25922616
Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value.
@@ -2738,6 +2762,26 @@ This option can be useful when working with large Meilisearch projects. Consider
27382762
**Activating `binaryQuantized` is irreversible.** Once enabled, Meilisearch converts all vectors and discards all vector data that does not fit within 1 bit. The only way to recover the vectors' original values is to re-vectorize the whole index in a new embedder.
27392763
</Capsule>
27402764

2765+
##### `indexingEmbedder` and `searchEmbedder` <NoticeTag type="experimental" label="experimental" />
2766+
2767+
When using a [composite embedder](#composite-embedders), configure separate embedders Meilisearch should use when vectorizing documents and search queries.
2768+
2769+
`indexingEmbedder` often benefits from the higher bandwidth and speed of remote providers, which lets it vectorize large batches of documents quickly. `searchEmbedder` often benefits from the lower latency of processing queries locally.
2770+
2771+
Both fields must be an object and accept the same fields as a regular embedder, with the following exceptions:
2772+
2773+
- `indexingEmbedder` and `searchEmbedder` must use the same model for generating embeddings
2774+
- `indexingEmbedder` and `searchEmbedder` must have identical `dimensions` and `pooling` methods
2775+
- `source` is mandatory for both `indexingEmbedder` and `searchEmbedder`
2776+
- Neither sub-embedder can set `source` to `composite` or `userProvided`
2777+
- Neither `binaryQuantized` nor `distribution` is a valid sub-embedder field. Both must always be declared in the main embedder
2778+
- `documentTemplate` and `documentTemplateMaxBytes` are invalid fields for `searchEmbedder`
2779+
- `documentTemplate` and `documentTemplateMaxBytes` are mandatory for `indexingEmbedder`, if applicable to its source
2780+
2781+
`indexingEmbedder` and `searchEmbedder` are mandatory when using the `composite` source.
2782+
2783+
`indexingEmbedder` and `searchEmbedder` are incompatible with all other embedder sources.
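
The rules above can be sketched as a single settings update. This is an illustration under stated assumptions, not a configuration taken from this commit. The index name `movies`, the embedder name `movies-composite`, the host `REMOTE_OLLAMA_HOST`, the model `nomic-embed-text`, and the document fields `title` and `overview` are all hypothetical. Both sub-embedders use the `ollama` source with the same model, only `indexingEmbedder` declares a document template, and the request is wrapped in a function so nothing is sent until you call it.

```sh
# Placeholder instance URL; override it in your environment.
MEILISEARCH_URL="${MEILISEARCH_URL:-http://localhost:7700}"

# Composite embedder: remote Ollama for indexing, local Ollama for search.
# Host, model, and template fields are illustrative assumptions.
composite_settings='{
  "movies-composite": {
    "source": "composite",
    "indexingEmbedder": {
      "source": "ollama",
      "url": "https://REMOTE_OLLAMA_HOST/api/embeddings",
      "model": "nomic-embed-text",
      "documentTemplate": "{{doc.title}}: {{doc.overview}}",
      "documentTemplateMaxBytes": 400
    },
    "searchEmbedder": {
      "source": "ollama",
      "url": "http://localhost:11434/api/embeddings",
      "model": "nomic-embed-text"
    }
  }
}'

configure_composite_embedder() {
  curl \
    -X PATCH "$MEILISEARCH_URL/indexes/movies/settings/embedders" \
    -H 'Content-Type: application/json' \
    --data-binary "$composite_settings"
}
```

If you need `binaryQuantized` or `distribution`, add them next to `source` in the main embedder object rather than inside either sub-embedder.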
2784+
27412785
#### Example
27422786

27432787
<CodeSamples id="update_embedders_1" />

reference/errors/error_codes.mdx

Lines changed: 4 additions & 0 deletions
@@ -340,6 +340,10 @@ The [`limit`](/reference/api/search#limit) parameter is invalid. It should be an
340340

341341
The [`locales`](/reference/api/search#query-locales) parameter is invalid.
342342

343+
## `invalid_settings_embedder`
344+
345+
The [`embedders`](/reference/api/settings#embedders) index setting value is invalid.
346+
343347
## `invalid_settings_facet_search`
344348

345349
The [`facetSearch`](/reference/api/settings#facet-search) index setting value is invalid.
