
Commit 4e903b2

Update AI-powered search tutorial (#3056)
1 parent a3a155b commit 4e903b2


6 files changed, +138 -47 lines changed

guides/embedders/cloudflare.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -55,7 +55,7 @@ In this configuration:
 - `source`: Specifies the source of the embedder, which is set to "rest" for using a REST API.
 - `apiKey`: Replace `<API Key>` with your actual Cloudflare API key.
 - `dimensions`: Specifies the dimensions of the embeddings. Set to 384 for `baai/bge-small-en-v1.5`, 768 for `baai/bge-base-en-v1.5`, or 1024 for `baai/bge-large-en-v1.5`.
-- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=cloudflare-embeddings-guide#documenttemplate) for generating embeddings from your documents.
+- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
 - `url`: Specifies the URL of the Cloudflare Worker AI API endpoint.
 - `request`: Defines the request structure for the Cloudflare Worker AI API, including the input parameters.
 - `response`: Defines the expected response structure from the Cloudflare Worker AI API, including the embedding data.
```
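The Cloudflare guide ties `dimensions` to the chosen model. As a minimal sketch in Python, the mapping listed in the guide can be encoded once so the model and the `dimensions` value cannot drift apart. The helper name `cloudflare_embedder_fields` is hypothetical (it is not part of Meilisearch), and it deliberately leaves out `documentTemplate`, `url`, `request`, and `response`, which you would copy from the guide's full configuration.

```python
# Embedding dimensions per Cloudflare Workers AI model, as listed in the guide.
CLOUDFLARE_DIMENSIONS = {
    "baai/bge-small-en-v1.5": 384,
    "baai/bge-base-en-v1.5": 768,
    "baai/bge-large-en-v1.5": 1024,
}


def cloudflare_embedder_fields(model: str, api_key: str) -> dict:
    """Hypothetical helper: return the shared fields of a `rest` embedder.

    Raises ValueError for models the guide does not list. The `url`,
    `request`, and `response` fields are intentionally omitted here.
    """
    if model not in CLOUDFLARE_DIMENSIONS:
        raise ValueError(f"unknown Cloudflare model: {model}")
    return {
        "source": "rest",
        "apiKey": api_key,
        "dimensions": CLOUDFLARE_DIMENSIONS[model],
    }


print(cloudflare_embedder_fields("baai/bge-base-en-v1.5", "<API Key>"))
# {'source': 'rest', 'apiKey': '<API Key>', 'dimensions': 768}
```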

guides/embedders/cohere.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -58,7 +58,7 @@ In this configuration:
 - `source`: Specifies the source of the embedder, which is set to "rest" for using a REST API.
 - `apiKey`: Replace `<Cohere API Key>` with your actual Cohere API key.
 - `dimensions`: Specifies the dimensions of the embeddings, set to 1024 for the `embed-english-v3.0` model.
-- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=cohere-embeddings-guide#documenttemplate) for generating embeddings from your documents.
+- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
 - `url`: Specifies the URL of the Cohere API endpoint.
 - `request`: Defines the request structure for the Cohere API, including the model name and input parameters.
 - `response`: Defines the expected response structure from the Cohere API, including the embedding data.
```
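With `embed-english-v3.0`, the Cohere guide sets `dimensions` to 1024. A mismatch between that value and the length of the vectors the model actually returns is an easy mistake, so you might compare the two on a sample embedding before saving the setting. The function below is a hypothetical illustration, not part of the Meilisearch or Cohere APIs, and the sample vector is a stand-in for a real API response.

```python
from typing import Sequence

# Value the guide uses for Cohere's `embed-english-v3.0` model.
COHERE_EMBED_ENGLISH_V3_DIMENSIONS = 1024


def matches_configured_dimensions(embedding: Sequence[float], dimensions: int) -> bool:
    """Return True if a sample embedding has the length set in `dimensions`."""
    return len(embedding) == dimensions


sample = [0.0] * 1024  # stand-in for a vector returned by the Cohere API
print(matches_configured_dimensions(sample, COHERE_EMBED_ENGLISH_V3_DIMENSIONS))  # True
```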

guides/embedders/mistral.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -54,7 +54,7 @@ In this configuration:
 - `source`: Specifies the source of the embedder, which is set to "rest" for using a REST API.
 - `apiKey`: Replace `<Mistral API Key>` with your actual Mistral API key.
 - `dimensions`: Specifies the dimensions of the embeddings, set to 1024 for the `mistral-embed` model.
-- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=mistral-embeddings-guide#documenttemplate) for generating embeddings from your documents.
+- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
 - `url`: Specifies the URL of the Mistral API endpoint.
 - `request`: Defines the request structure for the Mistral API, including the model name and input parameters.
 - `response`: Defines the expected response structure from the Mistral API, including the embedding data.
```

guides/embedders/openai.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -46,7 +46,7 @@ In this configuration:
 - `source`: Specifies the source of the embedder, which is set to "openAi" for using OpenAI's API.
 - `apiKey`: Replace `<OpenAI API Key>` with your actual OpenAI API key.
 - `dimensions`: Specifies the dimensions of the embeddings. Set to 1536 for `text-embedding-3-small` and `text-embedding-ada-002`, or 3072 for `text-embedding-3-large`.
-- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=openai-embeddings-guide#documenttemplate) for generating embeddings from your documents.
+- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
 - `model`: Specifies the OpenAI model to use for generating embeddings. Choose from `text-embedding-3-large`, `text-embedding-3-small`, or `text-embedding-ada-002`.
 
 Once you've configured the embedder settings, Meilisearch will automatically generate embeddings for your documents and store them in the vector store.
```
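The OpenAI guide lists three models and their dimensions. A minimal sketch, assuming you assemble settings in Python before sending them: the hypothetical `openai_embedder` helper builds one embedder entry from the fields described in the guide, rejects models the guide does not list, and only adds `documentTemplate` when you pass one, since that field is optional.

```python
from typing import Optional

# Embedding dimensions per OpenAI model, as listed in the guide.
OPENAI_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
    "text-embedding-3-large": 3072,
}


def openai_embedder(model: str, api_key: str,
                    document_template: Optional[str] = None) -> dict:
    """Hypothetical helper: build one `openAi` embedder entry."""
    if model not in OPENAI_DIMENSIONS:
        raise ValueError(f"unsupported OpenAI model: {model}")
    embedder = {
        "source": "openAi",
        "apiKey": api_key,
        "model": model,
        "dimensions": OPENAI_DIMENSIONS[model],
    }
    if document_template is not None:  # documentTemplate is optional
        embedder["documentTemplate"] = document_template
    return embedder


print(openai_embedder("text-embedding-3-large", "<OpenAI API Key>")["dimensions"])  # 3072
```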

guides/embedders/voyage.mdx

Lines changed: 2 additions & 2 deletions
```diff
@@ -59,7 +59,7 @@ In this configuration:
 - `source`: Specifies the source of the embedder, which is set to "rest" for using a REST API.
 - `apiKey`: Replace `<Voyage AI API Key>` with your actual Voyage AI API key.
 - `dimensions`: Specifies the dimensions of the embeddings. Set to 1024 for `voyage-2`, `voyage-large-2-instruct`, and `voyage-multilingual-2`, or 1536 for `voyage-large-2`.
-- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=voyage-embeddings-guide#documenttemplate) for generating embeddings from your documents.
+- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
 - `url`: Specifies the URL of the Voyage AI API endpoint.
 - `request`: Defines the request structure for the Voyage AI API, including the model name and input parameters.
 - `response`: Defines the expected response structure from the Voyage AI API, including the embedding data.
@@ -68,7 +68,7 @@ Once you've configured the embedder settings, Meilisearch will automatically gen
 
 Please note that most third-party tools have rate limiting, which is managed by Meilisearch. If you have a free account, the indexation process may take some time, but Meilisearch will handle it with a retry strategy.
 
-It's recommended to monitor the tasks queue to ensure everything is running smoothly. You can access the tasks queue using the Cloud UI or the [Meilisearch API](/reference/api/tasks?utm_campaign=vector-search&utm_source=docs&utm_medium=voyage-embeddings-guide#get-tasks).
+It's recommended to monitor the tasks queue to ensure everything is running smoothly. You can access the tasks queue using the Cloud UI or the [Meilisearch API](/reference/api/tasks).
 
 ## Testing semantic search
 
```
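The Voyage guide recommends monitoring the tasks queue while rate-limited indexing runs. If you prefer a script to the Cloud UI, one option is to poll a single task until it reaches a final status. This is a minimal sketch: `wait_for_task` and `fake_fetch` are hypothetical names, and it assumes the caller supplies `fetch_task`, for example a function that sends `GET MEILISEARCH_URL/tasks/{task_uid}` and returns the decoded JSON, whose `status` ends as `succeeded`, `failed`, or `canceled`.

```python
import time
from typing import Callable, Dict

# Task statuses after which the task will not change any further.
FINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_task(
    fetch_task: Callable[[int], Dict],
    task_uid: int,
    interval: float = 1.0,
    timeout: float = 600.0,
) -> Dict:
    """Poll fetch_task(task_uid) until the task reaches a final status.

    Raises TimeoutError if the task is still pending after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_uid)
        if task.get("status") in FINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_uid} is still {task.get('status')!r}")
        time.sleep(interval)


# Usage with a stand-in for the HTTP call, so the sketch runs offline.
_fake_statuses = iter(["enqueued", "processing", "succeeded"])


def fake_fetch(uid: int) -> Dict:
    return {"uid": uid, "status": next(_fake_statuses)}


print(wait_for_task(fake_fetch, 42, interval=0)["status"])  # succeeded
```

Injecting `fetch_task` keeps the polling logic independent of any HTTP client and makes it easy to test.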
learn/ai_powered_search/getting_started_with_ai_search.mdx

Lines changed: 132 additions & 41 deletions
```diff
@@ -5,9 +5,9 @@ description: AI-powered search is an experimental technology that uses LLMs to r
 
 # Getting started with AI-powered search <NoticeTag type="experimental" label="experimental" />
 
-[AI-powered search](https://meilisearch.com/solutions/vector-search?utm_campaign=vector-search&utm_source=docs&utm_content=getting-started-with-ai-search), sometimes also called vector search and hybrid search, is an experimental technology that uses [large language models](https://en.wikipedia.org/wiki/Large_language_model) to retrieve search results based on the meaning and context of a query.
+[AI-powered search](https://meilisearch.com/solutions/vector-search), sometimes also called vector search or hybrid search, is an experimental technology that uses [large language models (LLMs)](https://en.wikipedia.org/wiki/Large_language_model) to retrieve search results based on the meaning and context of a query.
 
-This tutorial will walk you through configuring AI-powered search in your Meilisearch project. You will activate the vector store setting, generate document embeddings with OpenAI, and perform your first search.
+This tutorial will walk you through configuring AI-powered search in your Meilisearch project. You will see how to activate this feature, generate document embeddings with OpenAI, and perform your first search.
 
 ## Requirements
 
```
````diff
@@ -17,90 +17,181 @@ This tutorial will walk you through configuring AI-powered search in your Meilis
 
 ## Create a new index
 
-Create a `kitchenware` index and add [this kitchenware products dataset](/assets/datasets/kitchenware.json) to it. If necessary, consult the quick start for instructions on how to configure a basic Meilisearch installation.
+First, create a new Meilisearch project. If this is your first time using Meilisearch, follow the [quick start](/learn/getting_started/cloud_quick_start), then come back to this tutorial.
+
+Next, create a `kitchenware` index and add [this kitchenware products dataset](https://raw.githubusercontent.com/meilisearch/documentation/main/assets/datasets/kitchenware.json) to it. It will take Meilisearch a few moments to process your request, but you can continue to the next step while your data is indexing.
 
 ## Activate AI-powered search
 
-First, activate the AI-powered search experimental feature. Exactly how to do that depends on whether you are using [Meilisearch Cloud](#meilisearch-cloud-projects) or [self-hosting Meilisearch](#self-hosted-instances).
+AI-powered search is an experimental feature and is disabled by default. You must manually activate it either via the Meilisearch Cloud UI or with the experimental features endpoint.
 
-### Meilisearch Cloud projects
+<Capsule intent="tip" title="Meilisearch Cloud AI-powered search waitlist">
+To use AI-powered search with Meilisearch Cloud, you must first enter the waitlist. You will not be able to activate vector search until your sign-up has been approved.
+</Capsule>
 
-If using Meilisearch Cloud, navigate to your project overview and find "Experimental features". Then check the "AI-powered search" box.
+### Meilisearch Cloud UI
 
-![A section of the project overview interface titled "Experimental features". The image shows a few options, including "Vector store".](https://raw.githubusercontent.com/meilisearch/documentation/main/assets/images/vector-search/01-cloud-vector-store.png)
+Navigate to your project overview and find "Experimental features". Then click on the "AI-powered search" box.
 
-<Capsule intent="note" title="Meilisearch Cloud AI-powered search waitlist">
-To ensure proper scaling of Meilisearch Cloud's latest AI-powered search offering, you must enter the waitlist before activating vector search. You will not be able to activate vector search in the Cloud interface or via the `/experimental-features` route until your sign-up has been approved.
-</Capsule>
+![A section of the project overview interface titled "Experimental features". The image shows a few options, including "Vector store".](https://raw.githubusercontent.com/meilisearch/documentation/main/assets/images/vector-search/01-cloud-vector-store.png)
 
-### Self-hosted instances
+### Experimental features endpoint
 
-Use [the `/experimental-features` route](/reference/api/experimental_features?utm_campaign=vector-search&utm_source=docs&utm_medium=vector-search-guide) to activate vector search during runtime:
+Use [the `/experimental-features` route](/reference/api/experimental_features) to activate vector search during runtime:
 
 ```sh
 curl \
-  -X PATCH 'http://localhost:7700/experimental-features/' \
+  -X PATCH 'MEILISEARCH_URL/experimental-features/' \
   -H 'Content-Type: application/json' \
   --data-binary '{
     "vectorStore": true
   }'
 ```
 
-## Generate vector embeddings with OpenAI
+Replace `MEILISEARCH_URL` with your project's URL. In most cases, this should look like `https://ms-000xx00x000-xx.xxx.meilisearch.io` if you're using Meilisearch Cloud, or `http://localhost:7700` if you are running Meilisearch on your local machine.
+
+## Generate embeddings with OpenAI
+
+In this step, you will configure an OpenAI embedder. Meilisearch uses **embedders** to translate documents into **embeddings**, which are mathematical representations of a document's meaning and context.
+
+Open a blank file in your text editor. You will only use this file to build your embedder one step at a time, so there's no need to save it if you plan to finish the tutorial in one sitting.
+
+### Choose an embedder name
+
+In your blank file, create your `embedder` object:
+
+```json
+{
+  "products-openai": {}
+}
+```
+
+`products-openai` is the name of your embedder for this tutorial. You can name embedders any way you want, but try to keep it simple, short, and easy to remember.
+
+### Choose an embedder source
+
+Meilisearch relies on third-party services to generate embeddings. These services are often referred to as the embedder source.
+
+Add a new `source` field to your embedder object:
+
+```json
+{
+  "products-openai": {
+    "source": "openAi"
+  }
+}
+```
+
+Meilisearch supports several embedder sources. This tutorial uses OpenAI because it is a good option that fits most use cases.
+
+### Choose an embedder model
+
+Models supply the information required for embedders to process your documents.
+
+Add a new `model` field to your embedder object:
+
+```json
+{
+  "products-openai": {
+    "source": "openAi",
+    "model": "text-embedding-3-small"
+  }
+}
+```
 
-Next, you must generate vector embeddings for all documents in your dataset. Embeddings are mathematical representations of the meanings of words and sentences in your documents. Meilisearch relies on external providers to generate these embeddings. This tutorial uses an OpenAI embedder, but Meilisearch also supports embedders from HuggingFace, Ollama, and any embedder accessible via a RESTful API.
+Each embedder service supports different models targeting specific use cases. `text-embedding-3-small` is a cost-effective model for general usage.
 
-Use the `embedders` index setting of the [update `/settings` endpoint](/reference/api/settings?utm_campaign=vector-search&utm_source=docs&utm_medium=vector-search-guide) to configure an [OpenAI](https://platform.openai.com/) embedder:
+### Create your API key
+
+Log into OpenAI, or create an account if this is your first time using it. Generate a new API key using [OpenAI's web interface](https://platform.openai.com/api-keys).
+
+Add the `apiKey` field to your embedder:
+
+```json
+{
+  "products-openai": {
+    "source": "openAi",
+    "model": "text-embedding-3-small",
+    "apiKey": "OPEN_AI_API_KEY"
+  }
+}
+```
+
+Replace `OPEN_AI_API_KEY` with your own API key.
+
+<Capsule intent="tip" title="OpenAI key tiers">
+You may use any key tier for this tutorial. Use at least [Tier 2 keys](https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-two) in production environments.
+</Capsule>
+
+### Design a prompt template
+
+Meilisearch embedders only accept textual input, but documents can be complex objects containing different types of data. This means you must convert your documents into a single text field. Meilisearch uses [Liquid](https://shopify.github.io/liquid/basics/introduction/), an open-source templating language, to help you do that.
+
+A good template should be short and only include the most important information about a document. Add the following `documentTemplate` to your embedder:
+
+```json
+{
+  "products-openai": {
+    "source": "openAi",
+    "model": "text-embedding-3-small",
+    "apiKey": "OPEN_AI_API_KEY",
+    "documentTemplate": "An object used in a kitchen named '{{doc.name}}'"
+  }
+}
+```
+
+This template starts by giving the general context of the document: `An object used in a kitchen`. Then it adds the information that is specific to each document: `doc` represents your document, and you can access any of its attributes using dot notation. `name` is an attribute with values such as `wooden spoon` or `rolling pin`. Since it is present in all documents in this dataset and describes the product in a few words, it is a good choice to include in the template.
+
+### Create the embedder
+
+Your embedder object is ready. Send it to Meilisearch by updating your index settings:
 
 ```sh
 curl \
-  -X PATCH 'http://localhost:7700/indexes/kitchenware/settings' \
+  -X PATCH 'MEILISEARCH_URL/indexes/kitchenware/settings/embedders' \
   -H 'Content-Type: application/json' \
   --data-binary '{
-    "embedders": {
-      "openai": {
-        "source": "openAi",
-        "apiKey": "OPEN_AI_API_KEY",
-        "model": "text-embedding-3-small",
-        "documentTemplate": "An object used in a kitchen named '{{doc.name}}'"
-      }
+    "products-openai": {
+      "source": "openAi",
+      "apiKey": "OPEN_AI_API_KEY",
+      "model": "text-embedding-3-small",
+      "documentTemplate": "An object used in a kitchen named '{{doc.name}}'"
     }
   }'
 ```
 
-Replace `OPEN_AI_API_KEY` with your [OpenAI API key](https://platform.openai.com/api-keys). You may use any key tier for this tutorial, but prefer [Tier 2 keys](https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-two) for optimal performance in production environments.
-
-### `documentTemplate`
-
-`documentTemplate` describes a short [Liquid template](https://shopify.github.io/liquid/). The text inside curly brackets (`{{`) indicates a document field in dot notation, where `doc` indicates the document itself and the string that comes after the dot indicates a document attribute. Meilisearch replaces these brackets and their contents with the corresponding field value.
-
-The resulting text is the prompt OpenAI uses to generate document embeddings.
+Replace `MEILISEARCH_URL` with the address of your Meilisearch project, and `OPEN_AI_API_KEY` with your [OpenAI API key](https://platform.openai.com/api-keys).
 
-For example, kitchenware documents have three fields: `id`, `name`, and `price`. If your `documentTemplate` is `"An object used in a kitchen named '{{doc.name}}'"`, the text Meilisearch will send to the embedder when indexing the first document is `"An object used in a kitchen named 'Wooden spoon'"`.
-
-For the best results, always provide a `documentTemplate`. Keep your templates short and only include highly relevant information. This ensures optimal indexing performance and search result relevancy.
+Meilisearch and OpenAI will start processing your documents and updating your index. This may take a few moments, but once it's done you are ready to perform an AI-powered search.
 
 ## Perform an AI-powered search
 
-Perform AI-powered searches with `q` and `hybrid` to retrieve search results using the default embedder you configured in the previous step:
+AI-powered searches are very similar to basic text searches. You must query the `/search` endpoint with a request containing both the `q` and the `hybrid` parameters:
 
 ```sh
 curl \
-  -X POST 'http://localhost:7700/indexes/kitchenware/search' \
+  -X POST 'MEILISEARCH_URL/indexes/kitchenware/search' \
   -H 'content-type: application/json' \
   --data-binary '{
     "q": "kitchen utensils made of wood",
     "hybrid": {
-      "embedder": "openai",
-      "semanticRatio": 0.7
+      "embedder": "products-openai"
     }
   }'
 ```
 
-Meilisearch will return a mix of semantic and full-text matches, prioritizing results that match the query's meaning and context. If you want Meilisearch to return more results based on the meaning and context of a search, set `semanticRatio` to a value greater than `0.5`. Setting `semanticRatio` to a value lower than `0.5`, instead, will return more full-text matches.
+For this tutorial, `hybrid` is an object with a single `embedder` field.
+
+Meilisearch will then return an equal mix of semantic and full-text matches.
 
 ## Conclusion
 
-You have seen how to set up and perform AI-powered searches with Meilisearch and OpenAI. For more in-depth information, consult the reference for embedders and the `hybrid` search parameter.
+Congratulations! You have created an index, added a small dataset to it, and activated AI-powered search. You then used OpenAI to generate embeddings out of your documents, and performed your first AI-powered search.
+
+## Next steps
+
+Now that you have an overview of the basic steps required to set up and perform AI-powered searches, you might want to try implementing this feature in your own application.
+
+For practical information on implementing AI-powered search with other services, consult our [guides section](/guides/ai/openai). There you will find specific instructions for embedders such as [LangChain](/guides/ai/langchain) and [Cloudflare](/guides/ai/cloudflare).
 
-AI-powered search is an experimental Meilisearch feature and is undergoing active development—[join the discussion on GitHub](https://github.com/orgs/meilisearch/discussions/677).
+For more in-depth information, consult the API reference for [embedder settings](/reference/api/settings#embedders-experimental) and [the `hybrid` search parameter](/reference/api/search#hybrid-search-experimental).
````
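The tutorial's `documentTemplate` relies on a single Liquid substitution: `{{doc.name}}` is replaced by the document's `name` value. To preview the text an embedder would receive, the sketch below mimics only that substitution. It is an illustration, not a Liquid implementation: filters, tags, and nested fields are not handled, missing fields render as an empty string in this sketch, and `render_template` is a hypothetical name. The sample document uses the dataset's `id`, `name`, and `price` fields; the `id` and `price` values are made up.

```python
import re

# Matches {{doc.field}} with optional inner whitespace, e.g. {{ doc.name }}.
DOC_FIELD = re.compile(r"\{\{\s*doc\.(\w+)\s*\}\}")


def render_template(template: str, doc: dict) -> str:
    """Replace each {{doc.<field>}} placeholder with str(doc[field])."""
    return DOC_FIELD.sub(lambda m: str(doc.get(m.group(1), "")), template)


template = "An object used in a kitchen named '{{doc.name}}'"
doc = {"id": 1, "name": "Wooden spoon", "price": 1.99}  # id and price are illustrative
print(render_template(template, doc))
# An object used in a kitchen named 'Wooden spoon'
```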

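The tutorial's three curl requests can also be built with Python's standard library. This sketch only constructs the requests (method, URL, JSON body), so you can inspect them without a running server; sending them with `urllib.request.urlopen` is left to you. `MEILISEARCH_URL`, the helper names, and the optional `api_key` parameter (sent as an `Authorization: Bearer` header, which you only need if your instance is protected by a key) are assumptions. Note that the `/settings/embedders` route takes the embedders map itself as the body, whereas the previous version of the tutorial nested it under an `"embedders"` key because it targeted `/settings`. Serializing with `json.dumps` also sidesteps a shell pitfall in the curl version: the single quotes around `{{doc.name}}` end the single-quoted `--data-binary` string, so the shell does not send those quotes.

```python
import json
import urllib.request
from typing import Optional

# Assumption: replace with your project's URL, as in the curl examples.
MEILISEARCH_URL = "http://localhost:7700"


def _json_request(method: str, path: str, body: dict,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request to the Meilisearch API."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed if your instance is protected by an API key.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        MEILISEARCH_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method=method,
    )


def enable_vector_store(api_key: Optional[str] = None) -> urllib.request.Request:
    return _json_request("PATCH", "/experimental-features/", {"vectorStore": True}, api_key)


def update_embedders(index: str, embedders: dict,
                     api_key: Optional[str] = None) -> urllib.request.Request:
    # /settings/embedders takes the embedders map itself as the request body.
    return _json_request("PATCH", f"/indexes/{index}/settings/embedders", embedders, api_key)


def hybrid_search(index: str, q: str, embedder: str,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    body = {"q": q, "hybrid": {"embedder": embedder}}
    return _json_request("POST", f"/indexes/{index}/search", body, api_key)


embedders = {
    "products-openai": {
        "source": "openAi",
        "apiKey": "OPEN_AI_API_KEY",
        "model": "text-embedding-3-small",
        "documentTemplate": "An object used in a kitchen named '{{doc.name}}'",
    }
}
req = update_embedders("kitchenware", embedders)
print(req.get_method(), req.full_url)
# PATCH http://localhost:7700/indexes/kitchenware/settings/embedders
# To actually send it: urllib.request.urlopen(req)
```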