* [Enabling persistent chat history with Azure Cosmos DB](#enabling-persistent-chat-history-with-azure-cosmos-db)
@@ -226,34 +226,7 @@ If you have already deployed:
When your documents include images, you can optionally enable this feature that can
use image embeddings when searching and also use images when answering questions.

-To enable multimodal embeddings and answering, run:
-
-```shell
-azd env set USE_MULTIMODAL true
-```
-
-With this feature enabled, the data ingestion process will extract images from your documents
-using Document Intelligence, store the images in Azure Blob Storage, vectorize the images using the Azure AI Vision service, and store the image embeddings in the Azure AI Search index.
-
-During the RAG flow, the app will perform a multi-vector query using both text and image embeddings, and then send any images associated with the retrieved document chunks to the chat completion model for answering questions. This feature assumes that your chat completion model supports multimodal inputs, such as `gpt-4o` or `gpt-4o-mini`.
-
-You can customize the RAG flow approach with a few additional environment variables.
-
-To only use the text embeddings for the search step (no image embeddings), run:
-
-```shell
-azd env set RAG_VECTOR_FIELDS_DEFAULT "textEmbeddingOnly"
-```
-
-To only send text sources to the chat completion model (no images), run:
-
-```shell
-azd env set RAG_LLM_INPUTS_OVERRIDE "texts"
-```
-
-You can also modify those settings in the "Developer Settings" in the chat UI,
-to experiment with different options before committing to them.
+Learn more in the [multimodal guide](./multimodal.md).

## Enabling media description with Azure Content Understanding
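
The two overrides shown in the removed section above (now documented in the multimodal guide) act on different stages of the RAG flow: `RAG_VECTOR_FIELDS_DEFAULT` controls which embedding fields the search step queries, and `RAG_LLM_INPUTS_OVERRIDE` controls what gets sent to the chat completion model. A rough Python sketch of that gating, with assumed defaults, function names, and chunk structure rather than the repo's actual implementation:

```python
import os

# Illustrative sketch only: shows how the two overrides could gate the two
# independent stages of the RAG flow. The default values, function names, and
# "embedding"/"imageEmbedding" field names are assumptions, not the repo's code.
VECTOR_FIELDS = os.getenv("RAG_VECTOR_FIELDS_DEFAULT", "textAndImageEmbeddings")
LLM_INPUTS = os.getenv("RAG_LLM_INPUTS_OVERRIDE", "textsAndImages")


def select_vector_fields() -> list[str]:
    """Decide which embedding fields the Azure AI Search query should target."""
    if VECTOR_FIELDS == "textEmbeddingOnly":
        return ["embedding"]                # text vectors only
    return ["embedding", "imageEmbedding"]  # multi-vector: text + image


def select_llm_inputs(chunk: dict) -> dict:
    """Decide what from a retrieved chunk is sent to the chat completion model."""
    if LLM_INPUTS == "texts":
        return {"text": chunk["content"]}   # drop images entirely
    return {"text": chunk["content"], "images": chunk.get("images", [])}
```
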
docs/multimodal.md (+53 −8)
@@ -8,6 +8,13 @@ This repository includes an optional feature that uses the GPT vision model to g
## How it works
+
+With this feature enabled, the data ingestion process will extract images from your documents
+using Document Intelligence, store the images in Azure Blob Storage, vectorize the images using the Azure AI Vision service, and store the image embeddings in the Azure AI Search index.
+
+During the RAG flow, the app will perform a multi-vector query using both text and image embeddings, and then send any images associated with the retrieved document chunks to the chat completion model for answering questions. This feature assumes that your chat completion model supports multimodal inputs, such as `gpt-4o` or `gpt-4o-mini`.
+
When this feature is enabled, the following changes are made to the application:

* **Search index**: We added a new field to the Azure AI Search index to store the embedding returned by the multimodal Azure AI Vision API (while keeping the existing field that stores the OpenAI text embeddings).
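
To make the retrieval step described above concrete, here is a minimal sketch of a multi-vector query using the `azure-search-documents` SDK: one vector query against the text embedding field and one against the image embedding field, plus the keyword portion for hybrid search. The index name, the `embedding`/`imageEmbedding` field names, and the embedding helpers are assumptions for illustration, not the repo's actual code:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery


def embed_text(query: str) -> list[float]:
    """Placeholder: call your text embedding model (e.g. Azure OpenAI) here."""
    raise NotImplementedError


def embed_query_for_images(query: str) -> list[float]:
    """Placeholder: call the Azure AI Vision multimodal embeddings API here."""
    raise NotImplementedError


search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="multimodal-index",
    credential=AzureKeyCredential("<search-api-key>"),
)

query = "What trend does the chart on page 3 show?"

results = search_client.search(
    search_text=query,  # keyword/hybrid portion of the query
    vector_queries=[
        VectorizedQuery(vector=embed_text(query), k_nearest_neighbors=50, fields="embedding"),
        VectorizedQuery(vector=embed_query_for_images(query), k_nearest_neighbors=50, fields="imageEmbedding"),
    ],
    top=5,
)

for doc in results:
    # Field names vary by index schema; adjust to your own fields.
    print(doc.get("content", "")[:100])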
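```

And a sketch of the answering half, where a retrieved image is passed alongside the text sources to a multimodal chat completion deployment (e.g. `gpt-4o`) via the `openai` package. The deployment name, file path, and prompt are placeholders, and the image is assumed to have already been fetched from Blob Storage:

```python
import base64
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

# Assume the image associated with a retrieved chunk was downloaded locally.
with open("page3_chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # your chat completion *deployment* name (assumed)
    messages=[
        {"role": "system", "content": "Answer using only the provided sources."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does the chart show?\n\nSources:\n<retrieved chunk text here>"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```
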
@@ -32,31 +39,69 @@ For more details on how this feature works, read [this blog post](https://techco
### Deployment

-1. **Enable GPT vision approach:**
+1. **Enable multimodal capabilities:**

First, make sure you do *not* have integrated vectorization enabled, since that is currently incompatible:

```shell
azd env set USE_FEATURE_INT_VECTORIZATION false
```

-Then set the environment variable for enabling vision support:
+Then set the azd environment variable to enable the multimodal feature:
+
+```shell
+azd env set USE_MULTIMODAL true
+```
+
+3. **Provision the multimodal resources:**
+
+Either run `azd up` if you haven't run it before, or run `azd provision` to provision the multimodal resources. This will create a new Azure AI Vision account and update the Azure AI Search index to include the new image embedding field.
+
+4. **Re-index the data:**
+
+If you have already indexed data, you will need to re-index it to include the new image embeddings.
+We recommend creating a new Azure AI Search index to avoid conflicts with the existing index.

```shell
-azd env set USE_GPT4V true
+azd env set AZURE_SEARCH_INDEX multimodal-index
```

-When set, that flag will provision an Azure AI Vision resource and gpt-4o model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new `imageEmbedding` field, and enable the vision approach in the UI.
+Then run the data ingestion process again to re-index the data:

-2. **Clean old deployments (optional):**
-Run `azd down --purge` for a fresh setup.
+Linux/Mac:

-3. **Start the application:**
-Execute `azd up` to build, provision, deploy, and initiate document preparation.