
Commit 51fd298

Doc fixes
1 parent 0dfacbf commit 51fd298

2 files changed: +55 −37 lines


docs/deploy_features.md

Lines changed: 2 additions & 29 deletions
@@ -6,7 +6,7 @@ You should typically enable these features before running `azd up`. Once you've
 * [Using different chat completion models](#using-different-chat-completion-models)
 * [Using reasoning models](#using-reasoning-models)
 * [Using different embedding models](#using-different-embedding-models)
-* [Enabling GPT vision feature](#enabling-gpt-vision-feature)
+* [Enabling multimodal embeddings and answering](#enabling-multimodal-embeddings-and-answering)
 * [Enabling media description with Azure Content Understanding](#enabling-media-description-with-azure-content-understanding)
 * [Enabling client-side chat history](#enabling-client-side-chat-history)
 * [Enabling persistent chat history with Azure Cosmos DB](#enabling-persistent-chat-history-with-azure-cosmos-db)
@@ -226,34 +226,7 @@ If you have already deployed:
 When your documents include images, you can optionally enable this feature that can
 use image embeddings when searching and also use images when answering questions.
 
-To enable multimodal embeddings and answering, run:
-
-```shell
-azd env set USE_MULTIMODAL true
-```
-
-With this feature enabled, the data ingestion process will extract images from your documents
-using Document Intelligence, store the images in Azure Blob Storage, vectorize the images using the Azure AI Vision service, and store the image embeddings in the Azure AI Search index.
-
-During the RAG flow, the app will perform a multi-vector query using both text and image embeddings, and then send any images associated with the retrieved document chunks to the chat completion model for answering questions. This feature assumes that your chat completion model supports multimodal inputs, such as `gpt-4o` or `gpt-4o-mini`.
-
-You can customize the RAG flow approach with a few additional environment variables.
-
-To only use the text embeddings for the search step (no image embeddings), run:
-
-```shell
-azd env set RAG_VECTOR_FIELDS_DEFAULT "textEmbeddingOnly"
-```
-
-To only send text sources to the chat completion model (no images), run:
-
-```shell
-azd env set RAG_LLM_INPUTS_OVERRIDE "texts"
-```
-
-You can also modify those settings in the "Developer Settings" in the chat UI,
-to experiment with different options before committing to them.
-
+Learn more in the [multimodal guide](./multimodal.md).
 
 ## Enabling media description with Azure Content Understanding

docs/multimodal.md

Lines changed: 53 additions & 8 deletions
@@ -8,6 +8,13 @@ This repository includes an optional feature that uses the GPT vision model to g
 
 ## How it works
 
+
+With this feature enabled, the data ingestion process will extract images from your documents
+using Document Intelligence, store the images in Azure Blob Storage, vectorize the images using the Azure AI Vision service, and store the image embeddings in the Azure AI Search index.
+
+During the RAG flow, the app will perform a multi-vector query using both text and image embeddings, and then send any images associated with the retrieved document chunks to the chat completion model for answering questions. This feature assumes that your chat completion model supports multimodal inputs, such as `gpt-4o` or `gpt-4o-mini`.
+
+----OLD----
 When this feature is enabled, the following changes are made to the application:
 
 * **Search index**: We added a new field to the Azure AI Search index to store the embedding returned by the multimodal Azure AI Vision API (while keeping the existing field that stores the OpenAI text embeddings).
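
The multi-vector query described in the lines added above can be sketched as a direct call to the Azure AI Search REST API. This is only an illustration, not code from the repository: the service endpoint, index name, `embedding` text field, API version, and the truncated placeholder vectors are assumptions (only the `imageEmbedding` field name appears in the existing docs), and real queries use full embedding vectors produced by the text embedding model and the Azure AI Vision vectorize API.

```shell
# Sketch only: substitute your own service name, index name, key, and real embedding
# vectors. Field names here are assumptions about the index schema.
curl -X POST "https://<your-search-service>.search.windows.net/indexes/<your-index>/docs/search?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <your-query-key>" \
  -d '{
    "search": "What does the financial report say about revenue?",
    "top": 3,
    "vectorQueries": [
      { "kind": "vector", "vector": [0.01, 0.02], "fields": "embedding", "k": 50 },
      { "kind": "vector", "vector": [0.03, 0.04], "fields": "imageEmbedding", "k": 50 }
    ]
  }'
```

Azure AI Search fuses the results of the keyword search and both vector queries, and the app then sends any images associated with the retrieved chunks to the chat completion model.
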
@@ -32,31 +39,69 @@ For more details on how this feature works, read [this blog post](https://techco
 
 ### Deployment
 
-1. **Enable GPT vision approach:**
+1. **Enable multimodal capabilities:**
 
 First, make sure you do *not* have integrated vectorization enabled, since that is currently incompatible:
 
 ```shell
 azd env set USE_FEATURE_INT_VECTORIZATION false
 ```
 
-Then set the environment variable for enabling vision support:
+Then set the azd environment variable to enable the multimodal feature:
+
+```shell
+azd env set USE_MULTIMODAL true
+```
+
+3. **Provision the multimodal resources:**
+
+Either run `azd up` if you haven't run it before, or run `azd provision` to provision the multimodal resources. This will create a new Azure AI Vision account and update the Azure AI Search index to include the new image embedding field.
+
+4. **Re-index the data:**
+
+If you have already indexed data, you will need to re-index it to include the new image embeddings.
+We recommend creating a new Azure AI Search index to avoid conflicts with the existing index.
 
 ```shell
-azd env set USE_GPT4V true
+azd env set AZURE_SEARCH_INDEX multimodal-index
 ```
 
-When set, that flag will provision a Azure AI Vision resource and gpt-4o model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new `imageEmbedding` field, and enable the vision approach in the UI.
+Then run the data ingestion process again to re-index the data:
 
-2. **Clean old deployments (optional):**
-Run `azd down --purge` for a fresh setup.
+Linux/Mac:
 
-3. **Start the application:**
-Execute `azd up` to build, provision, deploy, and initiate document preparation.
+```shell
+./scripts/prepdocs.sh
+```
+
+Windows:
+
+```shell
+.\scripts\prepdocs.ps1
+```
 
 4. **Try out the feature:**
 ![GPT4V configuration screenshot](./images/gpt4v.png)
 * Access the developer options in the web app and select "Use GPT vision model".
 * New sample questions will show up in the UI that are based on the sample financial document.
 * Try out a question and see the answer generated by the GPT vision model.
 * Check the 'Thought process' and 'Supporting content' tabs.
+
+5. **Customize the multimodal approach:**
+
+You can customize the RAG flow approach with a few additional environment variables.
+
+To only use the text embeddings for the search step (no image embeddings), run:
+
+```shell
+azd env set RAG_VECTOR_FIELDS_DEFAULT "textEmbeddingOnly"
+```
+
+To only send text sources to the chat completion model (no images), run:
+
+```shell
+azd env set RAG_LLM_INPUTS_OVERRIDE "texts"
+```
+
+You can also modify those settings in the "Developer Settings" in the chat UI,
+to experiment with different options before committing to them.
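
Taken together, the updated Deployment steps in the diff above amount to roughly the following sequence. This is a sketch for Linux/Mac (use `.\scripts\prepdocs.ps1` on Windows), and `multimodal-index` is just the example index name used above:

```shell
# Sketch of the updated deployment sequence (Linux/Mac)
azd env set USE_FEATURE_INT_VECTORIZATION false   # integrated vectorization is currently incompatible
azd env set USE_MULTIMODAL true                   # enable multimodal embeddings and answering
azd provision                                     # or `azd up` if you have never deployed this environment
azd env set AZURE_SEARCH_INDEX multimodal-index   # optional: use a fresh index to avoid conflicts
./scripts/prepdocs.sh                             # re-run ingestion so image embeddings are indexed
```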

0 commit comments
