
Commit a5c4cce

Use gpt-4o for vision approach (#1656)
* Update readme
* GPT-4o changes
* Default to gpt-4o for vision
* Revert unneeded change
* GPT-4 version
* Merge
1 parent f16c26b commit a5c4cce

File tree

4 files changed: +14 -15 lines changed

app/frontend/src/components/GPT4VSettings/GPT4VSettings.tsx

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ export const GPT4VSettings = ({ updateGPT4VInputs, updateUseGPT4V, isUseGPT4V, g

     return (
         <Stack className={styles.container} tokens={{ childrenGap: 10 }}>
-            <Checkbox checked={useGPT4V} label="Use GPT-4 Turbo with Vision" onChange={onuseGPT4V} />
+            <Checkbox checked={useGPT4V} label="Use GPT vision model" onChange={onuseGPT4V} />
             {useGPT4V && (
                 <Dropdown
                     selectedKey={vectorFieldOption}

docs/customization.md

Lines changed: 2 additions & 2 deletions
@@ -45,7 +45,7 @@ The `system_message_chat_conversation` variable is currently tailored to the sam

 ##### Chat with vision

-If you followed the instructions in [docs/gpt4v.md](gpt4v.md) to enable the GPT-4 Vision model and then select "Use GPT-4 Turbo with Vision", then the chat tab will use the `chatreadretrievereadvision.py` approach instead. This approach is similar to the `chatreadretrieveread.py` approach, with a few differences:
+If you followed the instructions in [docs/gpt4v.md](gpt4v.md) to enable a GPT Vision model and then select "Use GPT vision model", then the chat tab will use the `chatreadretrievereadvision.py` approach instead. This approach is similar to the `chatreadretrieveread.py` approach, with a few differences:

 1. Step 1 is the same as before, except it uses the GPT-4 Vision model instead of the default GPT-3.5 model.
 2. For this step, it also calculates a vector embedding for the user question using [the Computer Vision vectorize text API](https://learn.microsoft.com/azure/ai-services/computer-vision/how-to/image-retrieval#call-the-vectorize-text-api), and passes that to the Azure AI Search to compare against the `imageEmbeddings` fields in the indexed documents. For each matching document, it downloads the image blob and converts it to a base 64 encoding.
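For illustration, here is a minimal Python sketch of what that vectorize-text call looks like. The environment variable names and the use of `requests` are assumptions for this example, not the repo's actual code (the real logic lives in `chatreadretrievereadvision.py`):

```python
# Minimal sketch (not the repo's code) of the Computer Vision vectorize-text call.
# AZURE_VISION_ENDPOINT / AZURE_VISION_KEY are illustrative variable names.
import os

import requests


def vectorize_text(text: str) -> list[float]:
    endpoint = os.environ["AZURE_VISION_ENDPOINT"].rstrip("/")
    response = requests.post(
        f"{endpoint}/computervision/retrieval:vectorizeText",
        params={"api-version": "2023-02-01-preview"},
        headers={"Ocp-Apim-Subscription-Key": os.environ["AZURE_VISION_KEY"]},
        json={"text": text},
        timeout=30,
    )
    response.raise_for_status()
    # The service responds with {"vector": [...], "modelVersion": "..."}
    return response.json()["vector"]
```

The resulting vector is what gets passed to Azure AI Search as a vector query against the `imageEmbeddings` fields.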
@@ -62,7 +62,7 @@ The `system_chat_template` variable is currently tailored to the sample data sin

 #### Read with vision

-If you followed the instructions in [docs/gpt4v.md](gpt4v.md) to enable the GPT-4 Vision model and then select "Use GPT-4 Turbo with Vision", then the ask tab will use the `retrievethenreadvision.py` approach instead. This approach is similar to the `retrievethenread.py` approach, with a few differences:
+If you followed the instructions in [docs/gpt4v.md](gpt4v.md) to enable the GPT-4 Vision model and then select "Use GPT vision model", then the ask tab will use the `retrievethenreadvision.py` approach instead. This approach is similar to the `retrievethenread.py` approach, with a few differences:

 1. For this step, it also calculates a vector embedding for the user question using [the Computer Vision vectorize text API](https://learn.microsoft.com/azure/ai-services/computer-vision/how-to/image-retrieval#call-the-vectorize-text-api), and passes that to the Azure AI Search to compare against the `imageEmbeddings` fields in the indexed documents. For each matching document, it downloads the image blob and converts it to a base 64 encoding.
 2. When it combines the search results and user question, it includes the base 64 encoded images, and sends along both the text and images to the GPT4 Vision model (similar to this [documentation example](https://platform.openai.com/docs/guides/vision/quick-start)). The model generates a response that includes citations to the images, and the UI renders the base64 encoded images when a citation is clicked.
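As a rough sketch of that second step, assuming the `openai` Python package (1.x) and placeholder endpoint, key, deployment name, and image file, the combined text-plus-image request might look like this:

```python
# Sketch of sending text plus a base64-encoded image to a GPT vision deployment.
# Endpoint, key, deployment name, and image path are all placeholders.
import base64

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-openai>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-02-15-preview",
)

with open("page1.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

response = client.chat.completions.create(
    model="gpt-4o",  # deployment name, matching the new infra/main.bicep defaults
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this page say about revenue?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)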

docs/gpt4v.md

Lines changed: 8 additions & 9 deletions
@@ -21,11 +21,11 @@ This repository now includes an example of integrating GPT-4 Turbo with Vision w
 |`gpt-4`|`vision-preview`|

 - Ensure that you can deploy the Azure OpenAI resource group in [a region where all required components are available](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models#model-summary-table-and-region-availability):
-  - Azure OpenAI models
-    - gpt-35-turbo
-    - text-embedding-ada-002
-    - gpt-4v
-  - [Azure AI Vision](https://learn.microsoft.com/azure/ai-services/computer-vision/)
+  - Azure OpenAI models
+    - gpt-35-turbo
+    - text-embedding-ada-002
+    - gpt-4v
+  - [Azure AI Vision](https://learn.microsoft.com/azure/ai-services/computer-vision/)

 ### Setup and Usage

@@ -46,20 +46,19 @@ This repository now includes an example of integrating GPT-4 Turbo with Vision w
    azd env set USE_GPT4V true
    ```

-   When set, that flag will provision a Computer Vision resource and GPT-4-vision model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new `imageEmbedding` field, and enable the vision approach in the UI.
+   When set, that flag will provision a Computer Vision resource and gpt-4o model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new `imageEmbedding` field, and enable the vision approach in the UI.

 3. **Clean old deployments (optional):**
    Run `azd down --purge` for a fresh setup.

 4. **Start the application:**
    Execute `azd up` to build, provision, deploy, and initiate document preparation.

-
 5. **Web Application Usage:**
    ![GPT4V configuration screenshot](./images/gpt4v.png)
-   - Access the developer options in the web app and select "Use GPT-4 Turbo with Vision".
+   - Access the developer options in the web app and select "Use GPT vision model".
    - Sample questions will be updated for testing.
    - Interact with the questions to view responses.
-   - The 'Thought Process' tab shows the retrieved data and its processing by GPT-4 Turbo with Vision.
+   - The 'Thought Process' tab shows the retrieved data and its processing by the GPT vision model.

 Feel free to explore and contribute to enhancing this feature. For questions or feedback, use the repository's issue tracker.
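For context on the `imageEmbedding` field mentioned above: it is populated at ingestion time with image embeddings from the Computer Vision image-retrieval API, mirroring the vectorize-text call used on the query side. A minimal sketch, again with illustrative variable names rather than the repo's actual code:

```python
# Sketch (not the repo's code) of computing an image embedding for the
# `imageEmbedding` field via the Computer Vision vectorize-image endpoint.
import os

import requests


def vectorize_image(image_bytes: bytes) -> list[float]:
    endpoint = os.environ["AZURE_VISION_ENDPOINT"].rstrip("/")
    response = requests.post(
        f"{endpoint}/computervision/retrieval:vectorizeImage",
        params={"api-version": "2023-02-01-preview"},
        headers={
            "Ocp-Apim-Subscription-Key": os.environ["AZURE_VISION_KEY"],
            "Content-Type": "application/octet-stream",  # raw image bytes in the body
        },
        data=image_bytes,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["vector"]
```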

infra/main.bicep

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -114,9 +114,9 @@ var embedding = {
114114
dimensions: embeddingDimensions != 0 ? embeddingDimensions : 1536
115115
}
116116

117-
param gpt4vModelName string = 'gpt-4'
118-
param gpt4vDeploymentName string = 'gpt-4v'
119-
param gpt4vModelVersion string = 'vision-preview'
117+
param gpt4vModelName string = 'gpt-4o'
118+
param gpt4vDeploymentName string = 'gpt-4o'
119+
param gpt4vModelVersion string = '2024-05-13'
120120
param gpt4vDeploymentCapacity int = 10
121121

122122
param tenantId string = tenant().tenantId
