Closed
Please provide us with the following information:
This issue is for a: (mark with an `x`)
- [ ] bug report -> please search issues before submitting
- [X] feature request
- [X] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
I want to use the GPT-4 Turbo with vision functionality to embed and index both text and (inline) images from PDF files for subsequent searching.
The docs here (https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/gpt4v.md) describe the general pipeline, mentioning that:
- documents are split into per-page PNG images
- text is extracted using OCR
- embeddings are generated on text and images
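For context, here is a rough sketch of how I understand the per-page flow described above. The `embed_text`/`embed_image` helpers are hypothetical placeholders (not functions from the demo); the ambiguity is in what `embed_image` actually receives:

```python
from dataclasses import dataclass, field

@dataclass
class PageDocument:
    """One search-index entry per PDF page, as the docs describe."""
    page_png: bytes                                  # rendered page image
    ocr_text: str                                    # text extracted via OCR
    text_embedding: list = field(default_factory=list)
    image_embedding: list = field(default_factory=list)

def embed_text(text: str) -> list:
    # placeholder for a text-embedding call (e.g. Azure OpenAI embeddings)
    return [float(len(text))]

def embed_image(png: bytes) -> list:
    # placeholder for an image-embedding call; the open question is
    # whether this receives the WHOLE page PNG, or individual inline
    # figures/charts cropped out of the page
    return [float(len(png))]

def index_page(page_png: bytes, ocr_text: str) -> PageDocument:
    # per the docs: each page yields both a text and an image embedding
    doc = PageDocument(page_png=page_png, ocr_text=ocr_text)
    doc.text_embedding = embed_text(doc.ocr_text)
    doc.image_embedding = embed_image(doc.page_png)
    return doc
```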
It is unclear to me whether the entire page PNG is embedded, or whether only the inline images extracted from the PDF/page PNG (e.g. inline figures/charts) are embedded. If inline images are embedded individually (rather than the whole page), how does the Azure OCR tool detect them and separate them from the unstructured text? Is such a feature offered?
Any clarifications would be greatly appreciated.