Making inline images embeddable and searchable #1724

@emreonal12

Description

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [X] feature request
- [X] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

I want to use the GPT-4 Turbo with Vision functionality to embed and index both the text and the inline images from PDF files, so that both can be searched afterwards.

The docs here (https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/gpt4v.md) describe the general pipeline, mentioning that:

  • documents are split into per-page PNG images
  • text is extracted using OCR
  • embeddings are generated for both the text and the images (see the sketch below for my mental model of the image call)
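
For reference, my current guess at the image-embedding step is a page-level call like the sketch below, written against the Azure AI Vision multimodal embeddings API (`retrieval:vectorizeImage`). The endpoint/key placeholders and the `api-version`/`model-version` values are my assumptions, not something I pulled from this repo:

```python
import requests

# Placeholders -- substitute your own Azure AI Vision resource values.
AZURE_VISION_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
AZURE_VISION_KEY = "<your-key>"

def embed_page_image(png_bytes: bytes) -> list[float]:
    """Embed a whole page PNG via the multimodal embeddings API.

    If the demo embeds entire page renderings (rather than cropped
    inline figures), I assume each page PNG goes through a call
    roughly like this one.
    """
    url = (
        f"{AZURE_VISION_ENDPOINT}/computervision/retrieval:vectorizeImage"
        "?api-version=2024-02-01&model-version=2023-04-15"  # versions assumed
    )
    resp = requests.post(
        url,
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_VISION_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=png_bytes,
    )
    resp.raise_for_status()
    return resp.json()["vector"]  # the image embedding vector
```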

It is unclear to me whether the entire page PNG is embedded, or whether this refers to embedding only the inline images (e.g. figures and charts) extracted from the PDF or page PNG. If inline images are embedded instead of the whole page, how does the Azure OCR tool detect them and separate them from the surrounding unstructured text? Is such a feature offered?
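
To make the question concrete: if inline figures are embedded individually, I would expect a cropping step like the one below, where some layout/OCR step supplies figure bounding boxes. The `figure_boxes` input is hypothetical -- whether any Azure OCR/layout service actually produces it is exactly what I am asking:

```python
from PIL import Image

def crop_figures(page_png_path: str,
                 figure_boxes: list[tuple[int, int, int, int]]) -> list[Image.Image]:
    """Crop detected inline figures out of a rendered page PNG.

    figure_boxes: (left, top, right, bottom) pixel boxes, hypothetically
    produced by a layout-analysis/OCR step that can localize figures --
    the capability this issue is asking about.
    """
    page = Image.open(page_png_path)
    return [page.crop(box) for box in figure_boxes]
```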

Any clarifications would be greatly appreciated.
