Leverage AI Search indexer (with integrated chunking and vectorization) #1020

@iMicknl

This issue is for a: (mark with an x)

  • bug report -> please search issues before submitting
  • feature request
  • documentation issue or request
  • regression (a behavior that used to work and stopped in a new release)

Expected/desired behavior

Users often eventually want continuous indexing of a blob storage container. The local script could upload the files to blob storage and configure the indexers and skillsets (e.g. Form Recognizer/OCR) for continuous reindexing.

Azure AI Search has an integrated chunking and vectorization engine, which is in public preview and thus not ready for production yet. However, it would be good to investigate whether more native AI Search features can be incorporated into this example in the future.

See https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/announcing-the-public-preview-of-integrated-vectorization-in/ba-p/3960809

Documentation: https://learn.microsoft.com/en-us/azure/search/vector-search-integrated-vectorization

Example notebook: https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/azure-search-integrated-vectorization-sample.ipynb
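
To make the idea more concrete, here is a rough sketch (not taken from this repository) of the three definitions the script might create: a blob data source, a skillset that chunks and embeds, and a scheduled indexer. It only builds the JSON payloads as I understand the Azure AI Search REST API; the resource names, the `build_*` helpers, and the chunk sizes are all hypothetical. Authentication for the embedding skill and the index projections that map chunks to separate search documents are left out, so treat this as a starting point rather than a working configuration.

```python
import json

# Hypothetical resource names, chosen only for this illustration.
DATA_SOURCE_NAME = "docs-blob-datasource"
SKILLSET_NAME = "docs-chunk-embed-skillset"
INDEXER_NAME = "docs-blob-indexer"
INDEX_NAME = "docs-index"


def build_data_source(connection_string: str, container: str) -> dict:
    """Blob container that the indexer crawls for new or changed files."""
    return {
        "name": DATA_SOURCE_NAME,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def build_skillset(openai_uri: str, deployment: str,
                   page_length: int = 2000, overlap: int = 500) -> dict:
    """Split each document into chunks, then embed every chunk."""
    split_skill = {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "context": "/document",
        "textSplitMode": "pages",
        "maximumPageLength": page_length,  # assumed chunk size, in characters
        "pageOverlapLength": overlap,      # assumed overlap (preview feature)
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "textItems", "targetName": "pages"}],
    }
    embedding_skill = {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "context": "/document/pages/*",    # runs once per chunk
        "resourceUri": openai_uri,
        "deploymentId": deployment,
        "inputs": [{"name": "text", "source": "/document/pages/*"}],
        "outputs": [{"name": "embedding", "targetName": "vector"}],
    }
    return {"name": SKILLSET_NAME, "skills": [split_skill, embedding_skill]}


def build_indexer(interval: str = "PT1H") -> dict:
    """Indexer that reruns on a schedule (ISO 8601 duration)."""
    return {
        "name": INDEXER_NAME,
        "dataSourceName": DATA_SOURCE_NAME,
        "skillsetName": SKILLSET_NAME,
        "targetIndexName": INDEX_NAME,
        "schedule": {"interval": interval},
    }


if __name__ == "__main__":
    payloads = {
        "datasource": build_data_source("<storage-connection-string>", "content"),
        "skillset": build_skillset("https://<resource>.openai.azure.com", "<deployment>"),
        "indexer": build_indexer(),
    }
    print(json.dumps(payloads, indent=2))
```

Each payload would then be sent to the search service (for example with a PUT to its `datasources`, `skillsets`, and `indexers` endpoints). Because chunk size and overlap live in the skillset definition, trying a different chunking strategy means updating the skillset and rerunning the indexer, which relates to the disadvantage listed below.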

Advantages

  • Continuous reindexing out of the box
  • Easy to index new documents (even via UI), by uploading to blob storage

Disadvantages

  • Harder to test different chunking strategies
