You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@@ -11,6 +12,22 @@ This guide provides more details for using the `prepdocs` script to index docume
11
12
-[Removal of documents](#removal-of-documents)
12
13
-[Scheduled indexing](#scheduled-indexing)
13
14
15
+
## Supported document formats
16
+
17
+
In order to ingest a document format, we need a tool that can turn it into text. By default, use Azure Document Intelligence (DI in the table below), but we also have local parsers for several formats. The local parsers are not as sophisticated as Azure Document Intelligence, but they can be used to decrease charges.
18
+
19
+
| Format | Manual indexing | Integrated Vectorization |
The Blob indexer used by the Integrated Vectorization approach also supports a few [additional formats](https://learn.microsoft.com/azure/search/search-howto-indexing-azure-blob-storage#supported-document-formats).
30
+
14
31
## Overview of the manual indexing process
15
32
16
33
The `scripts/prepdocs.py` script is responsible for both uploading and indexing documents. The typical usage is to call it using `scripts/prepdocs.sh` (Mac/Linux) or `scripts/prepdocs.ps1` (Windows), as these scripts will set up a Python virtual environment and pass in the required parameters based on the current `azd` environment. Whenever `azd up` or `azd provision` is run, the script is called automatically.
0 commit comments