Move formats again

pamelafox · pamelafox · commit bc7529242767 · 2024-11-04T09:42:24.000-08:00
diff --git a/docs/data_ingestion.md b/docs/data_ingestion.md
@@ -4,8 +4,8 @@ The [azure-search-openai-demo](/) project can set up a full RAG chat app on Azur
 
 The chat app provides two ways to ingest data: manual indexing and integrated vectorization. This document explains the differences between the two approaches and provides an overview of the manual indexing process.
 
+- [Supported document formats](#supported-document-formats)
 - [Manual indexing process](#manual-indexing-process)
-  - [Supported document formats](#supported-document-formats)
   - [Chunking](#chunking)
   - [Categorizing data for enhanced search](#enhancing-search-functionality-with-data-categorization)
   - [Indexing additional documents](#indexing-additional-documents)
@@ -16,22 +16,9 @@ The chat app provides two ways to ingest data: manual indexing and integrated ve
   - [Scheduled indexing](#scheduled-indexing)
 - [Debugging tips](#debugging-tips)
 
-## Manual indexing process
-
-The [`prepdocs.py`](../app/backend/prepdocs.py) script is responsible for both uploading and indexing documents. The typical usage is to call it using `scripts/prepdocs.sh` (Mac/Linux) or `scripts/prepdocs.ps1` (Windows), as these scripts will set up a Python virtual environment and pass in the required parameters based on the current `azd` environment. You can pass additional arguments directly to the script, for example `scripts/prepdocs.ps1 --removeall`. Whenever `azd up` or `azd provision` is run, the script is called automatically.
-
-![Diagram of the indexing process](images/diagram_prepdocs.png)
-
-The script uses the following steps to index documents:
-
-1. If it doesn't yet exist, create a new index in Azure AI Search.
-2. Upload the PDFs to Azure Blob Storage.
-3. Split the PDFs into chunks of text.
-4. Upload the chunks to Azure AI Search. If using vectors (the default), also compute the embeddings and upload those alongside the text.
-
-### Supported document formats
+## Supported document formats
 
-In order to ingest a document format, we need a tool that can turn it into text. By default, use Azure Document Intelligence (DI in the table below), but we also have local parsers for several formats. The local parsers are not as sophisticated as Azure Document Intelligence, but they can be used to decrease charges.
+In order to ingest a document format, we need a tool that can turn it into text. By default, manual indexing uses Azure Document Intelligence (DI in the table below), but we also have local parsers for several formats. The local parsers are not as sophisticated as Azure Document Intelligence, but they can be used to decrease charges.
 
 | Format | Manual indexing                      | Integrated Vectorization |
 | ------ | ------------------------------------ | ------------------------ |
@@ -45,6 +32,19 @@ In order to ingest a document format, we need a tool that can turn it into text.
 
 The Blob indexer used by the Integrated Vectorization approach also supports a few [additional formats](https://learn.microsoft.com/azure/search/search-howto-indexing-azure-blob-storage#supported-document-formats).
 
+## Manual indexing process
+
+The [`prepdocs.py`](../app/backend/prepdocs.py) script is responsible for both uploading and indexing documents. The typical usage is to call it using `scripts/prepdocs.sh` (Mac/Linux) or `scripts/prepdocs.ps1` (Windows), as these scripts will set up a Python virtual environment and pass in the required parameters based on the current `azd` environment. You can pass additional arguments directly to the script, for example `scripts/prepdocs.ps1 --removeall`. Whenever `azd up` or `azd provision` is run, the script is called automatically.
+
+![Diagram of the indexing process](images/diagram_prepdocs.png)
+
+The script uses the following steps to index documents:
+
+1. If it doesn't yet exist, create a new index in Azure AI Search.
+2. Upload the PDFs to Azure Blob Storage.
+3. Split the PDFs into chunks of text.
+4. Upload the chunks to Azure AI Search. If using vectors (the default), also compute the embeddings and upload those alongside the text.
+
 ### Chunking
 
 We're often asked why we need to break up the PDFs into chunks when Azure AI Search supports searching large documents.