Update other_samples.md to reflect current features/tech (#1388)

pamelafox · web-flow · commit 127620a669d4 · 2024-03-15T07:32:53.000-07:00
* Update other_samples.md

* Add formats list

* Update docs/data_ingestion.md

* Apply suggestions from code review

* Apply suggestions from code review
diff --git a/docs/data_ingestion.md b/docs/data_ingestion.md
@@ -2,6 +2,7 @@
 
 This guide provides more details for using the `prepdocs` script to index documents for the Chat App.
 
+- [Supported document formats](#supported-document-formats)
 - [Overview of the manual indexing process](#overview-of-the-manual-indexing-process)
   - [Chunking](#chunking)
   - [Indexing additional documents](#indexing-additional-documents)
@@ -11,6 +12,22 @@ This guide provides more details for using the `prepdocs` script to index docume
   - [Removal of documents](#removal-of-documents)
   - [Scheduled indexing](#scheduled-indexing)
 
+## Supported document formats
+
+To ingest a document format, we need a tool that can turn it into text. By default, we use Azure Document Intelligence (DI in the table below), but we also have local parsers for several formats. The local parsers are not as sophisticated as Azure Document Intelligence, but they can be used to reduce costs.
+
+| Format                             | Manual indexing                      | Integrated Vectorization |
+| ---------------------------------- | ------------------------------------ | ------------------------ |
+| PDF                                | Yes (DI or local with PyPDF)         | Yes                      |
+| HTML                               | Yes (DI or local with BeautifulSoup) | Yes                      |
+| DOCX, PPTX, XLSX                   | Yes (DI)                             | Yes                      |
+| Images (JPG, PNG, BMP, TIFF, HEIF) | Yes (DI)                             | Yes                      |
+| TXT                                | Yes (Local)                          | Yes                      |
+| JSON                               | Yes (Local)                          | Yes                      |
+| CSV                                | Yes (Local)                          | Yes                      |
+
+The Blob indexer used by the Integrated Vectorization approach also supports a few [additional formats](https://learn.microsoft.com/azure/search/search-howto-indexing-azure-blob-storage#supported-document-formats).
+
 ## Overview of the manual indexing process
 
 The `scripts/prepdocs.py` script is responsible for both uploading and indexing documents. The typical usage is to call it using `scripts/prepdocs.sh` (Mac/Linux) or `scripts/prepdocs.ps1` (Windows), as these scripts will set up a Python virtual environment and pass in the required parameters based on the current `azd` environment. Whenever `azd up` or `azd provision` is run, the script is called automatically.
diff --git a/docs/other_samples.md b/docs/other_samples.md
@@ -30,14 +30,17 @@ Feature comparison:
 | --- | --- | --- |
 | RAG approach | Multiple approaches | Only via ChatCompletion API data_sources |
 | Vector support | ✅ Yes | ✅ Yes |
-| Data ingestion | ✅ Yes (PDF) | ✅ Yes (PDF, TXT, MD, HTML) |
+| Data ingestion | ✅ Yes ([Many formats](data_ingestion.md#supported-document-formats)) | ✅ Yes ([Many formats](https://learn.microsoft.com/azure/ai-services/openai/concepts/use-your-data?tabs=ai-search#data-formats-and-file-types)) |
 | Persistent chat history | ❌ No (browser tab only) | ✅ Yes, in CosmosDB |
+| User feedback | ❌ No | ✅ Yes |
+| GPT-4-vision | ✅ Yes | ❌ No |
+| Auth + ACL | ✅ Yes | ✅ Yes |
 
 Technology comparison:
 
 | Tech | azure-search-openai-demo | sample-app-aoai-chatGPT |
 | --- | --- | --- |
 | Frontend | React | React |
-| Backend | Python (Quart) | Python (Flask) |
-| Vector DB | Azure AI Search | Azure AI Search |
+| Backend | Python (Quart) | Python (Quart) |
+| Vector DB | Azure AI Search | Azure AI Search, CosmosDB Mongo vCore, Elasticsearch, Pinecone, AzureML |
 | Deployment | Azure Developer CLI (azd) | Azure Portal, az, azd |