Address feedback and tweak docs

pamelafox · pamelafox · commit e9f13f57a2c1 · 2025-11-12T09:23:31.000-08:00
diff --git a/.github/prompts/testcov.prompt.md b/.github/prompts/testcov.prompt.md
diff --git a/README.md b/README.md
@@ -60,7 +60,7 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
 - Chat (multi-turn) and Q&A (single turn) interfaces
 - Renders citations and thought process for each answer
 - Includes settings directly in the UI to tweak the behavior and experiment with options
-- Integrates Azure AI Search for indexing and retrieval of documents, with support for [many document formats](/docs/data_ingestion.md#supported-document-formats) as well as [cloud-based data ingestion](/docs/data_ingestion.md#cloud-based-ingestion)
+- Integrates Azure AI Search for indexing and retrieval of documents, with support for [many document formats](/docs/data_ingestion.md#supported-document-formats) as well as [cloud data ingestion](/docs/data_ingestion.md#cloud-data-ingestion)
 - Optional usage of [multimodal models](/docs/multimodal.md) to reason over image-heavy documents
 - Optional addition of [speech input/output](/docs/deploy_features.md#enabling-speech-inputoutput) for accessibility
 - Optional automation of [user login and data access](/docs/login_and_acl.md) via Microsoft Entra
diff --git a/azure.yaml b/azure.yaml
@@ -41,36 +41,36 @@ services:
           interactive: false
           continueOnError: false
   # Un-comment this section if using USE_CLOUD_INGESTION option
-  # document-extractor:
-  #   project: ./app/functions/document_extractor
-  #   language: py
-  #   host: function
-  #   hooks:
-  #     prepackage:
-  #       shell: pwsh
-  #       run: python ../../../scripts/copy_prepdocslib.py
-  #       interactive: false
-  #       continueOnError: false
-  # figure-processor:
-  #   project: ./app/functions/figure_processor
-  #   language: py
-  #   host: function
-  #   hooks:
-  #     prepackage:
-  #       shell: pwsh
-  #       run: python ../../../scripts/copy_prepdocslib.py
-  #       interactive: false
-  #       continueOnError: false
-  # text-processor:
-  #   project: ./app/functions/text_processor
-  #   language: py
-  #   host: function
-  #   hooks:
-  #     prepackage:
-  #       shell: pwsh
-  #       run: python ../../../scripts/copy_prepdocslib.py
-  #       interactive: false
-  #       continueOnError: false
+  document-extractor:
+    project: ./app/functions/document_extractor
+    language: py
+    host: function
+    hooks:
+      prepackage:
+        shell: pwsh
+        run: python ../../../scripts/copy_prepdocslib.py
+        interactive: false
+        continueOnError: false
+  figure-processor:
+    project: ./app/functions/figure_processor
+    language: py
+    host: function
+    hooks:
+      prepackage:
+        shell: pwsh
+        run: python ../../../scripts/copy_prepdocslib.py
+        interactive: false
+        continueOnError: false
+  text-processor:
+    project: ./app/functions/text_processor
+    language: py
+    host: function
+    hooks:
+      prepackage:
+        shell: pwsh
+        run: python ../../../scripts/copy_prepdocslib.py
+        interactive: false
+        continueOnError: false
 hooks:
     preprovision:
       windows:
diff --git a/docs/data_ingestion.md b/docs/data_ingestion.md
@@ -2,7 +2,7 @@
 
 The [azure-search-openai-demo](/) project can set up a full RAG chat app on Azure AI Search and OpenAI so that you can chat on custom data, like internal enterprise data or domain-specific knowledge sets. For full instructions on setting up the project, consult the [main README](/README.md), and then return here for detailed instructions on the data ingestion component.
 
-The chat app provides two ways to ingest data: manual ingestion and cloud-based ingestion. Both approaches use the same code for processing the data, but the manual ingestion runs locally while cloud ingestion runs in Azure Functions as Azure AI Search custom skills.
+The chat app provides two ways to ingest data: manual ingestion and cloud ingestion. Both approaches use the same code for processing the data, but the manual ingestion runs locally while cloud ingestion runs in Azure Functions as Azure AI Search custom skills.
 
 - [Supported document formats](#supported-document-formats)
 - [Ingestion stages](#ingestion-stages)
@@ -13,7 +13,7 @@ The chat app provides two ways to ingest data: manual ingestion and cloud-based
   - [Categorizing data for enhanced search](#enhancing-search-functionality-with-data-categorization)
   - [Indexing additional documents](#indexing-additional-documents)
   - [Removing documents](#removing-documents)
-- [Cloud-based ingestion](#cloud-based-ingestion)
+- [Cloud ingestion](#cloud-ingestion)
   - [Custom skills pipeline](#custom-skills-pipeline)
   - [Indexing of additional documents](#indexing-of-additional-documents)
   - [Removal of documents](#removal-of-documents)
@@ -36,7 +36,7 @@ In order to ingest a document format, we need a tool that can turn it into text.
 
 ## Ingestion stages
 
-The ingestion pipeline consists of three main stages that transform raw documents into searchable content in Azure AI Search. These stages apply to both [local ingestion](#local-ingestion) and [cloud-based ingestion](#cloud-based-ingestion).
+The ingestion pipeline consists of three main stages that transform raw documents into searchable content in Azure AI Search. These stages apply to both [local ingestion](#local-ingestion) and [cloud ingestion](#cloud-ingestion).
 
 ### Document extraction
 
@@ -132,7 +132,7 @@ To remove all documents, use `./scripts/prepdocs.sh --removeall` or `./scripts/p
 
 You can also remove individual documents by using the `--remove` flag. Open either `scripts/prepdocs.sh` or `scripts/prepdocs.ps1` and replace `/data/*` with `/data/YOUR-DOCUMENT-FILENAME-GOES-HERE.pdf`. Then run `scripts/prepdocs.sh --remove` or `scripts/prepdocs.ps1 --remove`.
 
-## Cloud-based ingestion
+## Cloud ingestion
 
 This project includes an optional feature to perform data ingestion in the cloud using Azure Functions as custom skills for Azure AI Search indexers. This approach offloads the ingestion workload from your local machine to the cloud, allowing for more scalable and efficient processing of large datasets.
 
diff --git a/docs/deploy_features.md b/docs/deploy_features.md
@@ -322,9 +322,9 @@ Alternatively you can use the browser's built-in [Speech Synthesis API](https://
 azd env set USE_SPEECH_OUTPUT_BROWSER true
 ```
 
-## Enabling cloud-based data ingestion
+## Enabling cloud data ingestion
 
-By default, this project runs a local script in order to ingest data. Once you move beyond the sample documents, you may want cloud-based ingestion, which uses Azure AI Search indexers and custom Azure AI Search skills based off the same code used by the local ingestion. That approach scales better to larger amounts of data.
+By default, this project runs a local script in order to ingest data. Once you move beyond the sample documents, you may want cloud ingestion, which uses Azure AI Search indexers and custom Azure AI Search skills based off the same code used by the local ingestion. That approach scales better to larger amounts of data.
 
 To enable cloud ingestion:
 
diff --git a/tests/test_mediadescriber.py b/tests/test_mediadescriber.py
@@ -68,7 +68,7 @@ def mock_get(self, url, **kwargs):
                                     "startPageNumber": 1,
                                     "endPageNumber": 1,
                                     "unit": "pixel",
-                                    "pages": [{"pageNumber": 0}],
+                                    "pages": [{"pageNumber": 1}],
                                 }
                             ],
                         },
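The test fix changes the mocked page entry from `pageNumber: 0` to `pageNumber: 1`. As a result, the page list now falls inside the `startPageNumber`/`endPageNumber` range declared in the same mock, which is 1 to 1. Below is a minimal sketch of a consistency check for such a mock payload. It assumes page numbers are 1-based and that the range is inclusive, both inferred from the `startPageNumber: 1` / `endPageNumber: 1` values above. The helper name `pages_within_range` is hypothetical and is not part of the repository.

```python
def pages_within_range(entry: dict) -> bool:
    """Hypothetical helper: return True if every pageNumber in entry["pages"]
    lies inside the inclusive [startPageNumber, endPageNumber] range."""
    start = entry["startPageNumber"]
    end = entry["endPageNumber"]
    return all(start <= page["pageNumber"] <= end for page in entry.get("pages", []))


# Mock entries mirroring the removed (-) and added (+) lines in the hunk above.
before = {"startPageNumber": 1, "endPageNumber": 1, "unit": "pixel", "pages": [{"pageNumber": 0}]}
after = {"startPageNumber": 1, "endPageNumber": 1, "unit": "pixel", "pages": [{"pageNumber": 1}]}

print(pages_within_range(before))  # False: page 0 is outside 1..1
print(pages_within_range(after))   # True
```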
