Documenting ZIP file indexing behavior

gmndrg · web-flow · commit 554c2ece1185 · 2022-09-07T20:47:38.000-06:00
diff --git a/articles/search/search-file-storage-integration.md b/articles/search/search-file-storage-integration.md
@@ -7,7 +7,7 @@ author: mattmsft
 ms.author: magottei
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 02/21/2022
+ms.date: 09/07/2022
 ---
 
 # Index data from Azure Files
@@ -37,6 +37,16 @@ The Azure Files indexer can extract text from the following document formats:
 
 [!INCLUDE [search-document-data-sources](../../includes/search-blob-data-sources.md)]
 
+
+## How Azure Files are indexed
+
+By default, most files are indexed as a single search document, including files with structured content, such as JSON or CSV, which are indexed as a single chunk of text.
+
+A compound or embedded document (such as a ZIP archive, a Word document with an embedded Outlook email that contains attachments, or an .MSG file with attachments) is also indexed as a single document. For example, all images extracted from the attachments of an .MSG file are returned in the `normalized_images` field. If you have images, consider adding [AI enrichment](cognitive-search-concept-intro.md) to get more search utility from that content.
+
+The textual content of a document is extracted into a string field named `content`. You can also extract standard and user-defined metadata.
+
+
 ## Define the data source
 
 The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is defined as an independent resource so that it can be used by multiple indexers.
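
Note on the context paragraph above: a minimal Python sketch of what such a data source definition might look like as a request payload. The helper name `build_azure_files_data_source` and all values are hypothetical, and the property names (`type`, `credentials.connectionString`, `container.name`, `container.query`) are assumptions based on the general data source shape; check them against the article's own example before relying on them.

```python
import json


def build_azure_files_data_source(name, connection_string, share_name, directory=None):
    """Build a data source definition as a plain dict (hypothetical helper).

    Assumes the file share name goes in container.name and an optional
    subdirectory goes in container.query.
    """
    container = {"name": share_name}
    if directory:
        # Only include the optional subdirectory when one is given.
        container["query"] = directory
    return {
        "name": name,
        "type": "azurefile",  # assumed type string for Azure Files
        "credentials": {"connectionString": connection_string},
        "container": container,
    }


definition = build_azure_files_data_source(
    name="my-file-datasource",
    connection_string="<your-connection-string>",  # placeholder, not a real secret
    share_name="my-file-share",
)
print(json.dumps(definition, indent=2))
```

Because the definition is an independent resource, the same payload can be referenced by name from more than one indexer.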