
Commit 554c2ec

Documenting ZIP file indexing behavior
1 parent f59c8d4 commit 554c2ec


1 file changed: +11 -1 lines changed

articles/search/search-file-storage-integration.md

Lines changed: 11 additions & 1 deletion
@@ -7,7 +7,7 @@ author: mattmsft
 ms.author: magottei
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 02/21/2022
+ms.date: 09/07/2022
 ---

 # Index data from Azure Files
@@ -37,6 +37,16 @@ The Azure Files indexer can extract text from the following document formats:

 [!INCLUDE [search-document-data-sources](../../includes/search-blob-data-sources.md)]

+
+## How Azure Files are indexed
+
+By default, most files are indexed as a single search document, including files with structured content such as JSON or CSV, which are indexed as a single chunk of text.
+
+A compound or embedded document (such as a ZIP archive, a Word document with embedded Outlook email containing attachments, or an .MSG file with attachments) is also indexed as a single document. For example, when image extraction is enabled, all images extracted from the attachments of an .MSG file are returned in the `normalized_images` field. If your files contain images, consider adding [AI enrichment](cognitive-search-concept-intro.md) to get more search utility from that content.
+
+The textual content of a document is extracted into a string field named `content`. You can also extract standard and user-defined metadata.
+
+
 ## Define the data source

 The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is defined as an independent resource so that it can be used by multiple indexers.
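The data source definition mentioned in the final context paragraph can be pictured as a JSON request body. The following is a minimal Python sketch under stated assumptions, not the article's documented method: it assumes the Azure Files indexer uses `"type": "azurefile"`, puts the file share name under `container.name`, and accepts an optional directory under `container.query` (verify these against the REST API reference for the API version you use). The function name, data source name, share name, and connection-string placeholders are hypothetical, and change-detection or deletion policies are deliberately omitted.

```python
import json
from typing import Optional


def build_azure_files_data_source(
    name: str,
    connection_string: str,
    share_name: str,
    directory: Optional[str] = None,
) -> dict:
    """Build a sketch of a data source definition for an Azure Files indexer.

    Field names and the "azurefile" type are assumptions to verify
    against the service's REST reference; no policies are included.
    """
    container = {"name": share_name}
    if directory:
        # Optional: limit indexing to one directory inside the share.
        container["query"] = directory
    return {
        "name": name,
        "type": "azurefile",  # assumed data source type for Azure Files
        "credentials": {"connectionString": connection_string},
        "container": container,
    }


if __name__ == "__main__":
    # Hypothetical names and placeholder credentials, for illustration only.
    ds = build_azure_files_data_source(
        name="my-azurefile-datasource",
        connection_string="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;",
        share_name="my-share",
    )
    print(json.dumps(ds, indent=2))
```

Because the data source is a standalone resource, the resulting body would be created once and then referenced by name from each indexer that reads from the same share.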
