articles/search/search-file-storage-integration.md
Lines changed: 7 additions & 5 deletions
@@ -27,6 +27,8 @@ This article supplements [**Create an indexer**](search-howto-create-indexers.md
27
27
28
28
+ Files containing text. If you have binary data, you can include [AI enrichment](cognitive-search-concept-intro.md) for image analysis.
29
29
30
+
+ Read permissions on Azure Storage. A "full access" connection string includes a key that grants access to the content, but if you're using Azure roles instead, make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) has **Data and Reader** permissions.
31
+
30
32
## Supported document formats
31
33
32
34
The Azure Files indexer can extract text from the following document formats:
@@ -62,16 +64,16 @@ A data source definition can also include [soft deletion policies](search-howto-
62
64
63
65
Indexers can connect to a file share using the following connections.
|This connection string does not require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-storage.md).|
| You can get the connection string from the Storage account page in Azure portal by selecting **Access keys** in the left navigation pane. Make sure to select a full connection string and not just a key. |
|This connection string does not require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-data-sources.md).|
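The two connection styles in the rows above could be sketched as a data source payload like the one below. This is a minimal illustration under stated assumptions, not the documented procedure: the helper names (`full_access_connection_string`, `managed_identity_connection_string`, `build_data_source`), the `azurefile` type value, the payload layout, and the connection string shapes are not taken from this diff and should be checked against the data source reference before use.

```python
import json


def full_access_connection_string(account: str, key: str) -> str:
    # Assumed shape of a "full access" connection string that carries an account key.
    return f"DefaultEndpointsProtocol=https;AccountName={account};AccountKey={key};"


def managed_identity_connection_string(subscription_id: str, resource_group: str, account: str) -> str:
    # Assumed shape of a keyless connection string; it only names the storage resource,
    # so the search service must already be configured with a managed identity.
    return (
        f"ResourceId=/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}/;"
    )


def build_data_source(name: str, share_name: str, connection_string: str,
                      source_type: str = "azurefile") -> dict:
    # Hypothetical helper: assembles a data source definition as a plain dict.
    return {
        "name": name,
        "type": source_type,
        "credentials": {"connectionString": connection_string},
        "container": {"name": share_name},
    }


if __name__ == "__main__":
    # Placeholder values only; substitute your own identifiers.
    conn = managed_identity_connection_string("<subscription-id>", "<resource-group>", "<account>")
    print(json.dumps(build_data_source("my-file-datasource", "my-share", conn), indent=2))
```

Keeping the connection string as a separate argument lets the same definition switch between key-based and managed identity access without touching the rest of the payload.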
articles/search/search-howto-index-azure-data-lake-storage.md
Lines changed: 10 additions & 10 deletions
@@ -9,7 +9,7 @@ manager: nitinme
9
9
10
10
ms.service: cognitive-search
11
11
ms.topic: how-to
12
-
ms.date: 01/19/2022
12
+
ms.date: 02/11/2022
13
13
---
14
14
15
15
# Index data from Azure Data Lake Storage Gen2
@@ -28,9 +28,9 @@ For a code sample in C#, see [Index Data Lake Gen2 using Azure AD](https://githu
28
28
29
29
+ [Access tiers](../storage/blobs/access-tiers-overview.md) for ADLS Gen2 include hot, cool, and archive. Only hot and cool can be accessed by search indexers.
30
30
31
-
+ Blobs containing text. If you have binary data, you can include [AI enrichment](cognitive-search-concept-intro.md) for image analysis.
31
+
+ Blobs containing text. If you have binary data, you can include [AI enrichment](cognitive-search-concept-intro.md) for image analysis. Note that blob content cannot exceed the [indexer limits](search-limits-quotas-capacity.md#indexer-limits) for your search service tier.
32
32
33
-
Note that blob content cannot exceed the [indexer limits](search-limits-quotas-capacity.md#indexer-limits) for your search service tier.
33
+
+ Read permissions on Azure Storage. A "full access" connection string includes a key that grants access to the content, but if you're using Azure roles instead, make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) has **Storage Blob Data Reader** permissions.
34
34
35
35
## Access control
36
36
@@ -75,16 +75,16 @@ A data source definition can also include [soft deletion policies](search-howto-
75
75
76
76
Indexers can connect to a blob container using the following connections.
|This connection string does not require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-storage.md).|
| You can get the connection string from the Storage account page in Azure portal by selecting **Access keys** in the left navigation pane. Make sure to select a full connection string and not just a key. |
|This connection string does not require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-data-sources.md).|
A compound or embedded document (such as a ZIP archive, a Word document with embedded Outlook email containing attachments, or a .MSG file with attachments) is also indexed as a single document. For example, all images extracted from the attachments of an .MSG file will be returned in the normalized_images field. If you have images, consider adding [AI enrichment](cognitive-search-concept-intro.md) to get more search utility from that content.
193
+
A compound or embedded document (such as a ZIP archive, a Word document with embedded Outlook email containing attachments, or an .MSG file with attachments) is also indexed as a single document. For example, all images extracted from the attachments of an .MSG file will be returned in the normalized_images field. If you have images, consider adding [AI enrichment](cognitive-search-concept-intro.md) to get more search utility from that content.
194
194
195
195
Textual content of a document is extracted into a string field named "content".
196
196
@@ -256,7 +256,7 @@ Add the following metadata properties and values to blobs in Blob Storage. When
256
256
| Property name | Property value | Explanation |
257
257
| ------------- | -------------- | ----------- |
258
258
|`AzureSearch_Skip`|`"true"`|Instructs the blob indexer to completely skip the blob. Neither metadata nor content extraction is attempted. This is useful when a particular blob fails repeatedly and interrupts the indexing process. |
259
-
|`AzureSearch_SkipContent`|`"true"`|This is equivalent of`"dataToExtract" : "allMetadata"` setting described [above](#PartsOfBlobToIndex) scoped to a particular blob. |
259
+
|`AzureSearch_SkipContent`|`"true"`|This is equivalent to the `"dataToExtract" : "allMetadata"` setting described [above](#PartsOfBlobToIndex) scoped to a particular blob. |
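The effect of the two metadata properties in the table above can be modeled with a small sketch. It is an illustration of the described behavior, not the indexer's implementation: the function name `resolve_extraction`, the `contentAndMetadata` default, and the case-insensitive comparison of `"true"` are assumptions made for the example.

```python
from typing import Mapping, Optional


def resolve_extraction(metadata: Mapping[str, str],
                       indexer_default: str = "contentAndMetadata") -> Optional[str]:
    """Return the effective dataToExtract value for one blob, or None if it is skipped.

    Hypothetical helper that mirrors the table: AzureSearch_Skip wins over
    AzureSearch_SkipContent, and blobs without either property use the indexer setting.
    """
    def is_true(key: str) -> bool:
        # Assumption: the property value is the string "true"; case is ignored here.
        return str(metadata.get(key, "")).strip().lower() == "true"

    if is_true("AzureSearch_Skip"):
        return None  # blob is skipped entirely: no metadata, no content
    if is_true("AzureSearch_SkipContent"):
        return "allMetadata"  # metadata only, scoped to this blob
    return indexer_default


if __name__ == "__main__":
    print(resolve_extraction({"AzureSearch_Skip": "true"}))
    print(resolve_extraction({"AzureSearch_SkipContent": "true"}))
    print(resolve_extraction({}))
```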
articles/search/search-howto-indexing-azure-blob-storage.md
Lines changed: 10 additions & 10 deletions
@@ -9,7 +9,7 @@ manager: nitinme
9
9
10
10
ms.service: cognitive-search
11
11
ms.topic: how-to
12
-
ms.date: 01/19/2022
12
+
ms.date: 02/11/2022
13
13
---
14
14
15
15
# Index data from Azure Blob Storage
@@ -28,9 +28,9 @@ This article supplements [**Create an indexer**](search-howto-create-indexers.md
28
28
29
29
+ [Access tiers](../storage/blobs/access-tiers-overview.md) for Blob storage include hot, cool, and archive. Only hot and cool can be accessed by search indexers.
30
30
31
-
+ Blobs containing text. If you have binary data, you can include [AI enrichment](cognitive-search-concept-intro.md) for image analysis.
31
+
+ Blobs containing text. If you have binary data, you can include [AI enrichment](cognitive-search-concept-intro.md) for image analysis. Note that blob content cannot exceed the [indexer limits](search-limits-quotas-capacity.md#indexer-limits) for your search service tier.
32
32
33
-
Note that blob content cannot exceed the [indexer limits](search-limits-quotas-capacity.md#indexer-limits) for your search service tier.
33
+
+ Read permissions on Azure Storage. A "full access" connection string includes a key that grants access to the content, but if you're using Azure roles instead, make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) has **Storage Blob Data Reader** permissions.
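As a rough pre-flight check for the access tier prerequisite above, the sketch below separates blobs an indexer can reach (hot, cool) from those it cannot (archive). The input shape, a list of dicts with `name` and `tier` keys, and the function name `split_by_tier` are hypothetical and chosen only for illustration; how you obtain the blob listing is up to your own tooling.

```python
from typing import Iterable, List, Mapping, Tuple

# Tiers that search indexers can read, per the prerequisite above.
INDEXABLE_TIERS = {"hot", "cool"}


def split_by_tier(blobs: Iterable[Mapping[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (indexable_names, unreachable_names) for the given blob listing."""
    indexable: List[str] = []
    unreachable: List[str] = []
    for blob in blobs:
        tier = str(blob.get("tier", "")).lower()
        target = indexable if tier in INDEXABLE_TIERS else unreachable
        target.append(blob["name"])
    return indexable, unreachable


if __name__ == "__main__":
    listing = [
        {"name": "report.pdf", "tier": "Hot"},
        {"name": "notes.docx", "tier": "Cool"},
        {"name": "old-backup.zip", "tier": "Archive"},
    ]
    print(split_by_tier(listing))
```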
34
34
35
35
<a name="SupportedFormats"></a>
36
36
@@ -69,16 +69,16 @@ A data source definition can also include [soft deletion policies](search-howto-
69
69
70
70
Indexers can connect to a blob container using the following connections.
|This connection string does not require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-storage.md).|
| You can get the connection string from the Storage account page in Azure portal by selecting **Access keys** in the left navigation pane. Make sure to select a full connection string and not just a key. |
|This connection string does not require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-data-sources.md).|
A compound or embedded document (such as a ZIP archive, a Word document with embedded Outlook email containing attachments, or a .MSG file with attachments) is also indexed as a single document. For example, all images extracted from the attachments of an .MSG file will be returned in the normalized_images field. If you have images, consider adding [AI enrichment](cognitive-search-concept-intro.md) to get more search utility from that content.
187
+
A compound or embedded document (such as a ZIP archive, a Word document with embedded Outlook email containing attachments, or an .MSG file with attachments) is also indexed as a single document. For example, all images extracted from the attachments of an .MSG file will be returned in the normalized_images field. If you have images, consider adding [AI enrichment](cognitive-search-concept-intro.md) to get more search utility from that content.
188
188
189
189
Textual content of a document is extracted into a string field named "content".
190
190
@@ -250,7 +250,7 @@ Add the following metadata properties and values to blobs in Blob Storage. When
250
250
| Property name | Property value | Explanation |
251
251
| ------------- | -------------- | ----------- |
252
252
| "AzureSearch_Skip" |`"true"`|Instructs the blob indexer to completely skip the blob. Neither metadata nor content extraction is attempted. This is useful when a particular blob fails repeatedly and interrupts the indexing process. |
253
-
| "AzureSearch_SkipContent" |`"true"`|This is equivalent of "dataToExtract" : "allMetadata" setting described [above](#PartsOfBlobToIndex) scoped to a particular blob. |
253
+
| "AzureSearch_SkipContent" |`"true"`|This is equivalent to the `"dataToExtract" : "allMetadata"` setting described [above](#PartsOfBlobToIndex) scoped to a particular blob. |