
Commit 2edfef8

Porting changes to simplify another PR
1 parent 26203fb commit 2edfef8

File tree: 4 files changed (+16 -13 lines changed)


articles/search/search-file-storage-integration.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ ms.date: 01/19/2022
 Configure a [search indexer](search-indexer-overview.md) to extract content from Azure File Storage and make it searchable in Azure Cognitive Search.

-This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information specific to indexing files in Azure Storage.
+This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information specific to indexing files in Azure Storage. It uses the REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.

 ## Prerequisites

@@ -37,7 +37,7 @@ The Azure Files indexer can extract text from the following document formats:
 ## Define the data source

-The data source definition specifies the data source type, content path, and how to connect.
+The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is defined as an independent resource so that it can be used by multiple indexers.

 1. [Create or update a data source](/rest/api/searchservice/preview-api/create-or-update-data-source) to set its definition, using preview API version 2020-06-30-Preview or 2021-04-30-Preview for "type": `"azurefile"`.
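The data source definition referenced in this hunk takes a JSON payload along the following lines. This is a minimal sketch in Python; the data source name, storage account, key, share name, and directory are hypothetical placeholders, not values from the docs:

```python
import json

# Sketch of a Create Data Source payload for "type": "azurefile".
# All names and credentials below are placeholders.
datasource = {
    "name": "my-file-datasource",
    "type": "azurefile",
    "credentials": {
        "connectionString": (
            "DefaultEndpointsProtocol=https;"
            "AccountName=<storage-account>;AccountKey=<account-key>;"
        )
    },
    "container": {
        "name": "my-file-share",   # the Azure file share to index
        "query": "my-directory",   # optional: restrict indexing to a subfolder
    },
}

# The payload would be submitted as, for example:
# PUT https://<service>.search.windows.net/datasources/my-file-datasource?api-version=2021-04-30-Preview
body = json.dumps(datasource, indent=2)
print(body)
```

Defining the data source as its own resource, per the updated text, means the same payload can back multiple indexers.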

articles/search/search-howto-index-azure-data-lake-storage.md

Lines changed: 10 additions & 8 deletions
@@ -18,7 +18,7 @@ Configure a [search indexer](search-indexer-overview.md) to extract content and
 ADLS Gen2 is available through Azure Storage. When setting up a storage account, you have the option of enabling [hierarchical namespace](../storage/blobs/data-lake-storage-namespace.md), organizing files into a hierarchy of directories and nested subdirectories. By enabling a hierarchical namespace, you enable ADLS Gen2.

-This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information specific to indexing from ADLS Gen2.
+This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information specific to indexing from ADLS Gen2. It uses the REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.

 For a code sample in C#, see [Index Data Lake Gen2 using Azure AD](https://github.com/Azure-Samples/azure-search-dotnet-samples/blob/master/data-lake-gen2-acl-indexing/README.md) on GitHub.

@@ -28,7 +28,7 @@ For a code sample in C#, see [Index Data Lake Gen2 using Azure AD](https://githu
 + [Access tiers](../storage/blobs/access-tiers-overview.md) for ADLS Gen2 include hot, cool, and archive. Only hot and cool can be accessed by search indexers.

-+ Blobs containing text. If you have binary data, you can include [AI enrichment](cognitive-search-concept-intro.md) for image analysis. Note that blob content cannot exceed the [indexer limits](search-limits-quotas-capacity.md#indexer-limits) for your search service tier.
++ Blobs containing text. If you have binary data, you can include [AI enrichment](cognitive-search-concept-intro.md) for image analysis. Blob content cannot exceed the [indexer limits](search-limits-quotas-capacity.md#indexer-limits) for your search service tier.

 + Read permissions on Azure Storage. A "full access" connection string includes a key that grants access to the content, but if you're using Azure roles instead, make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) has **Storage Blob Data Reader** permissions.

@@ -48,7 +48,7 @@ The ADLS Gen2 indexer can extract text from the following document formats:
 ## Define the data source

-The data source definition specifies the data source type, content path, and how to connect.
+The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is defined as an independent resource so that it can be used by multiple indexers.

 1. [Create or update a data source](/rest/api/searchservice/create-data-source) to set its definition:
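For ADLS Gen2, the data source definition follows the same shape with "type": "adlsgen2". A sketch in Python, with hypothetical names and placeholder credentials:

```python
import json

# Sketch of a Create Data Source payload for "type": "adlsgen2".
# Names, account, and key are placeholders.
datasource = {
    "name": "my-adlsgen2-datasource",
    "type": "adlsgen2",
    "credentials": {
        "connectionString": (
            "DefaultEndpointsProtocol=https;"
            "AccountName=<storage-account>;AccountKey=<account-key>;"
        )
    },
    "container": {
        "name": "my-container",    # filesystem (container) to index
        "query": "my-folder",      # optional: index a directory within it
    },
}
print(json.dumps(datasource, indent=2))
```

If the search service connects with a managed identity instead of a key, the `connectionString` would use the `ResourceId=...` form shown in the table later in this file.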

@@ -83,7 +83,7 @@ Indexers can connect to a blob container using the following connections.
 | Managed identity connection string |
 |------------------------------------|
 |`{ "connectionString" : "ResourceId=/subscriptions/<your subscription ID>/resourceGroups/<your resource group name>/providers/Microsoft.Storage/storageAccounts/<your storage account name>/;" }`|
-|This connection string does not require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-data-sources.md).|
+|This connection string doesn't require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-data-sources.md).|

 | Storage account shared access signature** (SAS) connection string |
 |-------------------------------------------------------------------|
@@ -119,14 +119,16 @@ In a [search index](search-what-is-an-index.md), add fields to accept the conten
 }
 ```

-1. Create a document key field ("key": true). For blob content, the best candidates are metadata properties. Metadata properties often include characters, such as `/` and `-`, that are invalid for document keys. Because the indexer has a "base64EncodeKeys" property (true by default), it automatically encodes the metadata property, with no configuration or field mapping required.
+1. Create a document key field ("key": true). For blob content, the best candidates are metadata properties.

-+ **`metadata_storage_path`** (default) full path to the object or file
++ **`metadata_storage_path`** (default) full path to the object or file. The key field ("ID" in this example) will be populated with values from metadata_storage_path because it's the default.

-+ **`metadata_storage_name`** usable only if names are unique
++ **`metadata_storage_name`**, usable only if names are unique. If you want this field as the key, move `"key": true` to this field definition.

 + A custom metadata property that you add to blobs. This option requires that your blob upload process adds that metadata property to all blobs. Since the key is a required property, any blobs that are missing a value will fail to be indexed. If you use a custom metadata property as a key, avoid making changes to that property. Indexers will add duplicate documents for the same blob if the key property changes.

+Metadata properties often include characters, such as `/` and `-`, that are invalid for document keys. Because the indexer has a "base64EncodeKeys" property (true by default), it automatically encodes the metadata property, with no configuration or field mapping required.
+
 1. Add a "content" field to store extracted text from each file through the blob's "content" property. You aren't required to use this name, but doing so lets you take advantage of implicit field mappings.

 1. Add fields for standard metadata properties. The indexer can read custom metadata properties, [standard metadata](#indexing-blob-metadata) properties, and [content-specific metadata](search-blob-metadata-properties.md) properties.
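To make the key-field choices in this hunk concrete, here is a sketch of an index-definition fragment in Python. The index name and field names ("ID", "content") are hypothetical; the key is populated from `metadata_storage_path` by default, as the text above states:

```python
import json

# Sketch of a search index definition illustrating the key-field guidance.
# Index and field names are hypothetical.
index = {
    "name": "my-search-index",
    "fields": [
        # Document key. Populated from metadata_storage_path by default;
        # base64EncodeKeys (true by default) makes the value key-safe.
        {"name": "ID", "type": "Edm.String", "key": True, "searchable": False},
        # Receives extracted text via the implicit mapping from the blob "content" property.
        {"name": "content", "type": "Edm.String", "searchable": True},
        # Standard blob metadata properties.
        {"name": "metadata_storage_name", "type": "Edm.String", "searchable": False},
        {"name": "metadata_storage_path", "type": "Edm.String", "searchable": False},
    ],
}
print(json.dumps(index, indent=2))
```

To key on file names instead, move `"key": True` to the `metadata_storage_name` field, which works only when names are unique.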
@@ -161,7 +163,7 @@ Indexer configuration specifies the inputs, parameters, and properties controlli
 }
 ```

-1. Set "batchSize` if the default (10 documents) is either under utilizing or overwhelming available resources. Default batch sizes are data source specific. Blob indexing sets batch size at 10 documents in recognition of the larger average document size.
+1. Set "batchSize" if the default (10 documents) is either underutilizing or overwhelming available resources. Default batch sizes are data source specific. Blob indexing sets batch size at 10 documents in recognition of the larger average document size.

 1. Under "configuration", provide any [inclusion or exclusion criteria](#PartsOfBlobToIndex) based on file type or leave unspecified to retrieve all blobs.
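The "batchSize" and "configuration" settings from the steps above would sit in an indexer definition roughly as follows. This is a sketch with hypothetical names; the file-extension filters are illustrative inclusion/exclusion criteria, not values from the docs:

```python
import json

# Sketch of an indexer definition; all names are hypothetical.
indexer = {
    "name": "my-adlsgen2-indexer",
    "dataSourceName": "my-adlsgen2-datasource",
    "targetIndexName": "my-search-index",
    "parameters": {
        "batchSize": 10,  # blob indexing default; tune if resources are under- or over-used
        "configuration": {
            # Example inclusion/exclusion criteria based on file type.
            "indexedFileNameExtensions": ".pdf,.docx",
            "excludedFileNameExtensions": ".png,.jpeg",
        },
    },
}
print(json.dumps(indexer, indent=2))
```

Leaving "configuration" unspecified retrieves all blobs, per the step above.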

articles/search/search-howto-indexing-azure-tables.md

Lines changed: 2 additions & 2 deletions
@@ -16,7 +16,7 @@ ms.date: 02/11/2022
 Configure a [search indexer](search-indexer-overview.md) to extract content from Azure Table Storage and make it searchable in Azure Cognitive Search.

-This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information specific to indexing from Azure Table Storage.
+This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information specific to indexing from Azure Table Storage. It uses the REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.

 ## Prerequisites

@@ -28,7 +28,7 @@ This article supplements [**Create an indexer**](search-howto-create-indexers.md
 ## Define the data source

-The data source definition specifies the data source type, content path, and how to connect.
+The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is defined as an independent resource so that it can be used by multiple indexers.

 1. [Create or update a data source](/rest/api/searchservice/create-data-source) to set its definition:
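For Azure Table Storage, the data source uses "type": "azuretable", where "container" names the table and an optional "query" filters rows. A sketch in Python with hypothetical names and a placeholder filter:

```python
import json

# Sketch of a Create Data Source payload for "type": "azuretable".
# Names, account, key, and the row filter are placeholders.
datasource = {
    "name": "my-table-datasource",
    "type": "azuretable",
    "credentials": {
        "connectionString": (
            "DefaultEndpointsProtocol=https;"
            "AccountName=<storage-account>;AccountKey=<account-key>;"
        )
    },
    "container": {
        "name": "my-table",                  # the table to index
        "query": "PartitionKey eq 'sample'", # optional: select a subset of rows
    },
}
print(json.dumps(datasource, indent=2))
```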

articles/search/search-indexer-securing-resources.md

Lines changed: 2 additions & 1 deletion
@@ -42,7 +42,8 @@ Customers can secure these resources via several network isolation mechanisms of
 | --- | --- | ---- |
 | Azure Storage (blobs, tables, ADLS Gen 2) | Supported only if the storage account and search service are in different regions | Supported |
 | Azure Cosmos DB - SQL API | Supported | Supported |
-| Azure Cosmos DB - MongoDB and Gremlin API | Supported | Unsupported |
+| Azure Cosmos DB - MongoDB API | Supported | Unsupported |
+| Azure Cosmos DB - Gremlin API | Supported | Unsupported |
 | Azure SQL Database | Supported | Supported |
 | SQL Server on Azure virtual machines | Supported | N/A |
 | SQL Managed Instance | Supported | N/A |
