Commit 9da48d8 ("checkpoint")

1 parent bed7c5f

6 files changed (+64, -93 lines)

articles/search/search-file-storage-integration.md

Lines changed: 2 additions & 2 deletions

@@ -7,7 +7,7 @@ author: mattmsft
 ms.author: magottei
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/17/2022
+ms.date: 01/19/2022
 ---

 # Index data from Azure Files
@@ -54,7 +54,7 @@ The data source definition specifies the data source type, content path, and how
 1. Set "container" to the root file share, and use "query" to specify any subfolders.

-   A data source definition can also include additional properties for [soft deletion policies](#soft-delete-using-custom-metadata) and [field mappings](search-indexer-field-mappings.md) if field names and types are not the same.
+   A data source definition can also include [soft deletion policies](search-howto-index-changed-deleted-blobs.md), if you want the indexer to delete a search document when the source document is flagged for deletion.

 <a name="Credentials"></a>

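The "container" and "query" settings this hunk describes can be sketched as a minimal data source definition. This is a hedged illustration only: the share, subfolder, and connection-string values are hypothetical placeholders.

```json
{
  "name" : "my-file-datasource",
  "type" : "azurefile",
  "credentials" : { "connectionString" : "<your storage connection string>" },
  "container" : { "name" : "my-file-share", "query" : "my-subfolder" }
}
```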
articles/search/search-howto-index-azure-data-lake-storage.md

Lines changed: 8 additions & 6 deletions

@@ -9,7 +9,7 @@ manager: nitinme
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/17/2022
+ms.date: 01/19/2022
 ---

 # Index data from Azure Data Lake Storage Gen2
@@ -67,7 +67,7 @@ The data source definition specifies the data source type, content path, and how
 1. Set `"container"` to the blob container, and use "query" to specify any subfolders.

-   A data source definition can also include properties for [soft deletion policies](search-howto-index-changed-deleted-blobs.md) and [field mappings](search-indexer-field-mappings.md) if field names and types are not the same or need to be forked.
+   A data source definition can also include [soft deletion policies](search-howto-index-changed-deleted-blobs.md), if you want the indexer to delete a search document when the source document is flagged for deletion.

 <a name="Credentials"></a>

@@ -166,6 +166,8 @@ Indexer configuration specifies the inputs, parameters, and properties controlli
 1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.

+   For the full list of parameter descriptions, see [Blob configuration parameters](/rest/api/searchservice/create-indexer#blob-configuration-parameters) in the REST API.
+
 ### How to make an encoded field "searchable"

 There are times when you need to use an encoded version of a field like `metadata_storage_path` as the key, but also need that field to be searchable (without encoding) in the search index. To support both use cases, you can map `metadata_storage_path` to two fields; one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema. The example below shows two field mappings for `metadata_storage_path`.
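The blob configuration parameters referenced earlier in this hunk sit under the indexer's "parameters" property. As a hedged sketch, with illustrative values, such a block might look like:

```json
"parameters" : {
  "batchSize" : 10,
  "maxFailedItems" : 5,
  "configuration" : {
    "parsingMode" : "default",
    "dataToExtract" : "contentAndMetadata",
    "indexedFileNameExtensions" : ".pdf,.docx"
  }
}
```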
@@ -235,7 +237,7 @@ User-specified metadata properties are extracted verbatim. To receive the values
 Standard blob metadata properties can be extracted into similarly named and typed fields, as listed below. The blob indexer automatically creates internal field mappings for these blob metadata properties, converting the original hyphenated name ("metadata-storage-name") to an underscored equivalent name ("metadata_storage_name").

-You still have to add the underscored fields to the index definition, but you can omit creating field mappings in the indexer because the indexer will recognize the counterpart automatically.
+You still have to add the underscored fields to the index definition, but you can omit field mappings because the indexer will make the association automatically.

 + **metadata_storage_name** (`Edm.String`) - the file name of the blob. For example, if you have a blob /my-container/my-folder/subfolder/resume.pdf, the value of this field is `resume.pdf`.
@@ -282,21 +284,21 @@ The indexer configuration parameters apply to all blobs in the container or fold
 | `AzureSearch_Skip` |`"true"` |Instructs the blob indexer to completely skip the blob. Neither metadata nor content extraction is attempted. This is useful when a particular blob fails repeatedly and interrupts the indexing process. |
 | `AzureSearch_SkipContent` |`"true"` |This is equivalent of `"dataToExtract" : "allMetadata"` setting described [above](#PartsOfBlobToIndex) scoped to a particular blob. |

-## Index large datasets
+## How to index large datasets

 Indexing blobs can be a time-consuming process. In cases where you have millions of blobs to index, you can speed up indexing by partitioning your data and using multiple indexers to [process the data in parallel](search-howto-large-index.md#parallel-indexing).

 1. Partition your data into multiple blob containers or virtual folders.

-1. Set up several data sources, one per container or folder. Use the `query` parameter to specify the partition: `"container" : { "name" : "my-container", "query" : "my-folder" }`.
+1. Set up several data sources, one per container or folder. Use the "query" parameter to specify the partition: `"container" : { "name" : "my-container", "query" : "my-folder" }`.

 1. Create one indexer for each data source. Point them to the same target index.

 Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Partitioning data and creating multiple indexers is only useful if they can run in parallel.

 <a name="DealingWithErrors"></a>

-## Configure the response to errors
+## Handle errors

 Errors that commonly occur during indexing include unsupported content types, missing content, or oversized blobs.

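The partitioning steps in the hunk above can be sketched as two data sources over one container, each scoped to a different virtual folder. Names are hypothetical, and each data source would also need credentials:

```json
[
  { "name" : "ds-1", "type" : "azureblob", "container" : { "name" : "my-container", "query" : "folder-a" } },
  { "name" : "ds-2", "type" : "azureblob", "container" : { "name" : "my-container", "query" : "folder-b" } }
]
```

One indexer per data source, all targeting the same index, lets the partitions be processed in parallel, capacity permitting.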
articles/search/search-howto-index-changed-deleted-blobs.md

Lines changed: 3 additions & 3 deletions

@@ -1,15 +1,15 @@
 ---
 title: Changed and deleted blobs
 titleSuffix: Azure Cognitive Search
-description: Indexers that index from Azure Storage can pick up new and changed content automaticaly. To automate deletion detection, follow the strategies described in this article.
+description: Indexers that index from Azure Storage can pick up new and changed content automatically. To automate deletion detection, follow the strategies described in this article.

 author: gmndrg
 ms.author: gimondra
 manager: nitinme

 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/18/2022
+ms.date: 01/19/2022
 ---

 # Change and delete detection using indexers for Azure Storage in Azure Cognitive Search
@@ -108,7 +108,7 @@ You can reverse a soft-delete if the original source file still physically exist
 1. Change the `"softDeleteMarkerValue" : "false"` on the blob or file in Azure Storage.

-1. Check the blob or file's `LastModified` timestamp to make it is newer than the last indexer run. You can force an update to the current date and time by resaving the existing metadata.
+1. Check the blob or file's `LastModified` timestamp to make sure it is newer than the last indexer run. You can force an update to the current date and time by re-saving the existing metadata.

 1. Run the indexer.

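The soft-delete reversal steps above assume the data source carries a soft-delete detection policy. As a hedged sketch (the column name is hypothetical), that policy on the data source definition looks like:

```json
"dataDeletionDetectionPolicy" : {
  "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
  "softDeleteColumnName" : "IsDeleted",
  "softDeleteMarkerValue" : "true"
}
```

Flipping the marker value back and refreshing `LastModified`, as the steps describe, makes the next indexer run treat the document as live again.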
articles/search/search-howto-indexing-azure-blob-storage.md

Lines changed: 7 additions & 7 deletions

@@ -9,7 +9,7 @@ manager: nitinme
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/17/2022
+ms.date: 01/19/2022
 ---

 # Index data from Azure Blob Storage
@@ -61,7 +61,7 @@ The data source definition specifies the data source type, content path, and how
 1. Set "container" to the blob container, and use "query" to specify any subfolders.

-   A data source definition can also include properties for [soft deletion policies](search-howto-index-changed-deleted-blobs.md) and [field mappings](search-indexer-field-mappings.md) if field names and types are not the same or need to be forked.
+   A data source definition can also include [soft deletion policies](search-howto-index-changed-deleted-blobs.md), if you want the indexer to delete a search document when the source document is flagged for deletion.

 <a name="credentials"></a>


@@ -197,15 +197,15 @@ Textual content of a document is extracted into a string field named "content".
197197

198198
<a name="indexing-blob-metadata"></a>
199199

200-
## Indexing blob metadata
200+
### Indexing blob metadata
201201

202202
Blob metadata can also be indexed, and that's helpful if you think any of the standard or custom metadata properties will be useful in filters and queries.
203203

204204
User-specified metadata properties are extracted verbatim. To receive the values, you must define field in the search index of type `Edm.String`, with same name as the metadata key of the blob. For example, if a blob has a metadata key of `Sensitivity` with value `High`, you should define a field named `Sensitivity` in your search index and it will be populated with the value `High`.
205205

206206
Standard blob metadata properties can be extracted into similarly named and typed fields, as listed below. The blob indexer automatically creates internal field mappings for these blob metadata properties, converting the original hyphenated name ("metadata-storage-name") to an underscored equivalent name ("metadata_storage_name").
207207

208-
You still have to add the underscored fields to the index definition, but you can omit creating field mappings in the indexer because the indexer will recognize the counterpart automatically.
208+
You still have to add the underscored fields to the index definition, but you can omit field mappings because the indexer will make the association automatically.
209209

210210
+ **metadata_storage_name** (`Edm.String`) - the file name of the blob. For example, if you have a blob /my-container/my-folder/subfolder/resume.pdf, the value of this field is `resume.pdf`.
211211

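For the `Sensitivity` metadata example in the hunk above, the matching index field might be defined as follows. The attribute choices are illustrative, not prescribed by the source:

```json
{ "name" : "Sensitivity", "type" : "Edm.String", "searchable" : true, "filterable" : true }
```

A blob whose metadata includes `Sensitivity: High` would then populate this field with `High`, making it usable in filters such as `$filter=Sensitivity eq 'High'`.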
@@ -252,21 +252,21 @@ The indexer configuration parameters apply to all blobs in the container or fold
 | "AzureSearch_Skip" |`"true"` |Instructs the blob indexer to completely skip the blob. Neither metadata nor content extraction is attempted. This is useful when a particular blob fails repeatedly and interrupts the indexing process. |
 | "AzureSearch_SkipContent" |`"true"` |This is equivalent of "dataToExtract" : "allMetadata" setting described [above](#PartsOfBlobToIndex) scoped to a particular blob. |

-## Index large datasets
+## How to index large datasets

 Indexing blobs can be a time-consuming process. In cases where you have millions of blobs to index, you can speed up indexing by partitioning your data and using multiple indexers to [process the data in parallel](search-howto-large-index.md#parallel-indexing).

 1. Partition your data into multiple blob containers or virtual folders.

-1. Set up several data sources, one per container or folder. Use the `query` parameter to specify the partition: `"container" : { "name" : "my-container", "query" : "my-folder" }`.
+1. Set up several data sources, one per container or folder. Use the "query" parameter to specify the partition: `"container" : { "name" : "my-container", "query" : "my-folder" }`.

 1. Create one indexer for each data source. Point them to the same target index.

 Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Partitioning data and creating multiple indexers is only useful if they can run in parallel.

 <a name="DealingWithErrors"></a>

-## Configure the response to errors
+## Handle errors

 Errors that commonly occur during indexing include unsupported content types, missing content, or oversized blobs.

articles/search/search-howto-indexing-azure-tables.md

Lines changed: 2 additions & 2 deletions

@@ -9,7 +9,7 @@ ms.author: magottei
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/17/2022
+ms.date: 01/19/2022
 ---

 # Index data from Azure Table Storage
@@ -47,7 +47,7 @@ The data source definition specifies the data source type, content path, and how
 1. Optionally, set "query" to a filter on PartitionKey. This is a best practice that improves performance. If "query" is specified any other way, the indexer will execute a full table scan, resulting in poor performance if the tables are large.

-   A data source definition can also include additional properties for [soft deletion policies](#soft-delete-using-custom-metadata) and [field mappings](search-indexer-field-mappings.md) if field names and types are not the same.
+   A data source definition can also include [soft deletion policies](search-howto-index-changed-deleted-blobs.md), if you want the indexer to delete a search document when the source document is flagged for deletion.

 <a name="Credentials"></a>

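The PartitionKey guidance in the hunk above can be sketched in a table data source definition. The table name, partition value, and connection string are hypothetical:

```json
{
  "name" : "my-table-datasource",
  "type" : "azuretable",
  "credentials" : { "connectionString" : "<your storage connection string>" },
  "container" : { "name" : "my-table", "query" : "PartitionKey eq '2022-01'" }
}
```

Filtering on PartitionKey lets the indexer read a single partition instead of scanning the whole table.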
articles/search/search-indexer-field-mappings.md

Lines changed: 42 additions & 73 deletions

@@ -9,16 +9,16 @@ ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 10/19/2021
+ms.date: 01/19/2022
 ---

 # Field mappings and transformations using Azure Cognitive Search indexers

 ![Indexer Stages](./media/search-indexer-field-mappings/indexer-stages-field-mappings.png "indexer stages")

-When using Azure Cognitive Search indexers, the indexer will automatically map fields in a data source to fields in a target index, assuming field names and types are compatible. In some cases, input data doesn't quite match the schema of your target index. One solution is to use *field mappings* to specifically set the data path during the indexing process.
+When using Azure Cognitive Search indexers, the indexer will automatically map fields in a data source to fields in a target index, assuming field names and types are compatible. When input data doesn't quite match the schema of your target index, you can define *field mappings* to specifically set the data path.

-Field mappings can be used to address the following scenarios:
+Field mappings address the following scenarios:

 + Mismatched field names. Suppose your data source has a field named `_id`. Given that Azure Cognitive Search doesn't allow field names that start with an underscore, a field mapping lets you effectively rename a field.

@@ -122,23 +122,48 @@ A field mapping function transforms the contents of a field before it's stored i
 Performs *URL-safe* Base64 encoding of the input string. Assumes that the input is UTF-8 encoded.

-#### Example - document key lookup
+#### Example: Base-encoding a document key

-Only URL-safe characters can appear in an Azure Cognitive Search document key (so that you can address the document using the [Lookup API](/rest/api/searchservice/lookup-document)). If the source field for your key contains URL-unsafe characters, you can use the `base64Encode` function to convert it at indexing time. However, a document key (both before and after conversion) can't be longer than 1,024 characters.
+Only URL-safe characters can appear in an Azure Cognitive Search document key (so that you can address the document using the [Lookup API](/rest/api/searchservice/lookup-document)). If the source field for your key contains URL-unsafe characters, such as `-` and `\`, use the `base64Encode` function to convert it at indexing time.

-When you retrieve the encoded key at search time, use the `base64Decode` function to get the original key value, and use that to retrieve the source document.
+The following example specifies the base64Encode function on "metadata_storage_name" to handle unsupported characters.

-```JSON
-"fieldMappings" : [
-  {
-    "sourceFieldName" : "SourceKey",
-    "targetFieldName" : "IndexKey",
-    "mappingFunction" : {
-      "name" : "base64Encode",
-      "parameters" : { "useHttpServerUtilityUrlTokenEncode" : false }
-    }
-  }]
-```
+```http
+PUT /indexers?api-version=2020-06-30
+{
+  "dataSourceName" : "my-blob-datasource ",
+  "targetIndexName" : "my-search-index",
+  "fieldMappings" : [
+    {
+      "sourceFieldName" : "metadata_storage_name",
+      "targetFieldName" : "key",
+      "mappingFunction" : {
+        "name" : "base64Encode",
+        "parameters" : { "useHttpServerUtilityUrlTokenEncode" : false }
+      }
+    }
+  ]
+}
+```
+
+A document key (both before and after conversion) can't be longer than 1,024 characters. When you retrieve the encoded key at search time, use the `base64Decode` function to get the original key value, and use that to retrieve the source document.
+
+#### Example: Make a base-encoded field "searchable"
+
+There are times when you need to use an encoded version of a field like "metadata_storage_path" as the key, but also need an un-encoded version for full text search. To support both scenarios, you can map "metadata_storage_path" to two fields: one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema.
+
+```http
+PUT /indexers/blob-indexer?api-version=2020-06-30
+{
+  "dataSourceName" : " blob-datasource ",
+  "targetIndexName" : "my-target-index",
+  "schedule" : { "interval" : "PT2H" },
+  "fieldMappings" : [
+    { "sourceFieldName" : "metadata_storage_path", "targetFieldName" : "key", "mappingFunction" : { "name" : "base64Encode" } },
+    { "sourceFieldName" : "metadata_storage_path", "targetFieldName" : "path" }
+  ]
+}
+```

 #### Example - preserve original values

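The encode/decode round trip that the `base64Encode` and `base64Decode` mapping functions perform can be illustrated outside the service. This is a hedged Python sketch of URL-safe Base64 with padding stripped; it approximates, rather than reproduces, the exact Azure mapping-function behavior:

```python
import base64

def encode_key(value: str) -> str:
    # URL-safe Base64 of the UTF-8 bytes, with "=" padding stripped so the
    # result contains only URL-safe characters (approximation of base64Encode).
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii").rstrip("=")

def decode_key(token: str) -> str:
    # Restore the padding that encode_key stripped, then decode
    # (approximation of base64Decode).
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")

# A path with "/" is not a valid document key; its encoded form is.
key = encode_key("my-folder/resume.pdf")
assert decode_key(key) == "my-folder/resume.pdf"
```

Retrieving the document by key and decoding back to the original path mirrors the search-time flow described above.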
@@ -329,60 +354,4 @@ When facing errors complaining about document key being longer than 1024 characters
     "name" : "fixedLengthEncode"
   }
 }]
-```
-
-<!--
-
-### Example: Base-encoding metadata_storage_name
-
-The following example demonstrates "metadata_storage_name" as the document key. Assume the index has a key field named "key" and another field named "fileSize" for storing the document size. [Field mappings](search-indexer-field-mappings.md) in the indexer definition establish field associations, and "metadata_storage_name" has the [base64Encode field mapping function](search-indexer-field-mappings.md#base64EncodeFunction) to handle unsupported characters.
-
-```http
-POST https://[service name].search.windows.net/indexers?api-version=2020-06-30
-{
-  "name" : "my-blob-indexer",
-  "dataSourceName" : "my-blob-datasource ",
-  "targetIndexName" : "my-search-index",
-  "fieldMappings" : [
-    { "sourceFieldName" : "metadata_storage_name", "targetFieldName" : "key", "mappingFunction" : { "name" : "base64Encode" } },
-    { "sourceFieldName" : "metadata_storage_size", "targetFieldName" : "fileSize" }
-  ]
-}
-```
-
-### Example: How to make an encoded field "searchable"
-
-There are times when you need to use an encoded version of a field like "metadata_storage_path" as the key, but also need that field to be searchable (without encoding) in the search index. To support both use cases, you can map "metadata_storage_path" to two fields; one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema. The example below shows two field mappings for "metadata_storage_path".
-
-```http
-PUT /indexers/blob-indexer?api-version=2020-06-30
-{
-  "dataSourceName" : " blob-datasource ",
-  "targetIndexName" : "my-target-index",
-  "schedule" : { "interval" : "PT2H" },
-  "fieldMappings" : [
-    { "sourceFieldName" : "metadata_storage_path", "targetFieldName" : "key", "mappingFunction" : { "name" : "base64Encode" } },
-    { "sourceFieldName" : "metadata_storage_path", "targetFieldName" : "path" }
-  ]
-}
-``` -->
-
-<!-- ### Example
-
-The following example demonstrates `metadata_storage_name` as the document key. Assume the index has a key field named `key` and another field named `fileSize` for storing the document size. [Field mappings](search-indexer-field-mappings.md) in the indexer definition establish field associations, and `metadata_storage_name` has the [`base64Encode` field mapping function](search-indexer-field-mappings.md#base64EncodeFunction) to handle unsupported characters.
-
-```http
-PUT https://[service name].search.windows.net/indexers/adlsgen2-indexer?api-version=2020-06-30
-Content-Type: application/json
-api-key: [admin key]
-
-{
-  "dataSourceName" : "adlsgen2-datasource",
-  "targetIndexName" : "my-target-index",
-  "schedule" : { "interval" : "PT2H" },
-  "fieldMappings" : [
-    { "sourceFieldName" : "metadata_storage_name", "targetFieldName" : "key", "mappingFunction" : { "name" : "base64Encode" } },
-    { "sourceFieldName" : "metadata_storage_size", "targetFieldName" : "fileSize" }
-  ]
-}
-``` -->
+```
