articles/search/search-file-storage-integration.md (+2 −2)

@@ -7,7 +7,7 @@ author: mattmsft
 ms.author: magottei
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/17/2022
+ms.date: 01/19/2022
 ---

 # Index data from Azure Files

@@ -54,7 +54,7 @@ The data source definition specifies the data source type, content path, and how
 1. Set "container" to the root file share, and use "query" to specify any subfolders.

-   A data source definition can also include additional properties for [soft deletion policies](#soft-delete-using-custom-metadata) and [field mappings](search-indexer-field-mappings.md) if field names and types are not the same.
+   A data source definition can also include [soft deletion policies](search-howto-index-changed-deleted-blobs.md), if you want the indexer to delete a search document when the source document is flagged for deletion.
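For reference, a minimal Azure Files data source definition of the kind these steps describe might look like the following. This is a sketch, not part of the diff: the data source name, share, folder, and connection string are placeholders; verify property names against your REST API version.

```http
POST https://[service name].search.windows.net/datasources?api-version=2020-06-30
Content-Type: application/json
api-key: [admin key]

{
  "name" : "my-file-datasource",
  "type" : "azurefile",
  "credentials" : { "connectionString" : "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;" },
  "container" : { "name" : "my-file-share", "query" : "my-folder" }
}
```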
articles/search/search-howto-index-azure-data-lake-storage.md (+8 −6)

@@ -9,7 +9,7 @@ manager: nitinme
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/17/2022
+ms.date: 01/19/2022
 ---

 # Index data from Azure Data Lake Storage Gen2

@@ -67,7 +67,7 @@ The data source definition specifies the data source type, content path, and how
 1. Set `"container"` to the blob container, and use "query" to specify any subfolders.

-   A data source definition can also include properties for [soft deletion policies](search-howto-index-changed-deleted-blobs.md) and [field mappings](search-indexer-field-mappings.md) if field names and types are not the same or need to be forked.
+   A data source definition can also include [soft deletion policies](search-howto-index-changed-deleted-blobs.md), if you want the indexer to delete a search document when the source document is flagged for deletion.

 <a name="Credentials"></a>

@@ -166,6 +166,8 @@ Indexer configuration specifies the inputs, parameters, and properties controlling
 1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.

+   For the full list of parameter descriptions, see [Blob configuration parameters](/rest/api/searchservice/create-indexer#blob-configuration-parameters) in the REST API.
+
 ### How to make an encoded field "searchable"

 There are times when you need to use an encoded version of a field like `metadata_storage_path` as the key, but also need that field to be searchable (without encoding) in the search index. To support both use cases, you can map `metadata_storage_path` to two fields; one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema. The example below shows two field mappings for `metadata_storage_path`.

@@ -235,7 +237,7 @@ User-specified metadata properties are extracted verbatim. To receive the values
 Standard blob metadata properties can be extracted into similarly named and typed fields, as listed below. The blob indexer automatically creates internal field mappings for these blob metadata properties, converting the original hyphenated name ("metadata-storage-name") to an underscored equivalent name ("metadata_storage_name").

-You still have to add the underscored fields to the index definition, but you can omit creating field mappings in the indexer because the indexer will recognize the counterpart automatically.
+You still have to add the underscored fields to the index definition, but you can omit field mappings because the indexer will make the association automatically.

 + **metadata_storage_name** (`Edm.String`) - the file name of the blob. For example, if you have a blob /my-container/my-folder/subfolder/resume.pdf, the value of this field is `resume.pdf`.

@@ -282,21 +284,21 @@ The indexer configuration parameters apply to all blobs in the container or folder
 |`AzureSearch_Skip`|`"true"`|Instructs the blob indexer to completely skip the blob. Neither metadata nor content extraction is attempted. This is useful when a particular blob fails repeatedly and interrupts the indexing process. |
 |`AzureSearch_SkipContent`|`"true"`|This is equivalent of `"dataToExtract" : "allMetadata"` setting described [above](#PartsOfBlobToIndex) scoped to a particular blob. |

-## Index large datasets
+## How to index large datasets

 Indexing blobs can be a time-consuming process. In cases where you have millions of blobs to index, you can speed up indexing by partitioning your data and using multiple indexers to [process the data in parallel](search-howto-large-index.md#parallel-indexing).

 1. Partition your data into multiple blob containers or virtual folders.

-1. Set up several data sources, one per container or folder. Use the `query` parameter to specify the partition: `"container" : { "name" : "my-container", "query" : "my-folder" }`.
+1. Set up several data sources, one per container or folder. Use the "query" parameter to specify the partition: `"container" : { "name" : "my-container", "query" : "my-folder" }`.

 1. Create one indexer for each data source. Point them to the same target index.

 Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Partitioning data and creating multiple indexers is only useful if they can run in parallel.

 <a name="DealingWithErrors"></a>

-## Configure the response to errors
+## Handle errors

 Errors that commonly occur during indexing include unsupported content types, missing content, or oversized blobs.
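The response to those error cases is typically controlled through indexer parameters. As a rough sketch (parameter names taken from the blob indexer REST API; the values shown are illustrative, not prescriptive):

```http
PUT https://[service name].search.windows.net/indexers/my-indexer?api-version=2020-06-30
Content-Type: application/json
api-key: [admin key]

{
  "parameters" : {
    "maxFailedItems" : 10,
    "maxFailedItemsPerBatch" : 5,
    "configuration" : {
      "failOnUnsupportedContentType" : false,
      "failOnUnprocessableDocument" : false,
      "indexStorageMetadataOnlyForOversizedDocuments" : true
    }
  }
}
```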
articles/search/search-howto-index-changed-deleted-blobs.md (+3 −3)

@@ -1,15 +1,15 @@
 ---
 title: Changed and deleted blobs
 titleSuffix: Azure Cognitive Search
-description: Indexers that index from Azure Storage can pick up new and changed content automaticaly. To automate deletion detection, follow the strategies described in this article.
+description: Indexers that index from Azure Storage can pick up new and changed content automatically. To automate deletion detection, follow the strategies described in this article.

 author: gmndrg
 ms.author: gimondra
 manager: nitinme

 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/18/2022
+ms.date: 01/19/2022
 ---

 # Change and delete detection using indexers for Azure Storage in Azure Cognitive Search

@@ -108,7 +108,7 @@ You can reverse a soft-delete if the original source file still physically exists
 1. Change the `"softDeleteMarkerValue" : "false"` on the blob or file in Azure Storage.

-1. Check the blob or file's `LastModified` timestamp to make sure it is newer than the last indexer run. You can force an update to the current date and time by resaving the existing metadata.
+1. Check the blob or file's `LastModified` timestamp to make sure it is newer than the last indexer run. You can force an update to the current date and time by re-saving the existing metadata.
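Re-saving metadata can be done with the Blob service's Set Blob Metadata operation, which also advances the blob's `Last-Modified` time. Roughly (the account, container, blob name, and the `IsDeleted` key are placeholders; use whatever metadata key your soft-delete policy's `softDeleteColumnName` specifies):

```http
PUT https://<account>.blob.core.windows.net/my-container/resume.pdf?comp=metadata
x-ms-version: 2020-10-02
x-ms-meta-IsDeleted: false
Authorization: SharedKey <account>:<signature>
```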
articles/search/search-howto-indexing-azure-blob-storage.md (+7 −7)

@@ -9,7 +9,7 @@ manager: nitinme
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/17/2022
+ms.date: 01/19/2022
 ---

 # Index data from Azure Blob Storage

@@ -61,7 +61,7 @@ The data source definition specifies the data source type, content path, and how
 1. Set "container" to the blob container, and use "query" to specify any subfolders.

-   A data source definition can also include properties for [soft deletion policies](search-howto-index-changed-deleted-blobs.md) and [field mappings](search-indexer-field-mappings.md) if field names and types are not the same or need to be forked.
+   A data source definition can also include [soft deletion policies](search-howto-index-changed-deleted-blobs.md), if you want the indexer to delete a search document when the source document is flagged for deletion.

 <a name="credentials"></a>

@@ -197,15 +197,15 @@ Textual content of a document is extracted into a string field named "content".

 <a name="indexing-blob-metadata"></a>

-## Indexing blob metadata
+### Indexing blob metadata

 Blob metadata can also be indexed, and that's helpful if you think any of the standard or custom metadata properties will be useful in filters and queries.

 User-specified metadata properties are extracted verbatim. To receive the values, you must define a field in the search index of type `Edm.String`, with the same name as the metadata key of the blob. For example, if a blob has a metadata key of `Sensitivity` with value `High`, you should define a field named `Sensitivity` in your search index and it will be populated with the value `High`.

 Standard blob metadata properties can be extracted into similarly named and typed fields, as listed below. The blob indexer automatically creates internal field mappings for these blob metadata properties, converting the original hyphenated name ("metadata-storage-name") to an underscored equivalent name ("metadata_storage_name").

-You still have to add the underscored fields to the index definition, but you can omit creating field mappings in the indexer because the indexer will recognize the counterpart automatically.
+You still have to add the underscored fields to the index definition, but you can omit field mappings because the indexer will make the association automatically.

 + **metadata_storage_name** (`Edm.String`) - the file name of the blob. For example, if you have a blob /my-container/my-folder/subfolder/resume.pdf, the value of this field is `resume.pdf`.

@@ -252,21 +252,21 @@ The indexer configuration parameters apply to all blobs in the container or folder
 | "AzureSearch_Skip" |`"true"`|Instructs the blob indexer to completely skip the blob. Neither metadata nor content extraction is attempted. This is useful when a particular blob fails repeatedly and interrupts the indexing process. |
 | "AzureSearch_SkipContent" |`"true"`|This is equivalent of "dataToExtract" : "allMetadata" setting described [above](#PartsOfBlobToIndex) scoped to a particular blob. |

-## Index large datasets
+## How to index large datasets

 Indexing blobs can be a time-consuming process. In cases where you have millions of blobs to index, you can speed up indexing by partitioning your data and using multiple indexers to [process the data in parallel](search-howto-large-index.md#parallel-indexing).

 1. Partition your data into multiple blob containers or virtual folders.

-1. Set up several data sources, one per container or folder. Use the `query` parameter to specify the partition: `"container" : { "name" : "my-container", "query" : "my-folder" }`.
+1. Set up several data sources, one per container or folder. Use the "query" parameter to specify the partition: `"container" : { "name" : "my-container", "query" : "my-folder" }`.

 1. Create one indexer for each data source. Point them to the same target index.

 Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Partitioning data and creating multiple indexers is only useful if they can run in parallel.
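The partitioning steps above might produce data sources like the following sketch (container and folder names are placeholders). A second data source would be identical except for its "name" and a "query" of "my-folder-b", with one indexer per data source writing to the same index:

```http
POST https://[service name].search.windows.net/datasources?api-version=2020-06-30
Content-Type: application/json
api-key: [admin key]

{
  "name" : "blob-datasource-a",
  "type" : "azureblob",
  "credentials" : { "connectionString" : "<storage connection string>" },
  "container" : { "name" : "my-container", "query" : "my-folder-a" }
}
```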
 <a name="DealingWithErrors"></a>

-## Configure the response to errors
+## Handle errors

 Errors that commonly occur during indexing include unsupported content types, missing content, or oversized blobs.
articles/search/search-howto-indexing-azure-tables.md (+2 −2)

@@ -9,7 +9,7 @@ ms.author: magottei
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 01/17/2022
+ms.date: 01/19/2022
 ---

 # Index data from Azure Table Storage

@@ -47,7 +47,7 @@ The data source definition specifies the data source type, content path, and how
 1. Optionally, set "query" to a filter on PartitionKey. This is a best practice that improves performance. If "query" is specified any other way, the indexer will execute a full table scan, resulting in poor performance if the tables are large.

-   A data source definition can also include additional properties for [soft deletion policies](#soft-delete-using-custom-metadata) and [field mappings](search-indexer-field-mappings.md) if field names and types are not the same.
+   A data source definition can also include [soft deletion policies](search-howto-index-changed-deleted-blobs.md), if you want the indexer to delete a search document when the source document is flagged for deletion.
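A table data source with the recommended PartitionKey filter might look like this sketch (the data source name, table name, and filter value are placeholders):

```http
POST https://[service name].search.windows.net/datasources?api-version=2020-06-30
Content-Type: application/json
api-key: [admin key]

{
  "name" : "my-table-datasource",
  "type" : "azuretable",
  "credentials" : { "connectionString" : "<storage connection string>" },
  "container" : { "name" : "my-table", "query" : "PartitionKey eq '2022'" }
}
```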
-When using Azure Cognitive Search indexers, the indexer will automatically map fields in a data source to fields in a target index, assuming field names and types are compatible. In some cases, input data doesn't quite match the schema of your target index. One solution is to use *field mappings* to specifically set the data path during the indexing process.
+When using Azure Cognitive Search indexers, the indexer will automatically map fields in a data source to fields in a target index, assuming field names and types are compatible. When input data doesn't quite match the schema of your target index, you can define *field mappings* to specifically set the data path.

-Field mappings can be used to address the following scenarios:
+Field mappings address the following scenarios:

 + Mismatched field names. Suppose your data source has a field named `_id`. Given that Azure Cognitive Search doesn't allow field names that start with an underscore, a field mapping lets you effectively rename a field.

@@ -122,23 +122,48 @@ A field mapping function transforms the contents of a field before it's stored in the index
 Performs *URL-safe* Base64 encoding of the input string. Assumes that the input is UTF-8 encoded.

-#### Example - document key lookup
+#### Example: Base-encoding a document key

-Only URL-safe characters can appear in an Azure Cognitive Search document key (so that you can address the document using the [Lookup API](/rest/api/searchservice/lookup-document)). If the source field for your key contains URL-unsafe characters, you can use the `base64Encode` function to convert it at indexing time. However, a document key (both before and after conversion) can't be longer than 1,024 characters.
+Only URL-safe characters can appear in an Azure Cognitive Search document key (so that you can address the document using the [Lookup API](/rest/api/searchservice/lookup-document)). If the source field for your key contains URL-unsafe characters, such as `-` and `\`, use the `base64Encode` function to convert it at indexing time.

-When you retrieve the encoded key at search time, use the `base64Decode` function to get the original key value, and use that to retrieve the source document.
+The following example specifies the base64Encode function on "metadata_storage_name" to handle unsupported characters.

+A document key (both before and after conversion) can't be longer than 1,024 characters. When you retrieve the encoded key at search time, use the `base64Decode` function to get the original key value, and use that to retrieve the source document.
+
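For intuition about the URL-safe encoding and decoding this pair of functions performs, here is a rough Python sketch. This is illustrative only and is not the service's implementation; the service's `base64Encode` may differ in details such as padding, so don't use this to compute keys for real lookups without verifying.

```python
import base64

def url_safe_b64_encode(value: str) -> str:
    # URL-safe Base64 of the UTF-8 bytes of `value` ('+' and '/' become '-' and '_')
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")

def url_safe_b64_decode(token: str) -> str:
    # Inverse operation: recover the original string from the encoded token
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")

# A path containing URL-unsafe characters round-trips through the encoding
key = url_safe_b64_encode("my-folder/resume.pdf")
original = url_safe_b64_decode(key)
```

The round trip mirrors the indexing-time encode / query-time decode flow described above.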
+#### Example: Make a base-encoded field "searchable"
+
+There are times when you need to use an encoded version of a field like "metadata_storage_path" as the key, but also need an un-encoded version for full text search. To support both scenarios, you can map "metadata_storage_path" to two fields: one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema.
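The two-field mapping just described would look roughly like this fragment of an indexer definition (a sketch; the target field names "key" and "path" are placeholders for whatever your index schema defines):

```json
"fieldMappings" : [
  {
    "sourceFieldName" : "metadata_storage_path",
    "targetFieldName" : "key",
    "mappingFunction" : { "name" : "base64Encode" }
  },
  {
    "sourceFieldName" : "metadata_storage_path",
    "targetFieldName" : "path"
  }
]
```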
@@ -329,60 +354,4 @@ When facing errors complaining about document key being longer than 1024 characters
       "name" : "fixedLengthEncode"
     }
   }]
-```
-
-<!--
-
-### Example: Base-encoding metadata_storage_name
-
-The following example demonstrates "metadata_storage_name" as the document key. Assume the index has a key field named "key" and another field named "fileSize" for storing the document size. [Field mappings](search-indexer-field-mappings.md) in the indexer definition establish field associations, and "metadata_storage_name" has the [base64Encode field mapping function](search-indexer-field-mappings.md#base64EncodeFunction) to handle unsupported characters.
-
-```http
-POST https://[service name].search.windows.net/indexers?api-version=2020-06-30
-### Example: How to make an encoded field "searchable"
-
-There are times when you need to use an encoded version of a field like "metadata_storage_path" as the key, but also need that field to be searchable (without encoding) in the search index. To support both use cases, you can map "metadata_storage_path" to two fields; one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema. The example below shows two field mappings for "metadata_storage_path".
-The following example demonstrates `metadata_storage_name` as the document key. Assume the index has a key field named `key` and another field named `fileSize` for storing the document size. [Field mappings](search-indexer-field-mappings.md) in the indexer definition establish field associations, and `metadata_storage_name` has the [`base64Encode` field mapping function](search-indexer-field-mappings.md#base64EncodeFunction) to handle unsupported characters.
-
-```http
-PUT https://[service name].search.windows.net/indexers/adlsgen2-indexer?api-version=2020-06-30