articles/search/search-file-storage-integration.md
+1 -1 Lines changed: 1 addition & 1 deletion
@@ -159,7 +159,7 @@ Indexer configuration specifies the inputs, parameters, and properties controlli

  1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.

- In file indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
+ In file indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.

  1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.
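The hunk above says the indexer maps metadata properties to index fields automatically and replaces hyphens in their names with underscores. Here is a minimal Python sketch of that naming rule. It assumes the rule is a plain character substitution. The helper names and the sample metadata keys (`document-author`, `review-status`) are hypothetical and are not part of any Azure SDK.

```python
def to_index_field_name(metadata_property: str) -> str:
    """Return the index field name expected for a blob metadata property.

    Hypothetical helper mirroring the documented rule: each hyphen in a
    metadata property name becomes an underscore in the search index.
    """
    return metadata_property.replace("-", "_")


def to_index_document(metadata: dict) -> dict:
    """Rename every metadata key the way the indexer is described to."""
    return {to_index_field_name(key): value for key, value in metadata.items()}


if __name__ == "__main__":
    # Hypothetical custom metadata on a blob.
    blob_metadata = {"document-author": "Ana", "review-status": "final"}
    print(to_index_document(blob_metadata))
    # {'document_author': 'Ana', 'review_status': 'final'}
```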
articles/search/search-howto-index-azure-data-lake-storage.md
+22 -48 Lines changed: 22 additions & 48 deletions
@@ -133,7 +133,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the conten

  ## Configure the ADLS Gen2 indexer

- Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors. Under "configuration", you can specify which blobs are indexed by file type or by properties on the blob themselves.
+ Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors. The "configuration" section determines what content gets indexed.

  1. [Create or update an indexer](/rest/api/searchservice/create-indexer) to use the predefined data source and search index.

@@ -150,68 +150,38 @@ Indexer configuration specifies the inputs, parameters, and properties controlli
  "base64EncodeKeys": null,
  "configuration": {
  "indexedFileNameExtensions" : ".pdf,.docx",
- "excludedFileNameExtensions" : ".png,.jpeg"
+ "excludedFileNameExtensions" : ".png,.jpeg",
+ "dataToExtract": "contentAndMetadata",
+ "parsingMode": "default",
+ "imageAction": "none"
  }
  },
  "schedule" : { },
  "fieldMappings" : [ ]
  }
  ```

- 1. In the optional "configuration" section, provide any inclusion or exclusion criteria. If left unspecified, all blobs in the container are retrieved.
+ 1. Set "batchSize" if the default (10 documents) is either underutilizing or overwhelming available resources. Default batch sizes are data source specific. Blob indexing sets batch size at 10 documents in recognition of the larger average document size.

- 1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.
-
- In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
-
- 1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.
-
- For the full list of parameter descriptions, see [Blob configuration parameters](/rest/api/searchservice/create-indexer#blob-configuration-parameters) in the REST API.
-
- ### How to make an encoded field "searchable"
-
- There are times when you need to use an encoded version of a field like `metadata_storage_path` as the key, but also need that field to be searchable (without encoding) in the search index. To support both use cases, you can map `metadata_storage_path` to two fields; one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema. The example below shows two field mappings for `metadata_storage_path`.
-
- ```http
- PUT https://[service name].search.windows.net/indexers/adlsgen2-indexer?api-version=2020-06-30
+ 1. Under "configuration", provide any [inclusion or exclusion criteria](#PartsOfBlobToIndex) based on file type or leave unspecified to retrieve all blobs.

- <a name="PartsOfBlobToIndex"></a>
+ 1. Set "dataToExtract" to control which parts of the blobs are indexed:

- ## Index content and metadata
+ + "contentAndMetadata" specifies that all metadata and textual content extracted from the blob are indexed. This is the default value.

- Data Lake Storage Gen2 blobs contain content and metadata. You can control which parts of the blobs are indexed using the `dataToExtract` configuration parameter. It can take the following values:
+ + "storageMetadata" specifies that only the [standard blob properties and user-specified metadata](../storage/blobs/storage-blob-container-properties-metadata.md) are indexed.

- + `contentAndMetadata` - specifies that all metadata and textual content extracted from the blob are indexed. This is the default value.
+ + "allMetadata" specifies that standard blob properties and any [metadata for found content types](search-blob-metadata-properties.md) are extracted from the blob content and indexed.

- + `storageMetadata` - specifies that only the [standard blob properties and user-specified metadata](../storage/blobs/storage-blob-container-properties-metadata.md) are indexed.
+ 1. Set "parsingMode" if blobs should be mapped to [multiple search documents](search-howto-index-one-to-many-blobs.md), or if they consist of [plain text](search-howto-index-plaintext-blobs.md), [JSON documents](search-howto-index-json-blobs.md), or [CSV files](search-howto-index-csv-blobs.md).

- + `allMetadata` - specifies that standard blob properties and any [metadata for found content types](search-blob-metadata-properties.md) are extracted from the blob content and indexed.
+ 1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.

- For example, to index only the storage metadata, use:
+ In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.

- ```http
- PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=2020-06-30
- Content-Type: application/json
- api-key: [admin key]
+ 1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.
+ For the full list of parameter descriptions, see [Blob configuration parameters](/rest/api/searchservice/create-indexer#blob-configuration-parameters) in the REST API.

  ## How blobs are indexed

@@ -257,6 +227,8 @@ Lastly, any metadata properties specific to the document format of the blobs you

  It's important to point out that you don't need to define fields for all of the above properties in your search index - just capture the properties you need for your application.

+ <a name="PartsOfBlobToIndex"></a>
+
  ## How to control which blobs are indexed

  You can control which blobs are indexed, and which are skipped, by the blob's file type or by setting properties on the blob themselves, causing the indexer to skip over them.
@@ -277,7 +249,9 @@ PUT /indexers/[indexer name]?api-version=2020-06-30

  ### Add "skip" metadata to the blob

- The indexer configuration parameters apply to all blobs in the container or folder. Sometimes, you want to control how *individual blobs* are indexed. You can do this by adding the following metadata properties and values to blobs in Blob storage. When the indexer encounters this property, it will skip the blob or its content in the indexing run.
+ The indexer configuration parameters apply to all blobs in the container or folder. Sometimes, you want to control how *individual blobs* are indexed.
+
+ Add the following metadata properties and values to blobs in Blob Storage. When the indexer encounters this property, it will skip the blob or its content in the indexing run.

  | Property name | Property value | Explanation |
  | ------------- | -------------- | ----------- |
@@ -294,7 +268,7 @@ Indexing blobs can be a time-consuming process. In cases where you have millions

  1. Create one indexer for each data source. Point them to the same target index.

- Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Partitioning data and creating multiple indexers is only useful if they can run in parallel.
+ Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Creating multiple indexers is only useful if they can run in parallel.
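This diff adds several configuration keys to the ADLS Gen2 example. Those keys can also be assembled in code before a Create Indexer request is sent. The minimal Python sketch below builds that request body as a dictionary and rejects any `dataToExtract` value other than the three listed above. The following are assumptions made for illustration: the helper name, the indexer, data source, and index names, and the placement of `batchSize` next to `base64EncodeKeys` under `parameters`. The sketch also assumes that a null `batchSize` keeps the data-source default.

```python
import json

# The three "dataToExtract" values described in the added lines of this diff.
DATA_TO_EXTRACT_VALUES = {"contentAndMetadata", "storageMetadata", "allMetadata"}


def build_indexer_body(name, data_source, target_index, *,
                       indexed_exts=(".pdf", ".docx"),
                       excluded_exts=(".png", ".jpeg"),
                       data_to_extract="contentAndMetadata",
                       parsing_mode="default",
                       batch_size=None):
    """Return an indexer definition shaped like the JSON example in this diff.

    Hypothetical helper; all names and defaults are illustrative.
    """
    if data_to_extract not in DATA_TO_EXTRACT_VALUES:
        raise ValueError(f"unsupported dataToExtract value: {data_to_extract!r}")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "parameters": {
            # None serializes to JSON null (assumed to keep the default batch size).
            "batchSize": batch_size,
            "base64EncodeKeys": None,
            "configuration": {
                # Both extension filters are comma-delimited strings.
                "indexedFileNameExtensions": ",".join(indexed_exts),
                "excludedFileNameExtensions": ",".join(excluded_exts),
                "dataToExtract": data_to_extract,
                "parsingMode": parsing_mode,
                "imageAction": "none",
            },
        },
        "schedule": {},
        "fieldMappings": [],
    }


if __name__ == "__main__":
    # Index only standard blob properties and user-specified metadata.
    body = build_indexer_body("adlsgen2-indexer", "adlsgen2-datasource",
                              "adlsgen2-index", data_to_extract="storageMetadata")
    print(json.dumps(body, indent=2))
```

The sketch does not send the body, so it runs without a search service. Sending it would follow the pattern of the removed example: a PUT to `https://[service name].search.windows.net/indexers/[indexer name]?api-version=2020-06-30` with an admin `api-key` header.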
articles/search/search-howto-index-changed-deleted-blobs.md
+2 Lines changed: 2 additions & 0 deletions
@@ -75,6 +75,8 @@ If you restore a soft deleted blob in Blob storage, the indexer will not always

  To make sure that an undeleted blob is reindexed, you will need to update the blob's `LastModified` timestamp. One way to do this is by resaving the metadata of that blob. You don't need to change the metadata, but resaving the metadata will update the blob's `LastModified` timestamp so that the indexer knows to pick it up.

+ <a name="soft-delete-using-custom-metadata"></a>
+
  ## Custom metadata: Soft delete strategy

  This method uses custom metadata to indicate whether a search document should be removed from the index. It requires two separate actions: deleting the search document from the index, followed by file deletion in Azure Storage.
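The context paragraph above suggests resaving a restored blob's metadata so that its `LastModified` timestamp changes and the indexer picks the blob up again. The minimal Python sketch below assumes an `azure-storage-blob` `BlobClient` and calls its `get_blob_properties()` and `set_blob_metadata()` methods. The helper name `touch_blob_metadata` is hypothetical. Because the helper only needs those two methods, it can be exercised with a stand-in object instead of a real storage account.

```python
def touch_blob_metadata(blob_client):
    """Write a blob's existing metadata back unchanged.

    `blob_client` is assumed to be an azure.storage.blob.BlobClient (or any
    object with the same two methods). Setting metadata is a write
    operation, so Azure Storage updates the blob's LastModified timestamp,
    which is what a change-detecting indexer looks at on its next run.
    """
    properties = blob_client.get_blob_properties()
    # Copy the current metadata so nothing is added, changed, or removed.
    metadata = dict(properties.metadata or {})
    blob_client.set_blob_metadata(metadata=metadata)
    return metadata
```

Against a real account, you might pass `BlobClient.from_connection_string(conn_str, container_name="docs", blob_name="restored.pdf")`, where the connection string, container, and blob names are placeholders. The metadata is written back as-is, in line with the note that it doesn't need to change.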
articles/search/search-howto-indexing-azure-blob-storage.md
+8 -8 Lines changed: 8 additions & 8 deletions
@@ -125,8 +125,6 @@ In a [search index](search-what-is-an-index.md), add fields to accept the conten

  1. Add more fields for any blob metadata that you want in the index. The indexer can read custom metadata properties, [standard metadata](#indexing-blob-metadata) properties, and [content-specific metadata](search-blob-metadata-properties.md) properties.

- <a name="PartsOfBlobToIndex"></a>
-
  ## Configure the blob indexer

  Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors. The "configuration" section determines what content gets indexed.
@@ -159,9 +157,7 @@ Indexer configuration specifies the inputs, parameters, and properties controlli

  1. Set "batchSize" if the default (10 documents) is either underutilizing or overwhelming available resources. Default batch sizes are data source specific. Blob indexing sets batch size at 10 documents in recognition of the larger average document size.

- 1. Under "configuration", provide any inclusion or exclusion criteria. If left unspecified, all blobs in the container are retrieved.
-
- If both `indexedFileNameExtensions` and `excludedFileNameExtensions` parameters are present, Azure Cognitive Search first looks at `indexedFileNameExtensions`, then at `excludedFileNameExtensions`. If the same file extension is present in both lists, it will be excluded from indexing.
+ 1. Under "configuration", provide any [inclusion or exclusion criteria](#PartsOfBlobToIndex) based on file type or leave unspecified to retrieve all blobs.

  1. Set "dataToExtract" to control which parts of the blobs are indexed:

@@ -175,7 +171,7 @@ Indexer configuration specifies the inputs, parameters, and properties controlli

  1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.

- In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
+ In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.

  1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.

@@ -225,6 +221,8 @@ Lastly, any metadata properties specific to the document format of the blobs you

  It's important to point out that you don't need to define fields for all of the above properties in your search index - just capture the properties you need for your application.

+ <a name="PartsOfBlobToIndex"></a>
+
  ## How to control which blobs are indexed

  You can control which blobs are indexed, and which are skipped, by the blob's file type or by setting properties on the blob themselves, causing the indexer to skip over them.
@@ -245,7 +243,9 @@ PUT /indexers/[indexer name]?api-version=2020-06-30

  ### Add "skip" metadata to the blob

- The indexer configuration parameters apply to all blobs in the container or folder. Sometimes, you want to control how *individual blobs* are indexed. You can do this by adding the following metadata properties and values to blobs in Blob Storage. When the indexer encounters this property, it will skip the blob or its content in the indexing run.
+ The indexer configuration parameters apply to all blobs in the container or folder. Sometimes, you want to control how *individual blobs* are indexed.
+
+ Add the following metadata properties and values to blobs in Blob Storage. When the indexer encounters this property, it will skip the blob or its content in the indexing run.

  | Property name | Property value | Explanation |
  | ------------- | -------------- | ----------- |
@@ -262,7 +262,7 @@ Indexing blobs can be a time-consuming process. In cases where you have millions

  1. Create one indexer for each data source. Point them to the same target index.

- Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Partitioning data and creating multiple indexers is only useful if they can run in parallel.
+ Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Creating multiple indexers is only useful if they can run in parallel.
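A paragraph removed from this file spells out how the two extension lists interact. `indexedFileNameExtensions` is evaluated first, then `excludedFileNameExtensions`, and an extension that appears in both lists is excluded. When neither list is set, all blobs are retrieved. The minimal Python sketch below models that rule. It assumes comma-delimited lists, as in the JSON example. It also assumes case-insensitive matching, which this diff does not state. `is_blob_indexed` is a hypothetical helper, not the service's implementation.

```python
import os


def _parse_extensions(value):
    """Split a comma-delimited list such as ".pdf,.docx" into a set."""
    if not value:
        return set()
    # Lower-casing (case-insensitive matching) is an assumption of this sketch.
    return {ext.strip().lower() for ext in value.split(",") if ext.strip()}


def is_blob_indexed(blob_name, indexed_exts=None, excluded_exts=None):
    """Decide whether a blob passes the file-extension filters.

    With no inclusion list, every extension is a candidate; an extension
    listed in both lists is excluded.
    """
    ext = os.path.splitext(blob_name)[1].lower()
    included = _parse_extensions(indexed_exts)
    excluded = _parse_extensions(excluded_exts)
    # Step 1: the inclusion list, when present, narrows the candidates.
    if included and ext not in included:
        return False
    # Step 2: the exclusion list wins for any extension still in play.
    return ext not in excluded


if __name__ == "__main__":
    for name in ("report.pdf", "photo.png", "notes.txt"):
        print(name, is_blob_indexed(name, ".pdf,.docx", ".png,.jpeg"))
```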
articles/search/search-indexer-field-mappings.md
+3 -3 Lines changed: 3 additions & 3 deletions
@@ -148,7 +148,7 @@ PUT /indexers?api-version=2020-06-30

  A document key (both before and after conversion) can't be longer than 1,024 characters. When you retrieve the encoded key at search time, use the `base64Decode` function to get the original key value, and use that to retrieve the source document.

- #### Example: Make an base-encoded field "searchable"
+ #### Example: Make a base-encoded field "searchable"

  There are times when you need to use an encoded version of a field like "metadata_storage_path" as the key, but also need an un-encoded version for full text search. To support both scenarios, you can map "metadata_storage_path" to two fields: one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema.

@@ -321,7 +321,7 @@ When you retrieve the encoded key at search time, you can then use the `urlDecod

  ### Example - decode blob metadata

- Some Azure storage clients automatically url encode blob metadata if it contains non-ASCII characters. However, if you want to make such metadata searchable (as plain text), you can use the `urlDecode` function to turn the encoded data back into regular strings when populating your search index.
+ Some Azure storage clients automatically URL-encode blob metadata if it contains non-ASCII characters. However, if you want to make such metadata searchable (as plain text), you can use the `urlDecode` function to turn the encoded data back into regular strings when populating your search index.

  ```JSON
  "fieldMappings" : [
@@ -338,7 +338,7 @@ When you retrieve the encoded key at search time, you can then use the `urlDecod

  ### fixedLengthEncode function

- This function converts a string of any length to a fixedlength string.
+ This function converts a string of any length to a fixed-length string.
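The field-mappings article changed in this commit describes two encodings that a blob path or metadata value can go through. The first is `base64Encode`/`base64Decode` for document keys, which are limited to 1,024 characters before and after conversion. The second is `urlDecode` for metadata that a storage client URL-encoded. The Python sketch below reproduces both round trips client-side with the standard library. It is only an approximation. Python's URL-safe Base64 stands in for the service's `base64Encode`, whose exact output (for example, padding) may differ. The account URL and metadata value are hypothetical.

```python
import base64
from urllib.parse import quote, unquote

# Limit stated for document keys, both before and after conversion.
MAX_KEY_LENGTH = 1024


def encode_key(value: str) -> str:
    """URL-safe Base64 of a UTF-8 string; a stand-in for base64Encode."""
    key = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
    if len(value) > MAX_KEY_LENGTH or len(key) > MAX_KEY_LENGTH:
        raise ValueError("document key exceeds 1,024 characters")
    return key


def decode_key(key: str) -> str:
    """Recover the original value from an encoded key (cf. base64Decode)."""
    return base64.urlsafe_b64decode(key.encode("ascii")).decode("utf-8")


def decode_metadata(value: str) -> str:
    """Turn URL-encoded metadata back into plain text (cf. urlDecode)."""
    return unquote(value)


if __name__ == "__main__":
    # Hypothetical blob URL with non-ASCII characters in the file name.
    path = "https://myaccount.blob.core.windows.net/docs/résumé.pdf"
    key = encode_key(path)
    print(decode_key(key) == path)              # True
    # A storage client might store "Zürich" as "Z%C3%BCrich".
    print(decode_metadata(quote("Zürich")))     # Zürich
```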