Commit 06b0c51

committed
prepped for signoff
1 parent 9da48d8 commit 06b0c51

5 files changed

+36 -60 lines changed

articles/search/search-file-storage-integration.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -159,7 +159,7 @@ Indexer configuration specifies the inputs, parameters, and properties controlli
159159

160160
1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.
161161

162-
In file indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
162+
In file indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
163163

164164
1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.
165165
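As a rough illustration of the hyphen-to-underscore rule, the sketch below derives the index field name the indexer would be expected to target for a metadata property. The helper `to_index_field_name` and the property name `project-code` are hypothetical; the indexer applies this renaming itself, so nothing like this runs in practice.

```python
def to_index_field_name(metadata_property: str) -> str:
    """Hypothetical model of the indexer's built-in renaming of metadata
    properties: every hyphen becomes an underscore."""
    return metadata_property.replace("-", "_")


# A custom metadata property "project-code" would land in a field named "project_code".
print(to_index_field_name("project-code"))
```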

articles/search/search-howto-index-azure-data-lake-storage.md

Lines changed: 22 additions & 48 deletions
@@ -133,7 +133,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the conten
133133

134134
## Configure the ADLS Gen2 indexer
135135

136-
Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors. Under "configuration", you can specify which blobs are indexed by file type or by properties on the blob themselves.
136+
Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors. The "configuration" section determines what content gets indexed.
137137

138138
1. [Create or update an indexer](/rest/api/searchservice/create-indexer) to use the predefined data source and search index.
139139

@@ -150,68 +150,38 @@ Indexer configuration specifies the inputs, parameters, and properties controlli
150150
"base64EncodeKeys": null,
151151
"configuration" : {
152152
"indexedFileNameExtensions" : ".pdf,.docx",
153-
"excludedFileNameExtensions" : ".png,.jpeg"
153+
"excludedFileNameExtensions" : ".png,.jpeg",
154+
"dataToExtract": "contentAndMetadata",
155+
"parsingMode": "default",
156+
"imageAction": "none"
154157
}
155158
},
156159
"schedule" : { },
157160
"fieldMappings" : [ ]
158161
}
159162
```
160163
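For reference, here is a minimal sketch that builds an indexer definition like the one above as a Python dict and serializes it with `json.dumps`, so the request body is always well-formed JSON. The indexer, data source, and index names are placeholders (assumptions); the "configuration" values mirror the example, and the resulting body is what you would send with Create or Update Indexer.

```python
import json

# Placeholder names (assumptions): substitute your own indexer, data source, and index.
indexer_definition = {
    "name": "my-adlsgen2-indexer",
    "dataSourceName": "my-adlsgen2-datasource",
    "targetIndexName": "my-target-index",
    "parameters": {
        "batchSize": None,         # None serializes to null, meaning "use the default"
        "base64EncodeKeys": None,
        "configuration": {
            "indexedFileNameExtensions": ".pdf,.docx",
            "excludedFileNameExtensions": ".png,.jpeg",
            "dataToExtract": "contentAndMetadata",
            "parsingMode": "default",
            "imageAction": "none",
        },
    },
    "schedule": {},
    "fieldMappings": [],
}

# json.dumps always emits valid JSON, so key/colon placement can't go wrong.
body = json.dumps(indexer_definition, indent=2)
print(body)
```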

161-
1. In the optional "configuration" section, provide any inclusion or exclusion criteria. If left unspecified, all blobs in the container are retrieved.
164+
1. Set "batchSize" if the default (10 documents) is either underutilizing or overwhelming available resources. Default batch sizes are specific to each data source. Blob indexing sets batch size at 10 documents in recognition of the larger average document size.
162165
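To picture what "batchSize" controls, the sketch below groups a list of documents into batches of a given size, defaulting to 10 as blob indexing does. The `batches` function is hypothetical and only models the grouping; the service performs batching internally.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 25 documents with the default size of 10 form batches of 10, 10, and 5.
sizes = [len(b) for b in batches([f"doc-{i}" for i in range(25)])]
print(sizes)  # [10, 10, 5]
```

A smaller size means more round trips with less work in each; a larger size means fewer, heavier requests, which matters for large blobs.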

163-
1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.
164-
165-
In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
166-
167-
1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.
168-
169-
For the full list of parameter descriptions, see [Blob configuration parameters](/rest/api/searchservice/create-indexer#blob-configuration-parameters) in the REST API.
170-
171-
### How to make an encoded field "searchable"
172-
173-
There are times when you need to use an encoded version of a field like `metadata_storage_path` as the key, but also need that field to be searchable (without encoding) in the search index. To support both use cases, you can map `metadata_storage_path` to two fields; one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema. The example below shows two field mappings for `metadata_storage_path`.
174-
175-
```http
176-
PUT https://[service name].search.windows.net/indexers/adlsgen2-indexer?api-version=2020-06-30
177-
Content-Type: application/json
178-
api-key: [admin key]
179-
180-
{
181-
"dataSourceName" : " adlsgen2-datasource",
182-
"targetIndexName" : "my-target-index",
183-
"schedule" : { "interval" : "PT2H" },
184-
"fieldMappings" : [
185-
{ "sourceFieldName" : "metadata_storage_path", "targetFieldName" : "key", "mappingFunction" : { "name" : "base64Encode" } },
186-
{ "sourceFieldName" : "metadata_storage_path", "targetFieldName" : "path" }
187-
]
188-
}
189-
```
166+
1. Under "configuration", provide any [inclusion or exclusion criteria](#PartsOfBlobToIndex) based on file type or leave unspecified to retrieve all blobs.
190167

191-
<a name="PartsOfBlobToIndex"></a>
168+
1. Set "dataToExtract" to control which parts of the blobs are indexed:
192169

193-
## Index content and metadata
170+
+ "contentAndMetadata" specifies that all metadata and textual content extracted from the blob are indexed. This is the default value.
194171

195-
Data Lake Storage Gen2 blobs contain content and metadata. You can control which parts of the blobs are indexed using the `dataToExtract` configuration parameter. It can take the following values:
172+
+ "storageMetadata" specifies that only the [standard blob properties and user-specified metadata](../storage/blobs/storage-blob-container-properties-metadata.md) are indexed.
196173

197-
+ `contentAndMetadata` - specifies that all metadata and textual content extracted from the blob are indexed. This is the default value.
174+
+ "allMetadata" specifies that standard blob properties and any [metadata for found content types](search-blob-metadata-properties.md) are extracted from the blob content and indexed.
198175
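The three "dataToExtract" values listed above can be summarized as a small lookup. This is a hypothetical model of what each value covers, written from the descriptions above; it is not an SDK call, and the part names (`storage_metadata`, `content_type_metadata`, `content`) are labels invented for illustration.

```python
from typing import Dict


def parts_to_index(data_to_extract: str = "contentAndMetadata") -> Dict[str, bool]:
    """Hypothetical model of which parts of a blob each dataToExtract value indexes."""
    modes = {
        # Standard blob properties and user-specified metadata only; no content extraction.
        "storageMetadata": {"storage_metadata": True, "content_type_metadata": False, "content": False},
        # Adds metadata extracted from the content for recognized content types.
        "allMetadata": {"storage_metadata": True, "content_type_metadata": True, "content": False},
        # All metadata plus extracted text; this is the default value.
        "contentAndMetadata": {"storage_metadata": True, "content_type_metadata": True, "content": True},
    }
    if data_to_extract not in modes:
        raise ValueError(f"unknown dataToExtract value: {data_to_extract!r}")
    return modes[data_to_extract]
```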

199-
+ `storageMetadata` - specifies that only the [standard blob properties and user-specified metadata](../storage/blobs/storage-blob-container-properties-metadata.md) are indexed.
176+
1. Set "parsingMode" if blobs should be mapped to [multiple search documents](search-howto-index-one-to-many-blobs.md), or if they consist of [plain text](search-howto-index-plaintext-blobs.md), [JSON documents](search-howto-index-json-blobs.md), or [CSV files](search-howto-index-csv-blobs.md).
200177

201-
+ `allMetadata` - specifies that standard blob properties and any [metadata for found content types](search-blob-metadata-properties.md) are extracted from the blob content and indexed.
178+
1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.
202179

203-
For example, to index only the storage metadata, use:
180+
In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
204181

205-
```http
206-
PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=2020-06-30
207-
Content-Type: application/json
208-
api-key: [admin key]
182+
1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.
209183

210-
{
211-
... other parts of indexer definition
212-
"parameters" : { "configuration" : { "dataToExtract" : "storageMetadata" } }
213-
}
214-
```
184+
For the full list of parameter descriptions, see [Blob configuration parameters](/rest/api/searchservice/create-indexer#blob-configuration-parameters) in the REST API.
215185

216186
## How blobs are indexed
217187

@@ -257,6 +227,8 @@ Lastly, any metadata properties specific to the document format of the blobs you
257227

258228
It's important to point out that you don't need to define fields for all of the above properties in your search index - just capture the properties you need for your application.
259229

230+
<a name="PartsOfBlobToIndex"></a>
231+
260232
## How to control which blobs are indexed
261233

262234
You can control which blobs are indexed, and which are skipped, by the blob's file type or by setting properties on the blobs themselves that cause the indexer to skip over them.
@@ -277,7 +249,9 @@ PUT /indexers/[indexer name]?api-version=2020-06-30
277249

278250
### Add "skip" metadata to the blob
279251

280-
The indexer configuration parameters apply to all blobs in the container or folder. Sometimes, you want to control how *individual blobs* are indexed. You can do this by adding the following metadata properties and values to blobs in Blob storage. When the indexer encounters this property, it will skip the blob or its content in the indexing run.
252+
The indexer configuration parameters apply to all blobs in the container or folder. Sometimes, you want to control how *individual blobs* are indexed.
253+
254+
Add the following metadata properties and values to blobs in Blob Storage. When the indexer encounters one of these properties, it skips the blob or its content during the indexing run.
281255

282256
| Property name | Property value | Explanation |
283257
| ------------- | -------------- | ----------- |
@@ -294,7 +268,7 @@ Indexing blobs can be a time-consuming process. In cases where you have millions
294268

295269
1. Create one indexer for each data source. Point them to the same target index.
296270

297-
Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Partitioning data and creating multiple indexers is only useful if they can run in parallel.
271+
Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Creating multiple indexers is only useful if they can run in parallel.
298272
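The capacity rule above (one running indexer per search unit) can be turned into a quick estimate. This sketch assumes search units = replicas × partitions; `indexer_waves` is a hypothetical helper that counts how many rounds it takes to run every indexer when only one can run per search unit at a time.

```python
import math


def indexer_waves(num_indexers: int, replicas: int, partitions: int) -> int:
    """Hypothetical estimate: sequential rounds needed to run num_indexers,
    assuming one running indexer per search unit (SU = replicas x partitions)."""
    search_units = replicas * partitions
    if search_units < 1:
        raise ValueError("a search service has at least one search unit")
    return math.ceil(num_indexers / search_units)


# Four indexers on a 2-replica, 1-partition service (2 SUs): only two run at once,
# so the four data sources are processed in two rounds.
print(indexer_waves(4, replicas=2, partitions=1))  # 2
```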

299273
<a name="DealingWithErrors"></a>
300274

articles/search/search-howto-index-changed-deleted-blobs.md

Lines changed: 2 additions & 0 deletions
@@ -75,6 +75,8 @@ If you restore a soft deleted blob in Blob storage, the indexer will not always
7575
7676
To make sure that an undeleted blob is reindexed, you will need to update the blob's `LastModified` timestamp. One way to do this is by resaving the metadata of that blob. You don't need to change the metadata, but resaving the metadata will update the blob's `LastModified` timestamp so that the indexer knows to pick it up.
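A minimal sketch of the resave step, assuming the Python `azure-storage-blob` package: `BlobClient.get_blob_properties()` returns the current metadata, and writing it back unchanged with `set_blob_metadata()` updates `LastModified`. The helper name `touch_blob_metadata` is hypothetical, and the function accepts any object exposing those two methods so it can be exercised without a storage account.

```python
from typing import Any


def touch_blob_metadata(blob_client: Any) -> None:
    """Rewrite a blob's existing metadata unchanged so its LastModified
    timestamp moves forward and the indexer picks the blob up again.

    blob_client is assumed to be an azure.storage.blob.BlobClient (or any
    object with get_blob_properties() and set_blob_metadata())."""
    # Copy the current metadata; an unset value is treated as empty.
    metadata = dict(blob_client.get_blob_properties().metadata or {})
    # Writing the same key/value pairs back still counts as a modification.
    blob_client.set_blob_metadata(metadata=metadata)
```

With the SDK installed, a client for a single blob can be created with `BlobClient.from_connection_string(conn_str, container_name, blob_name)` and passed in.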
7777
78+
<a name="soft-delete-using-custom-metadata"></a>
79+
7880
## Custom metadata: Soft delete strategy
7981
8082
This method uses custom metadata to indicate whether a search document should be removed from the index. It requires two separate actions: deleting the search document from the index, followed by file deletion in Azure Storage.
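The first action depends on a soft-delete detection policy on the data source. Below is a minimal sketch of such a data source body built as a Python dict; the data source name, container, connection string, and the `IsDeleted` property with marker value `"true"` are placeholders you would choose. Once an indexer run removes the search document, you delete the file in Azure Storage.

```python
import json

# Placeholder values (assumptions): choose your own names and metadata property.
data_source = {
    "name": "my-blob-datasource",
    "type": "azureblob",
    "credentials": {"connectionString": "<your storage connection string>"},
    "container": {"name": "my-container"},
    "dataDeletionDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        # Blobs whose IsDeleted metadata equals "true" are removed from the index.
        "softDeleteColumnName": "IsDeleted",
        "softDeleteMarkerValue": "true",
    },
}

body = json.dumps(data_source, indent=2)
```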

articles/search/search-howto-indexing-azure-blob-storage.md

Lines changed: 8 additions & 8 deletions
@@ -125,8 +125,6 @@ In a [search index](search-what-is-an-index.md), add fields to accept the conten
125125

126126
1. Add more fields for any blob metadata that you want in the index. The indexer can read custom metadata properties, [standard metadata](#indexing-blob-metadata) properties, and [content-specific metadata](search-blob-metadata-properties.md) properties.
127127

128-
<a name="PartsOfBlobToIndex"></a>
129-
130128
## Configure the blob indexer
131129

132130
Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors. The "configuration" section determines what content gets indexed.
@@ -159,9 +157,7 @@ Indexer configuration specifies the inputs, parameters, and properties controlli
159157

160158
1. Set "batchSize" if the default (10 documents) is either underutilizing or overwhelming available resources. Default batch sizes are specific to each data source. Blob indexing sets batch size at 10 documents in recognition of the larger average document size.
161159

162-
1. Under "configuration", provide any inclusion or exclusion criteria. If left unspecified, all blobs in the container are retrieved.
163-
164-
If both `indexedFileNameExtensions` and `excludedFileNameExtensions` parameters are present, Azure Cognitive Search first looks at `indexedFileNameExtensions`, then at `excludedFileNameExtensions`. If the same file extension is present in both lists, it will be excluded from indexing.
160+
1. Under "configuration", provide any [inclusion or exclusion criteria](#PartsOfBlobToIndex) based on file type or leave unspecified to retrieve all blobs.
165161
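The precedence rule stated above (check `indexedFileNameExtensions` first, then `excludedFileNameExtensions`, and exclude an extension that appears on both lists) can be modeled in a few lines. `is_indexed` is a hypothetical helper for reasoning about a configuration; it does not reproduce every detail of the service, such as how it treats letter case.

```python
import os


def _parse(ext_list: str) -> set:
    """Split a comma-separated extension list like ".pdf,.docx" into a set."""
    return {e.strip() for e in ext_list.split(",") if e.strip()}


def is_indexed(blob_name: str, indexed: str = "", excluded: str = "") -> bool:
    """Hypothetical model of indexedFileNameExtensions / excludedFileNameExtensions."""
    ext = os.path.splitext(blob_name)[1]
    included = _parse(indexed)
    if included and ext not in included:
        return False  # an inclusion list exists and this file type isn't on it
    # Exclusion is checked second, so it wins when a type is on both lists.
    return ext not in _parse(excluded)
```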

166162
1. Set "dataToExtract" to control which parts of the blobs are indexed:
167163

@@ -175,7 +171,7 @@ Indexer configuration specifies the inputs, parameters, and properties controlli
175171

176172
1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.
177173

178-
In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
174+
In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
179175

180176
1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.
181177

@@ -225,6 +221,8 @@ Lastly, any metadata properties specific to the document format of the blobs you
225221

226222
It's important to point out that you don't need to define fields for all of the above properties in your search index - just capture the properties you need for your application.
227223

224+
<a name="PartsOfBlobToIndex"></a>
225+
228226
## How to control which blobs are indexed
229227

230228
You can control which blobs are indexed, and which are skipped, by the blob's file type or by setting properties on the blobs themselves that cause the indexer to skip over them.
@@ -245,7 +243,9 @@ PUT /indexers/[indexer name]?api-version=2020-06-30
245243

246244
### Add "skip" metadata to the blob
247245

248-
The indexer configuration parameters apply to all blobs in the container or folder. Sometimes, you want to control how *individual blobs* are indexed. You can do this by adding the following metadata properties and values to blobs in Blob Storage. When the indexer encounters this property, it will skip the blob or its content in the indexing run.
246+
The indexer configuration parameters apply to all blobs in the container or folder. Sometimes, you want to control how *individual blobs* are indexed.
247+
248+
Add the following metadata properties and values to blobs in Blob Storage. When the indexer encounters one of these properties, it skips the blob or its content during the indexing run.
249249

250250
| Property name | Property value | Explanation |
251251
| ------------- | -------------- | ----------- |
@@ -262,7 +262,7 @@ Indexing blobs can be a time-consuming process. In cases where you have millions
262262

263263
1. Create one indexer for each data source. Point them to the same target index.
264264

265-
Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Partitioning data and creating multiple indexers is only useful if they can run in parallel.
265+
Make sure you have sufficient capacity. One search unit in your service can run one indexer at any given time. Creating multiple indexers is only useful if they can run in parallel.
266266

267267
<a name="DealingWithErrors"></a>
268268

articles/search/search-indexer-field-mappings.md

Lines changed: 3 additions & 3 deletions
@@ -148,7 +148,7 @@ PUT /indexers?api-version=2020-06-30
148148

149149
A document key (both before and after conversion) can't be longer than 1,024 characters. When you retrieve the encoded key at search time, use the `base64Decode` function to get the original key value, and use that to retrieve the source document.
150150

151-
#### Example: Make an base-encoded field "searchable"
151+
#### Example: Make a base-encoded field "searchable"
152152

153153
There are times when you need to use an encoded version of a field like "metadata_storage_path" as the key, but also need an unencoded version for full-text search. To support both scenarios, you can map "metadata_storage_path" to two fields: one for the key (encoded), and a second for a path field that we can assume is attributed as "searchable" in the index schema.
154154
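To see why two mappings are useful, the sketch below produces both values from one path: an encoded key and the plain path for full-text search. It uses URL-safe Base64 without padding, one of the Base64 variants the `base64Encode` mapping function can produce; whether it matches the service byte for byte depends on that function's parameters, so treat it as an assumption. The storage path is made up, and the 1,024-character check reflects the key limit described above.

```python
import base64

MAX_KEY_LENGTH = 1024  # document key limit, before and after conversion


def encode_key(path: str) -> str:
    """URL-safe Base64 without '=' padding (an assumed stand-in for base64Encode)."""
    if len(path) > MAX_KEY_LENGTH:
        raise ValueError("source value is longer than 1,024 characters")
    key = base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii").rstrip("=")
    if len(key) > MAX_KEY_LENGTH:
        raise ValueError("encoded key is longer than 1,024 characters")
    return key


def decode_key(key: str) -> str:
    """Restore the padding that encode_key stripped, then decode."""
    padded = key + "=" * (-len(key) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


path = "https://myaccount.blob.core.windows.net/docs/report 2021.pdf"
# Mirrors the two field mappings: an encoded "key" and a searchable "path".
document = {"key": encode_key(path), "path": path}
```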

@@ -321,7 +321,7 @@ When you retrieve the encoded key at search time, you can then use the `urlDecod
321321

322322
### Example - decode blob metadata
323323

324-
Some Azure storage clients automatically url encode blob metadata if it contains non-ASCII characters. However, if you want to make such metadata searchable (as plain text), you can use the `urlDecode` function to turn the encoded data back into regular strings when populating your search index.
324+
Some Azure storage clients automatically URL-encode blob metadata if it contains non-ASCII characters. However, if you want to make such metadata searchable (as plain text), you can use the `urlDecode` function to turn the encoded data back into regular strings when populating your search index.
325325
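To see what the `urlDecode` mapping function undoes, the sketch below percent-encodes a non-ASCII value with Python's `urllib.parse` and decodes it back. The value is made up, and exactly how a given storage client encodes metadata is an assumption; the point is the round trip from encoded text back to a plain, searchable string.

```python
from urllib.parse import quote, unquote

# Hypothetical metadata value as a storage client might store it (UTF-8, percent-encoded).
encoded = quote("Café menu", safe="")
print(encoded)           # Caf%C3%A9%20menu
print(unquote(encoded))  # Café menu
```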

326326
```JSON
327327
"fieldMappings" : [
@@ -338,7 +338,7 @@ When you retrieve the encoded key at search time, you can then use the `urlDecod
338338

339339
### fixedLengthEncode function
340340

341-
This function converts a string of any length to a fixed length string.
341+
This function converts a string of any length to a fixed-length string.
342342

343343
### Example - map document keys that are too long
344344
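As an illustration of converting a string of any length to a fixed-length string, the sketch below hashes a value to a 64-character hex digest. SHA-256 is an assumption chosen for the example; the service's `fixedLengthEncode` algorithm and output length may differ, and `fixed_length_key` is a hypothetical helper.

```python
import hashlib


def fixed_length_key(value: str) -> str:
    """Illustrative stand-in for fixedLengthEncode: any input yields 64 hex characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


long_path = "https://myaccount.blob.core.windows.net/docs/" + "a" * 2000 + ".pdf"
key = fixed_length_key(long_path)
print(len(key))  # 64
```

Unlike Base64, a hash can't be decoded back to the original value, so keep the original path in a separate field if you need it at query time.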
