
Commit 294669c

Mark Heffernan committed
Small updates
1 parent 3aa0ac5 commit 294669c


1 file changed: +10 -10 lines changed


articles/search/search-howto-indexing-azure-blob-storage.md

Lines changed: 10 additions & 10 deletions
````diff
@@ -294,7 +294,7 @@ When you set up a blob indexer to run on a schedule, it reindexes only the chang
 > [!NOTE]
 > You don't have to specify a change detection policy – incremental indexing is enabled for you automatically.
 
-To support deleting documents, use a soft delete approach. If you delete the blobs outright, corresponding documents will not be removed from the search index. Blobs must be in a soft delete state for Azure Cognitive Search to process them.
+To support deleting documents, use a "soft delete" approach. If you delete the blobs outright, corresponding documents will not be removed from the search index.
 
 There are two ways to implement the soft delete approach. Both are described below.
 
````
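The revised sentence in this hunk says that deleting blobs outright leaves their documents in the search index. A toy model makes the reason concrete. It is an illustrative assumption, not the indexer's real algorithm: in this model an indexer pass can only act on blobs that are still listed in the container, so a hard-deleted blob never triggers a removal, while a soft-deleted blob is still visible and flagged.

```python
from typing import Dict, Set


def index_pass(container: Dict[str, bool], index: Set[str]) -> None:
    """Toy indexer pass over the blobs currently listed in a container.

    `container` maps blob name -> True if the blob is soft deleted.
    A hard-deleted blob is simply absent, so this pass never touches
    the document that was built from it.
    """
    for name, soft_deleted in container.items():
        if soft_deleted:
            index.discard(name)  # flagged blob: remove its document
        else:
            index.add(name)      # live blob: (re)index its document


container = {"a.pdf": False, "b.pdf": False}
index: Set[str] = set()
index_pass(container, index)   # both documents are indexed

del container["a.pdf"]         # hard delete: the blob vanishes from the listing
container["b.pdf"] = True      # soft delete: the blob is still listed, but flagged
index_pass(container, index)

print(sorted(index))  # ['a.pdf']: an orphaned document for the hard-deleted blob
```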
````diff
@@ -303,12 +303,12 @@ There are two ways to implement the soft delete approach. Both are described bel
 > [!IMPORTANT]
 > Support for native blob soft delete is in preview. Preview functionality is provided without a service level agreement, and is not recommended for production workloads. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [REST API version 2019-05-06-Preview](https://docs.microsoft.com/azure/search/search-api-preview) provides this feature. There is currently no portal or .NET SDK support.
 
-In this method you will use the native blob soft delete feature offered by Azure Blob storage. If the data source has a native soft delete policy set and the indexer finds a blob that has been soft deleted, the indexer will remove that document from the index.
+In this method you will use the [native blob soft delete](https://docs.microsoft.com/azure/storage/blobs/storage-blob-soft-delete) feature offered by Azure Blob storage. If the data source has a native soft delete policy set and the indexer finds a blob that has been transitioned to a soft deleted state, the indexer will remove that document from the index.
 
 Use the following steps:
-1. Enable [native soft delete for Azure Blob storage](https://docs.microsoft.com/azure/storage/blobs/storage-blob-soft-delete). We recommend setting the retention policy to a value that's much higher than your indexer interval schedule. This way if there's an issue running the indexer or if you have a large number of documents to index, there's plenty of time for the indexer to eventually process the soft deleted documents. Azure Cognitive Search indexers will only delete a document from the index if it processes the blob while it's in a soft delete state.
-2. Configure a native blob soft deletion detection policy on the data source. An example is shown below. Since this feature is in preview, you must use the preview REST API.
-3. When the indexer processes the blob it will be removed from the index.
+1. Enable [native soft delete for Azure Blob storage](https://docs.microsoft.com/azure/storage/blobs/storage-blob-soft-delete). We recommend setting the retention policy to a value that's much higher than your indexer interval schedule. This way if there's an issue running the indexer or if you have a large number of documents to index, there's plenty of time for the indexer to eventually process the soft deleted blobs. Azure Cognitive Search indexers will only delete a document from the index if it processes the blob while it's in a soft deleted state.
+1. Configure a native blob soft deletion detection policy on the data source. An example is shown below. Since this feature is in preview, you must use the preview REST API.
+1. Run the indexer or set the indexer to run on a schedule. When the indexer runs and processes the blob the document will be removed from the index.
 
 ```
 PUT https://[service name].search.windows.net/datasources/blob-datasource?api-version=2019-05-06-Preview
````
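Step 1 in this hunk recommends a retention period "much higher" than the indexer's schedule interval. The sketch below is a hypothetical sanity check for that advice. It parses a schedule interval written as an ISO 8601 duration such as `PT2H` (the format indexer schedules use) and compares it with a storage retention period in days. The function names and the `factor` of 10 are illustrative assumptions, not values from the article or from Azure.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: days, hours, minutes, seconds (e.g. "PT2H", "P1DT30M").
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_interval(value: str) -> timedelta:
    """Parse a day/time ISO 8601 duration such as "PT2H" into a timedelta."""
    match = _DURATION.match(value)
    parts = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    # Reject empty durations ("P", "PT") and a dangling time designator ("P1DT").
    if not parts or value.endswith("T"):
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**parts)


def retention_is_generous(retention_days: int, indexer_interval: str, factor: int = 10) -> bool:
    """True if the retention period covers at least `factor` indexer intervals.

    `factor` is an arbitrary illustrative threshold, not an Azure default.
    """
    return timedelta(days=retention_days) >= factor * parse_interval(indexer_interval)


print(retention_is_generous(7, "PT2H"))  # True: 7 days vs. 10 x 2 hours
print(retention_is_generous(1, "P1D"))   # False: 1 day vs. 10 x 1 day
```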
````diff
@@ -318,7 +318,7 @@ Use the following steps:
 "name" : "blob-datasource",
 "type" : "azureblob",
 "credentials" : { "connectionString" : "<your storage connection string>" },
-"container" : { "name" : "my-container", "query" : "my-folder" },
+"container" : { "name" : "my-container", "query" : null },
 "dataDeletionDetectionPolicy" : {
 "@odata.type" :"#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
 }
````
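Steps 2 and 3 (configure the data source, then run the indexer) can also be driven from code. This minimal sketch uses only the Python standard library to build, but not send, the same preview `PUT` data source request shown in this hunk (with `query` set to `null`) and a Run Indexer request (`POST /indexers/{name}/run`). The service name, admin key, connection string, and the indexer name `blob-indexer` are hypothetical placeholders, and the run request assumes an indexer with that name already exists. The `api-key` header is how the Azure Cognitive Search REST API accepts an admin key.

```python
import json
import urllib.request

API_VERSION = "2019-05-06-Preview"  # preview version named in the article


def _endpoint(service_name: str, path: str) -> str:
    return f"https://{service_name}.search.windows.net/{path}?api-version={API_VERSION}"


def build_datasource_request(service_name: str, api_key: str, connection_string: str,
                             container: str, name: str = "blob-datasource") -> urllib.request.Request:
    """Build (but do not send) the PUT that creates or updates the data source."""
    body = {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container, "query": None},  # serialized as null
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
        },
    }
    return urllib.request.Request(
        _endpoint(service_name, f"datasources/{name}"),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )


def build_run_indexer_request(service_name: str, api_key: str,
                              indexer_name: str) -> urllib.request.Request:
    """Build (but do not send) an on-demand run: POST with an empty body."""
    return urllib.request.Request(
        _endpoint(service_name, f"indexers/{indexer_name}/run"),
        data=b"",
        headers={"api-key": api_key},
        method="POST",
    )


create = build_datasource_request("my-service", "<admin key>",
                                  "<your storage connection string>", "my-container")
run = build_run_indexer_request("my-service", "<admin key>", "blob-indexer")
# urllib.request.urlopen(create) and then urllib.request.urlopen(run) would send them.
```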
````diff
@@ -327,9 +327,9 @@ Use the following steps:
 
 #### Reindexing undeleted blobs
 
-If you natively soft delete a blob from Azure Blob storage you have the option to undelete that blob within the retention period. When an Azure Cognitive Search data source has a native blob soft delete policy and the indexer processes a soft deleted blob it will remove that document from the index. If that blob is later undeleted the indexer will **not** always reindex that blob. This is because the indexer determines which blobs to index based on the blob's `LastModified` timestamp. When a soft deleted blob is undeleted its `LastModified` timestamp does not get updated so if the indexer has already processed blobs with `LastModified` timestamps more recent than the undeleted blob it won't reindex the undeleted blob. To make sure that an undeleted blob is reindexed, you should resave the metadata of that blob. This will update its `LastModified` timestamp so that the indexer knows that it needs to index this blob.
+If you delete a blob from Azure Blob storage with native soft delete enabled on your storage account the blob will transition to a soft deleted state giving you the option to undelete that blob within the retention period. When an Azure Cognitive Search data source has a native blob soft delete policy and the indexer processes a soft deleted blob it will remove that document from the index. If that blob is later undeleted the indexer will **not** always reindex that blob. This is because the indexer determines which blobs to index based on the blob's `LastModified` timestamp. When a soft deleted blob is undeleted its `LastModified` timestamp does not get updated, so if the indexer has already processed blobs with `LastModified` timestamps more recent than the undeleted blob it won't reindex the undeleted blob. To make sure that an undeleted blob is reindexed, you should resave the metadata of that blob. You don't need to change the metadata but resaving the metadata will update the blob's `LastModified` timestamp so that the indexer knows that it needs to reindex this blob.
 
-### Custom soft delete
+### Soft delete using custom metadata
 
 In this method you will use a custom metadata property to indicate when a document should be removed from the search index.
 
````
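The new paragraph explains that undeleting a blob leaves its `LastModified` timestamp unchanged, so an indexer that has already processed newer blobs skips it until its metadata is resaved. The toy model below reproduces that reasoning with a fake clock and a fake blob. `Clock`, `FakeBlob`, and `run_indexer` are hypothetical, and the high-water-mark logic is a simplification of what the article describes, not the indexer's actual implementation. Only `get_blob_properties` and `set_blob_metadata` mirror real method names on `azure.storage.blob.BlobClient`, so `resave_metadata` is written to accept either the fake or a real client.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Dict, List, Set


class Clock:
    """Fake clock: each tick() returns a strictly later integer timestamp."""

    def __init__(self) -> None:
        self.now = 0

    def tick(self) -> int:
        self.now += 1
        return self.now


@dataclass
class FakeBlob:
    """Hypothetical stand-in for a blob; mimics two BlobClient methods."""

    name: str
    clock: Clock
    metadata: Dict[str, str] = field(default_factory=dict)
    soft_deleted: bool = False
    last_modified: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.last_modified = self.clock.tick()  # creation sets LastModified

    def get_blob_properties(self) -> SimpleNamespace:
        return SimpleNamespace(metadata=dict(self.metadata), last_modified=self.last_modified)

    def set_blob_metadata(self, metadata=None, **kwargs) -> None:
        self.metadata = dict(metadata or {})
        self.last_modified = self.clock.tick()  # a metadata write bumps LastModified

    def undelete(self) -> None:
        self.soft_deleted = False  # per the article, LastModified is NOT updated


def resave_metadata(blob_client) -> None:
    """Write the blob's current metadata back unchanged to bump LastModified."""
    current = blob_client.get_blob_properties().metadata
    blob_client.set_blob_metadata(metadata=dict(current))


def run_indexer(blobs: List[FakeBlob], index: Set[str], high_water_mark: int) -> int:
    """Toy change detection: only blobs modified after the mark are processed."""
    for blob in blobs:
        if blob.last_modified > high_water_mark:
            if blob.soft_deleted:
                index.discard(blob.name)
            else:
                index.add(blob.name)
    return max([high_water_mark] + [b.last_modified for b in blobs])


# Assume an earlier run already removed a.pdf's document while it was soft
# deleted (how the real indexer detects soft deletes is not modeled here).
clock = Clock()
a = FakeBlob("a.pdf", clock, soft_deleted=True)  # last_modified == 1
b = FakeBlob("b.pdf", clock)                     # last_modified == 2
index = {"b.pdf"}
mark = 2

a.undelete()
mark = run_indexer([a, b], index, mark)
print("a.pdf" in index)  # False: timestamp 1 is not newer than the mark (2)

resave_metadata(a)       # a.last_modified becomes 3
mark = run_indexer([a, b], index, mark)
print("a.pdf" in index)  # True
```

With the real SDK, `BlobClient.from_connection_string(conn_str, container_name, blob_name)` returns a client that can be passed to `resave_metadata` as-is.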
````diff
@@ -349,7 +349,7 @@ For example, the following policy considers a blob to be deleted if it has a met
 "name" : "blob-datasource",
 "type" : "azureblob",
 "credentials" : { "connectionString" : "<your storage connection string>" },
-"container" : { "name" : "my-container", "query" : "my-folder" },
+"container" : { "name" : "my-container", "query" : null },
 "dataDeletionDetectionPolicy" : {
 "@odata.type" :"#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
 "softDeleteColumnName" : "IsDeleted",
````
````diff
@@ -359,7 +359,7 @@ For example, the following policy considers a blob to be deleted if it has a met
 
 #### Reindexing undeleted blobs
 
-If you set a soft delete column detection policy on your data source, set the soft delete column name with the marker value, then ran the indexer the indexer will remove that document from the index. If you would like reindex that document, simply change the marker value for that blob and rerun the indexer.
+If you set a soft delete column detection policy on your data source, then add the custom metadata property to a blob with the marker value, then run the indexer, the indexer will remove that document from the index. If you would like reindex that document, simply change the soft delete metadata value for that blob and rerun the indexer.
 
 ## Indexing large datasets
 
````
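With the custom metadata approach, both flagging a blob and later reindexing it come down to writing the property named by `softDeleteColumnName` (`IsDeleted` in this diff). One detail matters when doing this with `azure.storage.blob.BlobClient.set_blob_metadata`: each call replaces all of the blob's existing metadata, so a helper should merge the marker into the current metadata rather than overwrite it. The sketch assumes a marker value of `"true"` and an "undeleted" value of `"false"`; use whatever value your policy's `softDeleteMarkerValue` specifies. `set_soft_delete_marker` and `FakeBlobClient` are hypothetical names, and the fake exists only so the example runs without a storage account.

```python
from types import SimpleNamespace
from typing import Dict, Optional


def set_soft_delete_marker(blob_client, column: str, value: str) -> Dict[str, str]:
    """Set one metadata property on a blob while keeping its other metadata.

    set_blob_metadata replaces all existing metadata on the blob, so the
    current properties are read first and the marker is merged in.
    """
    metadata = dict(blob_client.get_blob_properties().metadata or {})
    metadata[column] = value
    blob_client.set_blob_metadata(metadata=metadata)
    return metadata


class FakeBlobClient:
    """Hypothetical in-memory stand-in exposing the two BlobClient methods used."""

    def __init__(self, metadata: Optional[Dict[str, str]] = None) -> None:
        self.metadata = dict(metadata or {})

    def get_blob_properties(self) -> SimpleNamespace:
        return SimpleNamespace(metadata=dict(self.metadata))

    def set_blob_metadata(self, metadata=None, **kwargs) -> None:
        self.metadata = dict(metadata or {})  # full replacement, like the service


client = FakeBlobClient({"author": "contoso"})
set_soft_delete_marker(client, "IsDeleted", "true")  # flag the blob for removal
print(client.metadata)  # {'author': 'contoso', 'IsDeleted': 'true'}

# To have the document reindexed later, change the value and rerun the indexer.
set_soft_delete_marker(client, "IsDeleted", "false")
```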
