Commit d48aaee

Merge pull request #106818 from MarkHeff/nativesoftdelete-update
Native blob soft delete (preview)
2 parents cc2908a + 294669c commit d48aaee

1 file changed

+50
-6
lines changed

articles/search/search-howto-indexing-azure-blob-storage.md

Lines changed: 50 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -288,16 +288,56 @@ You can also continue indexing if errors happen at any point of processing, eith
288288
}
289289

290290
## Incremental indexing and deletion detection
291+
291292
When you set up a blob indexer to run on a schedule, it reindexes only the changed blobs, as determined by the blob's `LastModified` timestamp.
292293

293294
> [!NOTE]
294295
> You don't have to specify a change detection policy – incremental indexing is enabled for you automatically.
295296
296-
To support deleting documents, use a "soft delete" approach. If you delete the blobs outright, corresponding documents will not be removed from the search index. Instead, use the following steps:
297+
To support deleting documents, use a "soft delete" approach. If you delete the blobs outright, corresponding documents will not be removed from the search index.
298+
299+
There are two ways to implement the soft delete approach. Both are described below.
300+
301+
### Native blob soft delete (preview)
302+
303+
> [!IMPORTANT]
304+
> Support for native blob soft delete is in preview. Preview functionality is provided without a service level agreement, and is not recommended for production workloads. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [REST API version 2019-05-06-Preview](https://docs.microsoft.com/azure/search/search-api-preview) provides this feature. There is currently no portal or .NET SDK support.
305+
306+
In this method you will use the [native blob soft delete](https://docs.microsoft.com/azure/storage/blobs/storage-blob-soft-delete) feature offered by Azure Blob storage. If the data source has a native soft delete policy set and the indexer finds a blob that has been transitioned to a soft deleted state, the indexer will remove that document from the index.
307+
308+
Use the following steps:
309+
1. Enable [native soft delete for Azure Blob storage](https://docs.microsoft.com/azure/storage/blobs/storage-blob-soft-delete). We recommend setting the retention policy to a value that's much higher than your indexer interval schedule. This way if there's an issue running the indexer or if you have a large number of documents to index, there's plenty of time for the indexer to eventually process the soft deleted blobs. Azure Cognitive Search indexers will only delete a document from the index if it processes the blob while it's in a soft deleted state.
310+
1. Configure a native blob soft deletion detection policy on the data source. An example is shown below. Since this feature is in preview, you must use the preview REST API.
311+
1. Run the indexer or set the indexer to run on a schedule. When the indexer runs and processes the soft deleted blob, the corresponding document is removed from the index.
312+
313+
```
314+
PUT https://[service name].search.windows.net/datasources/blob-datasource?api-version=2019-05-06-Preview
315+
Content-Type: application/json
316+
api-key: [admin key]
317+
{
318+
"name" : "blob-datasource",
319+
"type" : "azureblob",
320+
"credentials" : { "connectionString" : "<your storage connection string>" },
321+
"container" : { "name" : "my-container", "query" : null },
322+
"dataDeletionDetectionPolicy" : {
323+
"@odata.type" :"#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
324+
}
325+
}
326+
```
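
The request above can also be assembled programmatically. The following is a minimal Python sketch that uses only the standard library and builds the same `PUT` request without sending it. The function name `build_datasource_request` and the service name `my-service` are hypothetical placeholders, not part of the original article.

```python
import json
import urllib.request

API_VERSION = "2019-05-06-Preview"  # preview API version named in this article


def build_datasource_request(service_name, admin_key, connection_string, container_name):
    """Build (but do not send) the PUT request that sets the native soft delete policy."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/datasources/blob-datasource?api-version={API_VERSION}"
    )
    body = {
        "name": "blob-datasource",
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container_name, "query": None},
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


# Placeholder values; substitute your own service name, key, and connection string.
request = build_datasource_request(
    "my-service", "<admin key>", "<your storage connection string>", "my-container"
)
# To send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and JSON body before calling the preview API.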
327+
328+
#### Reindexing undeleted blobs
329+
330+
If you delete a blob from Azure Blob storage with native soft delete enabled on your storage account, the blob transitions to a soft deleted state, giving you the option to undelete that blob within the retention period. When an Azure Cognitive Search data source has a native blob soft delete policy and the indexer processes a soft deleted blob, it removes that document from the index. If that blob is later undeleted, the indexer will **not** always reindex it. This is because the indexer determines which blobs to index based on the blob's `LastModified` timestamp. When a soft deleted blob is undeleted, its `LastModified` timestamp is not updated, so if the indexer has already processed blobs with `LastModified` timestamps more recent than that of the undeleted blob, it won't reindex the undeleted blob. To make sure that an undeleted blob is reindexed, resave the metadata of that blob. You don't need to change the metadata; resaving it updates the blob's `LastModified` timestamp so that the indexer knows it needs to reindex the blob.
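
Resaving a blob's metadata can be scripted. The following is a minimal Python sketch, assuming `blob_client` behaves like a `BlobClient` from the `azure-storage-blob` v12 package (which exposes `get_blob_properties()` and `set_blob_metadata()`). The helper name `resave_blob_metadata` is hypothetical.

```python
def resave_blob_metadata(blob_client):
    """Write a blob's existing metadata back unchanged.

    Assumes blob_client exposes get_blob_properties() and set_blob_metadata(),
    as azure.storage.blob.BlobClient does. The values are not modified; the
    write itself is what refreshes the blob's LastModified timestamp.
    """
    properties = blob_client.get_blob_properties()
    metadata = dict(properties.metadata or {})  # copy; empty dict if the blob has none
    blob_client.set_blob_metadata(metadata=metadata)
    return metadata
```

After the metadata is resaved, the next indexer run should see a newer `LastModified` value and pick up the undeleted blob.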
331+
332+
### Soft delete using custom metadata
297333
298-
1. Add a custom metadata property to the blob to indicate to Azure Cognitive Search that it is logically deleted
299-
2. Configure a soft deletion detection policy on the data source
300-
3. Once the indexer has processed the blob (as shown by the indexer status API), you can physically delete the blob
334+
In this method you will use a custom metadata property to indicate when a document should be removed from the search index.
335+
336+
Use the following steps:
337+
338+
1. Add a custom metadata property to the blob to indicate to Azure Cognitive Search that it is logically deleted.
339+
1. Configure a soft deletion column detection policy on the data source. An example is shown below.
340+
1. Once the indexer has processed the blob and deleted the document from the index, you can delete the blob from Azure Blob storage.
301341
302342
For example, the following policy considers a blob to be deleted if it has a metadata property `IsDeleted` with the value `true`:
303343
@@ -309,13 +349,17 @@ For example, the following policy considers a blob to be deleted if it has a met
309349
"name" : "blob-datasource",
310350
"type" : "azureblob",
311351
"credentials" : { "connectionString" : "<your storage connection string>" },
312-
"container" : { "name" : "my-container", "query" : "my-folder" },
352+
"container" : { "name" : "my-container", "query" : null },
313353
"dataDeletionDetectionPolicy" : {
314354
"@odata.type" :"#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
315355
"softDeleteColumnName" : "IsDeleted",
316356
"softDeleteMarkerValue" : "true"
317357
}
318-
}
358+
}
359+
360+
#### Reindexing undeleted blobs
361+
362+
If you set a soft delete column detection policy on your data source, add the custom metadata property with the marker value to a blob, and then run the indexer, the indexer removes that document from the index. If you would like to reindex that document, change the soft delete metadata value for that blob and rerun the indexer.
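
Setting and clearing the marker can be sketched in Python as follows. This assumes `blob_client` behaves like a `BlobClient` from the `azure-storage-blob` v12 package, and that the data source uses the `IsDeleted` / `true` policy shown above. The helper name `set_soft_delete_marker` and the non-marker value `"false"` are illustrative assumptions.

```python
def set_soft_delete_marker(blob_client, deleted, column_name="IsDeleted", marker_value="true"):
    """Mark a blob as logically deleted, or change the marker so it is reindexed.

    Assumes blob_client exposes get_blob_properties() and set_blob_metadata(),
    as azure.storage.blob.BlobClient does. column_name and marker_value must
    match softDeleteColumnName and softDeleteMarkerValue on the data source.
    """
    properties = blob_client.get_blob_properties()
    metadata = dict(properties.metadata or {})  # keep any other metadata intact
    # "false" is an assumed non-marker value; any value other than the marker
    # changes the soft delete metadata value.
    metadata[column_name] = marker_value if deleted else "false"
    blob_client.set_blob_metadata(metadata=metadata)
    return metadata
```

Call the helper with `deleted=True` before the indexer run that should remove the document, and with `deleted=False` before rerunning the indexer to bring the document back.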
319363
320364
## Indexing large datasets
321365
