You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/search/search-howto-index-changed-deleted-blobs.md
+18-15Lines changed: 18 additions & 15 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -16,7 +16,7 @@ ms.date: 09/09/2022
16
16
17
17
After an initial search index is created, you might want subsequent indexer jobs to only pick up new and changed documents. For indexed content that originates from Azure Storage, change detection occurs automatically because indexers keep track of the last update using the built-in timestamps on objects and files in Azure Storage.
18
18
19
-
Although change detection is a given, deletion detection is not. An indexer doesn't track object deletion in data sources. To avoid having orphan search documents, you can implement a "soft delete" strategy that results in deleting search documents first, with physical deletion in Azure Storage following as a second step.
19
+
Although change detection is a given, deletion detection isn't. An indexer doesn't track object deletion in data sources. To avoid having orphan search documents, you can implement a "soft delete" strategy that results in deleting search documents first, with physical deletion in Azure Storage following as a second step.
20
20
21
21
There are two ways to implement a soft delete strategy:
22
22
@@ -37,20 +37,23 @@ There are two ways to implement a soft delete strategy:
37
37
For this deletion detection approach, Cognitive Search depends on the [native blob soft delete](../storage/blobs/soft-delete-blob-overview.md) feature in Azure Blob Storage to determine whether blobs have transitioned to a soft deleted state. When blobs are detected in this state, a search indexer uses this information to remove the corresponding document from the index.
38
38
39
39
> [!IMPORTANT]
40
-
> Support for native blob soft delete is in preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [REST API version 2020-06-30-Preview](./search-api-preview.md) provides this feature. There is currently no .NET SDK support.
40
+
> Support for native blob soft delete is in preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [REST API version 2020-06-30-Preview](./search-api-preview.md) provides this feature. There's currently no .NET SDK support.
41
41
42
42
### Requirements for native soft delete
43
43
44
+
+ Blobs must be in an Azure Blob Storage container. The Cognitive Search native blob soft delete policy isn't supported for blobs in ADLS Gen2.
45
+
44
46
+[Enable soft delete for blobs](../storage/blobs/soft-delete-blob-enable.md).
45
-
+ Blobs must be in an Azure Blob Storage container. The Cognitive Search native blob soft delete policy is not supported for blobs in ADLS Gen2.
46
-
+ Document keys for the documents in your index must be mapped to either be a blob property or blob metadata.
47
-
+ You must use the preview REST API (`api-version=2020-06-30-Preview`) or the indexer Data Source configuration in your Cognitive Search Service from the Azure portal, to configure support for soft delete.
48
47
49
-
### How to configure deletion detection using native soft delete
48
+
+ Document keys for the documents in your index must be mapped to either be a blob property or blob metadata, such as "metadata_storage_path".
49
+
50
+
+ You must use the preview REST API (`api-version=2020-06-30-Preview`), or the indexer Data Source configuration in the Azure portal, to configure support for soft delete.
51
+
52
+
### Configure native soft delete
50
53
51
-
1.In Blob storage, when enabling soft delete, set the retention policy to a value that's much higher than your indexer interval schedule. This way if there's an issue running the indexer or if you have a large number of documents to index, there's plenty of time for the indexer to eventually process the soft deleted blobs. Azure Cognitive Search indexers will only delete a document from the index if it processes the blob while it's in a soft deleted state.
54
+
In Blob storage, when enabling soft delete per the requirements, set the retention policy to a value that's much higher than your indexer interval schedule. If there's an issue running the indexer, or if you have a large number of documents to index, there's plenty of time for the indexer to eventually process the soft deleted blobs. Azure Cognitive Search indexers will only delete a document from the index if it processes the blob while it's in a soft deleted state.
52
55
53
-
1.In Cognitive Search, set a native blob soft deletion detection policy on the data source. You can do this either from the Azure portal or by using preview REST API (`api-version=2020-06-30-Preview`).
56
+
In Cognitive Search, set a native blob soft deletion detection policy on the data source. You can do this either from the Azure portal or by using preview REST API (`api-version=2020-06-30-Preview`). The following instructions explain how to set the delete detection policy in Azure portal or through REST APIs.
54
57
55
58
### [**Azure portal**](#tab/portal)
56
59
@@ -68,7 +71,7 @@ For this deletion detection approach, Cognitive Search depends on the [native bl
68
71
69
72
### [**REST**](#tab/rest-api)
70
73
71
-
An example of using REST API to set soft deletion detection policy on the data source is shown below.
74
+
Set the soft deletion detection policy in the data source definition. Specify the preview API version when creating or updating the data source.
72
75
73
76
```http
74
77
PUT https://[service name].search.windows.net/datasources/blob-datasource?api-version=2020-06-30-Preview
@@ -85,15 +88,15 @@ api-key: [admin key]
85
88
}
86
89
```
87
90
88
-
1.[Run the indexer](/rest/api/searchservice/run-indexer) or set the indexer to run [on a schedule](search-howto-schedule-indexers.md). When the indexer runs and processes a blob having a soft delete state, the corresponding search document will be removed from the index.
91
+
[Run the indexer](/rest/api/searchservice/run-indexer) or set the indexer to run [on a schedule](search-howto-schedule-indexers.md). When the indexer runs and processes a blob having a soft delete state, the corresponding search document will be removed from the index.
89
92
90
93
---
91
94
92
-
### Re-index un-deleted blobs using native soft delete policies
95
+
### Reindex undeleted blobs using native soft delete policies
93
96
94
-
If you restore a soft deleted blob in Blob storage, the indexer will not always re-index it. This is because the indexer uses the blob's `LastModified` timestamp to determine whether indexing is needed. When a soft deleted blob is undeleted, its `LastModified` timestamp does not get updated, so if the indexer has already processed blobs with more recent `LastModified` timestamps, it won't re-index the undeleted blob.
97
+
If you restore a soft deleted blob in Blob storage, the indexer won't always reindex it. This is because the indexer uses the blob's `LastModified` timestamp to determine whether indexing is needed. When a soft deleted blob is undeleted, its `LastModified` timestamp doesn't get updated, so if the indexer has already processed blobs with more recent `LastModified` timestamps, it won't reindex the undeleted blob.
95
98
96
-
To make sure that an undeleted blob is reindexed, you will need to update the blob's `LastModified` timestamp. One way to do this is by resaving the metadata of that blob. You don't need to change the metadata, but resaving the metadata will update the blob's `LastModified` timestamp so that the indexer knows to pick it up.
99
+
To make sure that an undeleted blob is reindexed, you'll need to update the blob's `LastModified` timestamp. One way to do this is by resaving the metadata of that blob. You don't need to change the metadata, but resaving the metadata will update the blob's `LastModified` timestamp so that the indexer knows to pick it up.
97
100
98
101
<aname="soft-delete-using-custom-metadata"></a>
99
102
@@ -124,13 +127,13 @@ There are steps to follow in both Azure Storage and Cognitive Search, but there
124
127
125
128
1. Run the indexer. Once the indexer has processed the file and deleted the document from the search index, you can then delete the physical file in Azure Storage.
126
129
127
-
## Re-index un-deleted blobs and files
130
+
## Reindex undeleted blobs and files
128
131
129
132
You can reverse a soft-delete if the original source file still physically exists in Azure Storage.
130
133
131
134
1. Change the `"softDeleteMarkerValue" : "false"` on the blob or file in Azure Storage.
132
135
133
-
1. Check the blob or file's `LastModified` timestamp to make it is newer than the last indexer run. You can force an update to the current date and time by re-saving the existing metadata.
136
+
1. Check the blob or file's `LastModified` timestamp to make it's newer than the last indexer run. You can force an update to the current date and time by resaving the existing metadata.
0 commit comments